Sparse Autoencoders

AI Safety Diary: September 4, 2025

A diary entry introducing AI interpretability and discussing a paper on the limitations of sparse autoencoders for finding canonical units of analysis in LLMs.