AI Safety Diary: September 8, 2025
A diary entry on common use cases for AI models and the risks of models obfuscating their reasoning to evade safety monitors.
A diary entry on how Chain-of-Thought (CoT) reasoning affects an LLM's ability to evade monitors, and the challenge of unfaithful reasoning in model explanations.