AI Safety Diary: August 27, 2025
A diary entry on the unfaithfulness of Chain-of-Thought (CoT) reasoning in LLMs, highlighting issues like implicit biases and logically contradictory outputs, which pose challenges for AI safety monitoring.
A diary entry on the unfaithfulness of Chain-of-Thought (CoT) reasoning in LLMs, highlighting issues like implicit biases and logically contradictory outputs, which pose challenges for AI safety monitoring.