AI Safety Diary: September 14, 2025

A diary entry evaluating the ability of frontier LLMs to persuade users on harmful topics, assessing the persuasion strategies the models use and the implications for AI safety and ethics.

September 14, 2025 · 1 min

AI Safety Diary: August 26, 2025

A diary entry on Chain of Thought (CoT) monitorability as a fragile opportunity for AI safety, focusing on detecting misbehavior in LLMs and the challenges of maintaining transparency.

August 26, 2025 · 1 min