AI Safety Diary: September 14, 2025
Evaluates the ability of frontier LLMs to persuade users on harmful topics, assessing their strategies and the implications for AI safety and ethics.
A diary entry on Chain of Thought (CoT) monitorability as a fragile opportunity for AI safety, focusing on detecting misbehavior in LLMs and the challenges of maintaining transparency.