AI Safety Diary: October 2, 2025

A diary entry on the 8th chapter of the AI Safety Book, providing a deep dive into the challenges and potential solutions in AI governance, from corporate self-regulation to international treaties.

October 2, 2025 · 1 min

AI Safety Diary: October 1, 2025

A diary entry on the 7th chapter of the AI Safety Book, which analyzes AI development through the lens of collective action problems, such as arms races and the tragedy of the commons.

October 1, 2025 · 1 min

AI Safety Diary: September 25, 2025

A diary entry on the 4th chapter of the AI Safety Atlas, focusing on the critical area of AI governance and the challenges of creating effective policies and institutions to manage AI development globally.

September 25, 2025 · 1 min

AI Safety Diary: September 14, 2025

Evaluates the ability of frontier LLMs to persuade users on harmful topics, assessing their strategies and the implications for AI safety and ethics.

September 14, 2025 · 1 min

AI Safety Diary: September 13, 2025

Investigates how LLMs can be tuned to become more susceptible to jailbreaking, highlighting the implications for AI safety and the need for robust defenses.

September 13, 2025 · 1 min

AI Safety Diary: September 8, 2025

A diary entry on common use cases for AI models and the risks of models obfuscating their reasoning to evade safety monitors.

September 8, 2025 · 1 min

AI Safety Diary: September 2, 2025

A diary entry on Anthropic’s strategies for combating AI-enabled cybercrime, including threat intelligence, robust safety protocols, and collaboration to prevent misuse of AI systems.

September 2, 2025 · 1 min

AI Safety Diary: August 29, 2025

A diary entry on AI governance strategies to avoid extinction risks, discussing catastrophic risks from misalignment, misuse, and geopolitical conflict, and the need for urgent research into governance mechanisms.

August 29, 2025 · 1 min

AI Safety Diary: August 24, 2025

A diary entry on the audio version of Chapter 4 of the AI Safety Atlas, focusing on governance strategies for safe AI development, including safety standards, international treaties, and regulatory policies.

August 24, 2025 · 1 min

AI Safety Diary: August 23, 2025

A diary entry on the audio version of Chapter 3 of the AI Safety Atlas, focusing on strategies for mitigating AI risks, including technical approaches such as alignment and interpretability as well as governance measures.

August 23, 2025 · 1 min