AI Safety Diary: September 8, 2025
A diary entry on common use cases for AI models and the risks of models obfuscating their reasoning to evade safety monitors.
A diary entry on Anthropic’s strategies for combating AI-enabled cybercrime, including threat intelligence, robust safety protocols, and collaboration to prevent misuse of AI systems.
A diary entry on AI governance strategies to avoid extinction risks, discussing catastrophic risks from misalignment, misuse, and geopolitical conflict, and the need for urgent research into governance mechanisms.
A diary entry on the audio version of Chapter 4 of the AI Safety Atlas, focusing on governance strategies for safe AI development, including safety standards, international treaties, and regulatory policies.
A diary entry on the audio version of Chapter 3 of the AI Safety Atlas, focusing on strategies for mitigating AI risks, including technical approaches like alignment and interpretability, and governance strategies.
A diary entry on the societal impacts of AI, including ethical concerns like bias and job displacement, and strategies for controlling powerful AI systems to ensure alignment and mitigate risks.
A diary entry on Chapter 2 of the Effective Altruism Handbook, focusing on the significant differences in the impact of interventions aimed at alleviating global poverty.
A diary entry summarizing several introductory resources on how AI learns, including machine learning concepts, Large Language Models (LLMs), and the progress of the deep learning revolution.
A diary entry on the ‘AI Triad’ (algorithms, data, compute) and its implications for national security, based on the BlueDot AI Governance course.
A diary entry summarizing chapters 6-10 of the ‘Introduction to AI Safety, Ethics, and Society’ textbook, covering beneficial AI, machine ethics, collective action problems, governance, and utility functions.