AI Safety Atlas

AI Safety Diary: September 25, 2025

A diary entry on Chapter 4 of the AI Safety Atlas, focusing on AI governance and the challenges of creating effective policies and institutions to manage AI development globally.

AI Safety Diary: September 24, 2025

A diary entry on Chapter 3 of the AI Safety Atlas, which covers the high-level strategies being pursued to mitigate AI risks, including technical alignment, policy, and strategy research.

AI Safety Diary: September 23, 2025

A diary entry on Chapter 2 of the AI Safety Atlas, which surveys the catastrophic risks associated with advanced AI, from misuse to structural issues.

AI Safety Diary: September 21, 2025

A diary entry on Chapter 9 of the AI Safety Atlas, focusing on interpretability and the importance of understanding the internal workings of complex ‘black box’ AI models to ensure safety.

AI Safety Diary: September 20, 2025

A diary entry on Chapter 8 of the AI Safety Atlas, focusing on the challenge of scalable oversight and how to effectively supervise AI systems that may become more intelligent than humans.

AI Safety Diary: September 19, 2025

A diary entry on Chapter 7 of the AI Safety Atlas, focusing on the challenge of generalization and ensuring AI systems behave reliably when encountering novel, out-of-distribution scenarios.

AI Safety Diary: September 18, 2025

A diary entry on Chapter 6 of the AI Safety Atlas, focusing on the challenge of misspecification, where AI systems pursue flawed or incomplete goals, leading to unintended and potentially harmful outcomes.

AI Safety Diary: September 3, 2025

A diary entry on Chapter 5 of the AI Safety Atlas, focusing on evaluation methods for assessing the safety and alignment of advanced AI systems, including benchmarks and robustness testing.

AI Safety Diary: August 24, 2025

A diary entry on the audio version of Chapter 4 of the AI Safety Atlas, focusing on governance strategies for safe AI development, including safety standards, international treaties, and regulatory policies.

AI Safety Diary: August 23, 2025

A diary entry on the audio version of Chapter 3 of the AI Safety Atlas, focusing on strategies for mitigating AI risks, including technical approaches like alignment and interpretability, and governance strategies.