AI Safety Diary: September 8, 2025

A diary entry on common use cases for AI models and the risks of models obfuscating their reasoning to evade safety monitors.

September 8, 2025 · 1 min

AI Safety Diary: September 2, 2025

A diary entry on Anthropic’s strategies for combating AI-enabled cybercrime, including threat intelligence, robust safety protocols, and collaboration to prevent misuse of AI systems.

September 2, 2025 · 1 min

AI Safety Diary: August 29, 2025

A diary entry on AI governance strategies to avoid extinction risks, discussing catastrophic risks from misalignment, misuse, and geopolitical conflict, and the need for urgent research into governance mechanisms.

August 29, 2025 · 1 min

AI Safety Diary: August 24, 2025

A diary entry on the audio version of Chapter 4 of the AI Safety Atlas, focusing on governance strategies for safe AI development, including safety standards, international treaties, and regulatory policies.

August 24, 2025 · 1 min

AI Safety Diary: August 23, 2025

A diary entry on the audio version of Chapter 3 of the AI Safety Atlas, focusing on strategies for mitigating AI risks, from technical approaches such as alignment and interpretability to governance measures.

August 23, 2025 · 1 min

AI Safety Diary: August 19, 2025

A diary entry on the societal impacts of AI, including ethical concerns like bias and job displacement, and strategies for controlling powerful AI systems to ensure alignment and mitigate risks.

August 19, 2025 · 1 min

AI Safety Diary: August 15, 2025

A diary entry on Chapter 2 of the Effective Altruism Handbook, focusing on the significant differences in the impact of interventions aimed at alleviating global poverty.

August 15, 2025 · 1 min

AI Safety Diary: August 12, 2025

A diary entry summarizing several introductory resources on how AI learns, including machine learning concepts, large language models (LLMs), and the progress of the deep learning revolution.

August 12, 2025 · 2 min

AI Safety Diary: August 11, 2025

A diary entry on the ‘AI Triad’ (algorithms, data, compute) and its implications for national security, based on the BlueDot AI Governance course.

August 11, 2025 · 1 min

AI Safety Diary: August 10, 2025

A diary entry summarizing Chapters 6-10 of the ‘Introduction to AI Safety, Ethics, and Society’ textbook, covering beneficial AI, machine ethics, collective action problems, governance, and utility functions.

August 10, 2025 · 2 min