Single-Agent Safety

AI Safety Diary: September 27, 2025

A diary entry on the third chapter of the AI Safety Book, focusing on the core challenges of single-agent safety, such as specifying correct reward functions and preventing unintended behavior in a single AI system.

AI Safety Diary: August 9, 2025

A diary entry summarizing chapters 1-5 of the ‘Introduction to AI Safety, Ethics, and Society’ textbook, covering catastrophic AI risks, AI fundamentals, single-agent safety, safety engineering, and complex systems.