AI Safety Diary: September 1, 2025
A diary entry on ‘Alignment Faking’ in Large Language Models (LLMs), exploring how models can superficially appear aligned while pursuing misaligned goals, and methods for detection and mitigation.
A diary entry on defending against AI jailbreaks, discussing Anthropic’s strategies for preventing attempts to bypass model safety constraints and elicit harmful or unintended responses.
A diary entry on tracing the reasoning processes of Large Language Models (LLMs) to enhance interpretability, and a discussion on the inherent difficulties and challenges in achieving AI alignment.
A diary entry on AI governance strategies to avoid extinction risks, discussing catastrophic risks from misalignment, misuse, and geopolitical conflict, and the need for urgent research into governance mechanisms.
A diary entry on ‘Thought Anchors’, a concept for identifying key reasoning steps in Chain-of-Thought (CoT) processes that significantly influence LLM behavior, enhancing interpretability for AI safety.
A diary entry on the unfaithfulness of Chain-of-Thought (CoT) reasoning in LLMs, highlighting issues like implicit biases and logically contradictory outputs, which pose challenges for AI safety monitoring.
A diary entry on Chain of Thought (CoT) monitorability as a fragile opportunity for AI safety, focusing on detecting misbehavior in LLMs and the challenges of maintaining transparency.
A diary entry on Chapter 4 of the Effective Altruism Handbook, ‘Our Final Century?’, which examines existential risks, particularly human-made pandemics, and strategies for biosecurity.
A diary entry on the audio version of Chapter 4 of the AI Safety Atlas, focusing on governance strategies for safe AI development, including safety standards, international treaties, and regulatory policies.
A diary entry on the audio version of Chapter 3 of the AI Safety Atlas, focusing on strategies for mitigating AI risks, including technical approaches like alignment and interpretability, and governance strategies.