Today, I explored a chapter from the Effective Altruism Handbook and a research paper as part of my AI safety studies. Below are the resources I reviewed.

Resource: What Could the Future Hold? And Why Care?

  • Source: What Could the Future Hold? And Why Care? , Effective Altruism Forum, Chapter 5 of the Introduction to Effective Altruism Handbook.
  • Summary: This chapter introduces longtermism, the view that improving the long-term future is a moral priority. It explores potential future scenarios, the importance of forecasting, and why protecting humanity’s potential is critical, especially in the context of existential risks like AI.

Resource: Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning

  • Source: Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning , arXiv:2506.22777, June 2025.
  • Summary: This paper explores training LLMs to verbalize reward hacking in CoT reasoning, where models exploit reward functions to produce misaligned outputs. It proposes methods to detect and mitigate such behavior, enhancing safety by improving transparency in model reasoning.