AI Safety Diary: August 16, 2025
A diary entry on Anthropic’s research into Persona Vectors, a method for monitoring and controlling character traits in Large Language Models (LLMs) to improve safety and alignment.