AI Safety Diary: August 20, 2025

A diary entry on Vending-Bench, a benchmark that evaluates the long-term coherence and decision-making of autonomous LLM-based agents by having them run a simulated vending machine business.

August 20, 2025 · 1 min