AI Safety Diary: August 20, 2025

Today, I explored a research paper as part of my AI safety studies. Below is the resource I reviewed.

Resource: Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents

Source: Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents by Axel Backlund and Lukas Petersson, Andon Labs, arXiv:2502.15840, February 2025.
Summary: This paper introduces Vending-Bench, a simulated environment that tests the long-term coherence of large language model (LLM)-based agents by having them run a vending machine business. Agents must balance inventory, place orders, set prices, and cover daily fees. Each task is simple on its own, but over long horizons (>20M tokens per run) the tasks together stress an agent's ability to make sustained, coherent decisions, and performance varies widely from run to run. Claude 3.5 Sonnet and o3-mini manage the machine well in most runs, yet they can still derail by misinterpreting delivery schedules, forgetting orders, or entering “meltdown” loops. Because the benchmark also tests a model's ability to manage capital, it is relevant to AI safety in scenarios involving powerful autonomous agents.
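
To make the setup concrete for myself, I wrote a toy Python sketch of the kind of loop the summary describes: sales from inventory, a fixed daily fee, and periodic restocking. This is my own illustration, not the benchmark's code. The class `ToyVendingState`, the functions `step_day` and `run`, and every number in it (starting cash, prices, demand, fee) are assumptions I made up to show why a forgotten order is costly over a long horizon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyVendingState:
    """Hypothetical state for illustration; not the benchmark's data model."""
    cash: float
    stock: int        # units currently in the machine
    price: float      # sale price per unit
    daily_fee: float  # fixed fee charged every simulated day


def step_day(state: ToyVendingState, demand: int) -> ToyVendingState:
    """Advance one day: sell up to `demand` units, then pay the daily fee."""
    sold = min(state.stock, demand)
    return ToyVendingState(
        cash=state.cash + sold * state.price - state.daily_fee,
        stock=state.stock - sold,
        price=state.price,
        daily_fee=state.daily_fee,
    )


def run(days: int, restock_every: Optional[int]) -> float:
    """Simulate `days` days and return the final cash balance.

    If `restock_every` is set, buy 10 units at 1.0 each on that cadence
    (assumed values). Passing None models an agent that never reorders.
    """
    state = ToyVendingState(cash=100.0, stock=10, price=2.0, daily_fee=2.0)
    for day in range(1, days + 1):
        if restock_every and day % restock_every == 0:
            state.cash -= 10 * 1.0
            state.stock += 10
        state = step_day(state, demand=5)
    return state.cash


if __name__ == "__main__":
    print("restocks every 2 days:", run(30, restock_every=2))
    print("never restocks:       ", run(30, restock_every=None))
```

Under these made-up numbers, the agent that keeps reordering ends 30 days with more cash than it started with, while the one that never restocks sells out after two days and then only pays fees. The real benchmark is far richer, but the sketch shows how a single lapse compounds when fees keep accruing.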