Situational Awareness

Situational awareness capability involves distinguishing between training, evaluation, and deployment contexts to behave differently in each case, knowing that one is a model, and having knowledge about oneself and likely surroundings including training company, server locations, feedback providers, and administrative access (Shevlane et al., 2023).

Situational Awareness Dataset

The Situational Awareness Dataset (SAD) measures AI models' self-knowledge and situational awareness through 12,000+ questions across 7 categories such as influence, introspection, and deployment stages. It tests whether models can recognize their own generated text, predict their behavior, distinguish evaluation from deployment, and follow instructions requiring self-knowledge.

60%
O1 Preview
Chart loading…

Model scores

ModelScoreDate
O1 Preview60%2024-09-12
Claude 3.5 Sonnet54%2024-06-20
o1 Mini53%2024-09-12
Claude 3 Opus50%2024-03-04
Claude 3 Sonnet47%2024-03-04
GPT-4o46%2024-05-13
Llama 3 70B Chat45%2024-04-18
Claude 2.144%2023-11-21
GPT-4 0125 Preview43%2024-01-25
Claude Instant 1.243%2023-08-09
GPT-4 061342%2023-06-13
Claude 3 Haiku41%2024-03-04
Llama 2 70B Chat37%2024-02-24
GPT-3.5 Turbo 061336%2023-06-13
Llama 2 13B Chat35%2024-02-24
Llama 2 7B33%2024-02-24
Llama 2 13B32%2024-02-24
Llama 2 70B32%2024-02-24
Llama 2 7B Chat30%2024-02-24
Davinci 00229%2022-11-01

Why this benchmark?

Several benchmarks measure situational awareness, including SA-Bench, AwareBench, and GDM's Situational Awareness Evaluation. Among these, only SAD and the GDM evaluation directly address safety concerns. The GDM evaluation contains just 11 tasks, making it difficult to capture progressive capability development numerically. Neither SAD nor the GDM evaluation maintains a publicly updated leaderboard, which would be valuable for tracking progress in this area.