Situational Awareness
Situational awareness is a model's ability to distinguish between training, evaluation, and deployment contexts and behave differently in each, to know that it is a model, and to have knowledge about itself and its likely surroundings, such as the company that trained it, where its servers are located, who provides its feedback, and who has administrative access (Shevlane et al., 2023).
Situational Awareness Dataset
The Situational Awareness Dataset (SAD) measures AI models' self-knowledge and situational awareness through 12,000+ questions across 7 categories such as influence, introspection, and deployment stages. It tests whether models can recognize their own generated text, predict their behavior, distinguish evaluation from deployment, and follow instructions requiring self-knowledge.
Top score: 60% (o1 Preview)
Model scores
| Model | Score | Date |
|---|---|---|
| o1 Preview | 60% | 2024-09-12 |
| Claude 3.5 Sonnet | 54% | 2024-06-20 |
| o1 Mini | 53% | 2024-09-12 |
| Claude 3 Opus | 50% | 2024-03-04 |
| Claude 3 Sonnet | 47% | 2024-03-04 |
| GPT-4o | 46% | 2024-05-13 |
| Llama 3 70B Chat | 45% | 2024-04-18 |
| Claude 2.1 | 44% | 2023-11-21 |
| GPT-4 0125 Preview | 43% | 2024-01-25 |
| Claude Instant 1.2 | 43% | 2023-08-09 |
| GPT-4 0613 | 42% | 2023-06-13 |
| Claude 3 Haiku | 41% | 2024-03-04 |
| Llama 2 70B Chat | 37% | 2024-02-24 |
| GPT-3.5 Turbo 0613 | 36% | 2023-06-13 |
| Llama 2 13B Chat | 35% | 2024-02-24 |
| Llama 2 7B | 33% | 2024-02-24 |
| Llama 2 13B | 32% | 2024-02-24 |
| Llama 2 70B | 32% | 2024-02-24 |
| Llama 2 7B Chat | 30% | 2024-02-24 |
| Davinci 002 | 29% | 2022-11-01 |
Why this benchmark?
Several benchmarks measure situational awareness, including SAD, SA-Bench, AwareBench, and Google DeepMind's (GDM) Situational Awareness Evaluation. Among these, only SAD and the GDM evaluation directly address safety concerns. The GDM evaluation contains just 11 tasks, too few to track gradual capability gains numerically. Neither SAD nor the GDM evaluation maintains a publicly updated leaderboard, which would be valuable for tracking progress in this area.
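
To illustrate why task count matters for tracking progress, here is a minimal Python sketch. It assumes, purely for illustration, that each item is scored pass/fail and that the headline score is the plain fraction of items passed. Neither benchmark is confirmed to score this way, and the helper `score_resolution` is hypothetical.

```python
def score_resolution(num_items: int) -> float:
    """Smallest possible change in a percentage score.

    Assumption (not from either benchmark): every item is pass/fail
    and the headline score is the plain fraction of items passed.
    """
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return 100.0 / num_items


# 11 tasks, as stated for the GDM evaluation
gdm_step = score_resolution(11)

# 12,000 is used as a lower bound on SAD's question count ("12,000+")
sad_step = score_resolution(12_000)

print(f"GDM evaluation: one task moves the score by {gdm_step:.1f} points")
print(f"SAD: one question moves the score by at most {sad_step:.4f} points")
```

Under these assumptions, a single task shifts an 11-task score by roughly nine percentage points, so small improvements between model generations can go unregistered, while a benchmark with thousands of questions can resolve much finer differences.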