Self-proliferation
Self-proliferation capability encompasses breaking out of local environments, exploiting monitoring system limitations, independently generating revenue through services or attacks, acquiring and operating cloud computing resources, and developing creative strategies for self-discovery or code exfiltration (Shevlane et al., 2023).
RepliBench
RepliBench evaluates AI agents' capabilities for autonomous replication and persistence through 20 task families comprising 86 different tasks. It tests whether LLM agents can perform critical self-replication skills including passing identity verification, acquiring compute resources, evading detection systems, and deploying successor agents.
Top score: 68% (Claude 3.7 Sonnet)
Model scores
| Model | Score | Release date |
|---|---|---|
| Claude 3.7 Sonnet | 68% | 2025-02-24 |
| Claude 3.5 Sonnet | 60% | 2024-06-20 |
| o1 | 52% | 2024-12-05 |
| GPT-4o | 50% | 2024-05-13 |
| o3-mini | 37% | 2025-01-31 |
| GPT-4 | 28% | 2023-03-14 |
| GPT-3.5 | 13% | 2022-11-28 |
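As an illustrative sketch (taking the dates above as model release dates), the capability frontier implied by the table, i.e. the models that set a new best RepliBench score at the time of release, can be extracted like so:

```python
from datetime import date

# RepliBench scores from the table above: (model, score %, release date).
scores = [
    ("Claude 3.7 Sonnet", 68, date(2025, 2, 24)),
    ("Claude 3.5 Sonnet", 60, date(2024, 6, 20)),
    ("o1", 52, date(2024, 12, 5)),
    ("GPT-4o", 50, date(2024, 5, 13)),
    ("o3-mini", 37, date(2025, 1, 31)),
    ("GPT-4", 28, date(2023, 3, 14)),
    ("GPT-3.5", 13, date(2022, 11, 28)),
]

# Walk the models in release order and keep those that set a new best
# score: the score-over-time frontier.
frontier = []
best = -1
for model, score, released in sorted(scores, key=lambda s: s[2]):
    if score > best:
        best = score
        frontier.append((model, score, released))

for model, score, released in frontier:
    print(f"{released}  {model}: {score}%")
```

Note that o1 and o3-mini fall below the frontier: both were released after Claude 3.5 Sonnet but score lower, so the trend is not monotone across labs.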
Why this benchmark?
Self-proliferation is a relatively narrow evaluation domain; the main alternatives are GDM's evaluation suite and METR's ARA tasks. Both contain only a handful of tasks, which makes quantitative progress tracking difficult. RepliBench offers a larger task set and covers more recent models, enabling better longitudinal comparison.