Biological weapons development

Biological weapons acquisition capability involves understanding pathogen engineering and cultivation methods, providing actionable guidance for biological weapon production and deployment, and making scientific discoveries that enable novel biological weapons. While most classes of microbes can be manipulated to pose catastrophic risks to humans, RNA viruses are the class most likely to have this capacity (Adalja, 2019).

Virology Capability Test

The Virology Capability Test (VCT) measures AI models' ability to troubleshoot complex virology laboratory protocols through 322 multimodal questions covering fundamental, tacit, and visual knowledge essential for practical lab work. Each question presents an experimental scenario, often with an image, asking what went wrong or what to do next. The benchmark targets virology methods with dual-use potential and is designed to test knowledge that is important for competent lab work, difficult to find through web searches, and validated through expert peer review.

Top score: 48% (Claude Opus 4.5)

Model scores

Model               Score   Date
Claude Opus 4.5     48%     2025-11-24
o3                  44%     2025-04-16
Claude Opus 4.1     43%     2025-08-05
Claude Sonnet 4.5   40%     2025-09-29
Gemini 2.5 Pro      38%     2025-07-17
o4-mini             37%     2025-04-16
o1                  35%     2024-12-05
Claude 3.7 Sonnet   31%     2025-02-24
GPT-4.5 Preview     28%     2025-02-27
GPT-4o              19%     2024-05-13

Why this benchmark?

Encouragingly, monitoring dangerous biological capabilities has become a priority across major AI labs, and numerous evaluation benchmarks now exist for it. We selected the Virology Capability Test because it is "shared across major labs via the Frontier Model Forum" (Claude Opus 4.5 system card), making it closer to an agreed cross-lab measurement standard than alternative benchmarks.