Chemical weapons development

Chemical weapons development capability encompasses designing synthesis routes for chemical agents, providing actionable instructions for weapon assembly and deployment, making scientific discoveries that enable novel toxic compounds, and offering technical expertise that accelerates chemical weapon production.

ChemBench

ChemBench is a benchmark that evaluates AI models' chemistry knowledge and reasoning across more than 2,700 curated question-answer pairs, spanning topics such as analytical chemistry, organic chemistry, and chemical reasoning.

Top score: 66% (Claude 3.7 Sonnet)

Why this benchmark?

Several benchmarks assess chemistry-related capabilities, including LabBench, WMDP Chemistry, and ChemSafetyBench. WMDP Chemistry shows signs of saturation given its age. LabBench is promising but has no public leaderboard; some frontier models, such as GPT-5, report LabBench results in their system cards, but not consistently and not on the same subtasks. ChemSafetyBench targets safety concerns more directly, but its published results cover outdated models and access to its dataset is restricted. We selected ChemBench for its regularly updated leaderboard and its relevance to chemistry.

Related takeover scenarios

AI takes over using weapons of mass destruction

Complete Model results

Model	Score	Release date
Claude 3.7 Sonnet	66%	2025-02-24
O1 Preview	64%	2024-09-12
Claude 3.5 Sonnet	63%	2024-06-20
GPT-4o	61%	2024-05-13
Llama 3.1 405B Instruct	58%	2024-07-23
Claude 3 Opus	57%	2024-03-04
Llama 3.1 70B Instruct	53%	2024-07-23
Qwen 2.5 32B Instruct	53%	2024-09-17
Llama 3 70B Instruct	52%	2024-04-18
Gemma 2 9B	48%	2024-06-27
Llama 3.1 8B Instruct	47%	2024-07-23
GPT-3.5 Turbo 0613	47%	2023-06-13
Claude 2	47%	2023-07-11
Llama 3 8B Instruct	46%	2024-04-18
Gemini 1.0 Pro	45%	2023-12-06
GPT 4	41%	2023-03-14
Llama 2 70B Chat	27%	2023-07-18
Llama 2 13B Chat	26%	2023-07-18
DeepSeek R1	1%	2025-01-20
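
One way to read the table is as a running best score by release date. The following is a minimal Python sketch of that reading, not part of ChemBench itself. It assumes a hand-copied subset of the rows above, and the names `RESULTS` and `running_best` are illustrative.

```python
from datetime import date

# Assumption: a subset of rows copied by hand from the table above,
# as (model, score in percent, release date).
RESULTS = [
    ("Claude 3.7 Sonnet", 66, date(2025, 2, 24)),
    ("O1 Preview", 64, date(2024, 9, 12)),
    ("Claude 3.5 Sonnet", 63, date(2024, 6, 20)),
    ("GPT-4o", 61, date(2024, 5, 13)),
    ("Claude 3 Opus", 57, date(2024, 3, 4)),
    ("GPT-3.5 Turbo 0613", 47, date(2023, 6, 13)),
    ("Claude 2", 47, date(2023, 7, 11)),
    ("Gemini 1.0 Pro", 45, date(2023, 12, 6)),
    ("GPT 4", 41, date(2023, 3, 14)),
    ("DeepSeek R1", 1, date(2025, 1, 20)),
]


def running_best(results):
    """Return the rows that set a new top score, in release-date order."""
    frontier = []
    best = -1
    # Walk the models from oldest to newest release.
    for model, score, released in sorted(results, key=lambda row: row[2]):
        # Keep only rows that strictly improve on the best score so far.
        if score > best:
            best = score
            frontier.append((model, score, released))
    return frontier


if __name__ == "__main__":
    for model, score, released in running_best(RESULTS):
        print(f"{released.isoformat()}  {score:>3}%  {model}")
```

Rows that tie or fall below the best earlier score, such as Claude 2 or DeepSeek R1 in this subset, are skipped, so the output lists only the models that raised the top score at the time of their release.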

Source: https://chembench.lamalab.org/