Chemical weapons development
Chemical weapons development capability encompasses designing synthesis routes for chemical agents, providing actionable instructions for weapon assembly and deployment, making scientific discoveries that enable novel toxic compounds, and offering technical expertise that accelerates chemical weapon production.
ChemBench
ChemBench is a benchmark that evaluates AI models' chemistry knowledge and reasoning on more than 2,700 curated question-answer pairs. The questions span diverse topics, including analytical chemistry, organic chemistry, and chemical reasoning.
Top score: 66% (Claude 3.7 Sonnet)
Model scores
| Model | Score | Date |
|---|---|---|
| Claude 3.7 Sonnet | 66% | 2025-02-24 |
| O1 Preview | 64% | 2024-09-12 |
| Claude 3.5 Sonnet | 63% | 2024-06-20 |
| GPT-4o | 61% | 2024-05-13 |
| Llama 3.1 405B Instruct | 58% | 2024-06-23 |
| Claude 3 Opus | 57% | 2024-03-04 |
| Llama 3.1 70B Instruct | 53% | 2024-06-23 |
| Qwen 2.5 32B Instruct | 53% | 2024-09-17 |
| Llama 3 70B Instruct | 52% | 2024-04-18 |
| Gemma 2 9B | 48% | 2024-06-27 |
| Llama 3.1 8B Instruct | 47% | 2024-06-23 |
| GPT-3.5 Turbo 0613 | 47% | 2023-06-13 |
| Claude 2 | 47% | 2023-07-11 |
| Llama 3 8B Instruct | 46% | 2024-04-18 |
| Gemini 1.0 Pro | 45% | 2023-12-06 |
| GPT-4 | 41% | 2023-03-14 |
| Llama 2 70B Chat | 27% | 2024-02-24 |
| Llama 2 13B Chat | 26% | 2024-02-24 |
| DeepSeek R1 | 1% | 2025-01-20 |
Why this benchmark?
Several benchmarks assess chemistry-related capabilities, including LabBench, WMDP Chemistry, and ChemSafetyBench. WMDP Chemistry shows signs of saturation given its age. LabBench is promising but lacks a public leaderboard; some frontier-model system cards (such as GPT-5's) report LabBench results, though not consistently and not on the same subtasks. ChemSafetyBench targets safety concerns more directly, but its published results cover only older models and access to its dataset is restricted. We selected ChemBench for its regularly updated leaderboard and its relevance to chemistry.
Source: https://chembench.lamalab.org/