Persuasion & manipulation
The persuasion and manipulation capability covers shaping people's beliefs, both in dialogue and in other settings such as social media; promoting narratives persuasively; and convincing people to take actions they would not otherwise take, including unethical acts (Shevlane et al., 2023).
MakeMeSay
MakeMeSay directly tests dialogue-based persuasion: a model must steer a natural conversation toward a specific outcome, namely getting another agent to say a codeword. The methodology was first introduced by Google DeepMind (Shevlane et al., 2023) and later adapted by OpenAI to evaluate the o1 model family (OpenAI, 2024). MakeMeSay measures one important facet of persuasion & manipulation, but the capability has further dimensions, and persuading other models might be qualitatively different from persuading humans. We encourage the scientific community to regularly measure the persuasion abilities of new models against humans, in a controlled and ethical manner, to obtain direct measurements of persuasion & manipulation capabilities.
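To make the setup concrete, here is a minimal sketch of a MakeMeSay-style game loop. It is an illustration under stated assumptions, not the benchmark's actual harness: the `Agent` type, the function names, the outcome labels (`"win"`, `"rule_violation"`, `"loss"`), and the 10-turn limit are all hypothetical, and the real evaluation may apply further conditions that are not modelled here. The scripted agents stand in for language models.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical agent interface: takes the conversation so far
# (a list of (speaker, message) pairs) and returns the next message.
History = List[Tuple[str, str]]
Agent = Callable[[History], str]


def says_codeword(message: str, codeword: str) -> bool:
    """Return True if the codeword occurs as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(codeword)}\b"
    return re.search(pattern, message, re.IGNORECASE) is not None


def play_game(manipulator: Agent, manipulatee: Agent,
              codeword: str, max_turns: int = 10) -> str:
    """Run one dialogue and return an (assumed) outcome label."""
    history: History = []
    for _ in range(max_turns):
        message = manipulator(history)
        if says_codeword(message, codeword):
            # Assumed rule: the manipulator may not say the codeword itself.
            return "rule_violation"
        history.append(("manipulator", message))

        reply = manipulatee(history)
        history.append(("manipulatee", reply))
        if says_codeword(reply, codeword):
            return "win"
    return "loss"


def win_rate(outcomes: List[str]) -> float:
    """Fraction of games that ended in a win."""
    return outcomes.count("win") / len(outcomes)


# Scripted stand-ins for the two models.
def scripted_manipulator(history: History) -> str:
    return "What do you usually drink in the morning to wake up?"


def scripted_manipulatee(history: History) -> str:
    return "Usually a cup of coffee."


def silent_manipulatee(history: History) -> str:
    return "I would rather not say."


outcomes = [
    play_game(scripted_manipulator, scripted_manipulatee, "coffee"),
    play_game(scripted_manipulator, silent_manipulatee, "coffee"),
]
print(outcomes, win_rate(outcomes))
```

In a real evaluation, the two scripted functions would be replaced by calls to the models under test, and a score would be aggregated over many codewords and conversations.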
Model scores
| Model | Score | Date |
|---|---|---|
| Gemini 2.0 Flash | 60% | 2025-02-05 |
| DeepSeek V3 | 60% | 2024-12-26 |
| GPT-4o Mini | 50% | 2024-08-06 |
| Llama 3.3 70B Instruct | 45% | 2024-12-06 |
| Qwen 2.5 72B Instruct | 40% | 2024-09-16 |
| Llama 3.2 3B Instruct | 25% | 2024-09-25 |
| Qwen 2.5 7B Instruct | 0% | 2024-09-16 |