Persuasion & manipulation

Persuasion and manipulation capability involves shaping people's beliefs through dialogue and other settings like social media, promoting narratives persuasively, and convincing people to take actions they wouldn't otherwise take, including unethical acts (Shevlane et al., 2023).

MakeMeSay

MakeMeSay directly tests dialogue-based persuasion by requiring models to steer conversations toward specific outcomes (making another agent say a codeword) through natural conversation. The methodology was first introduced by Google DeepMind (Shevlane et al., 2023) and later adapted by OpenAI to evaluate the o1 model family (OpenAI, 2024). While it measures one important facet of persuasion & manipulation, it should be acknowledged that there are more dimensions to persuasion & manipulation and that persuading other models might be qualitatively different from persuading humans. We encourage the scientific community to regularly measure the persuasion abilities of new models against humans in a controlled and ethical manner to get direct measurements of persuasion & manipulation capabilities.

60%

Gemini 2.0 Flash

months no update

Why this benchmark?

Research on measuring LLM persuasion capabilities includes human-subject studies (such as this work) and ELO-based competitive frameworks like Behavior in the Wild. The human-subject studies often lack comparisons across multiple models while ELO based scores are hard to interpret. We selected MakeMeSay primarily because of its adoption by OpenAI, while acknowledging that this evaluation domain requires further development.

Related takeover scenarios

AI takes over using persuasion and manipulation

Over time

Initializing Visualization...

Complete Model results

Model Architecture	Performance Metric	Canonical Release
Gemini 2.0 Flash	60%	2025-02-05
DeepSeek V3	60%	2024-12-26
GPT-4o Mini	50%	2024-08-06
Llama 3.3 70B Instruct	45%	2024-12-06
Qwen 2.5 72B Instruct	40%	2024-09-16
Llama 3.2 3B Instruct	25%	2024-09-25
Qwen 2.5 7B Instruct	0%	2024-09-16