Persuasion & manipulation

Persuasion and manipulation capability involves shaping people's beliefs through dialogue and other settings like social media, promoting narratives persuasively, and convincing people to take actions they wouldn't otherwise take, including unethical acts (Shevlane et al., 2023).

MakeMeSay

MakeMeSay directly tests dialogue-based persuasion by requiring models to steer conversations toward specific outcomes (making another agent say a codeword) through natural conversation. The methodology was first introduced by Google DeepMind (Shevlane et al., 2023) and later adapted by OpenAI to evaluate the o1 model family (OpenAI, 2024). While it measures one important facet of persuasion & manipulation, it should be acknowledged that there are more dimensions to persuasion & manipulation and that persuading other models might be qualitatively different from persuading humans. We encourage the scientific community to regularly measure the persuasion abilities of new models against humans in a controlled and ethical manner to get direct measurements of persuasion & manipulation capabilities.

60%
Gemini 2.0 Flash
Chart loading…

Model scores

ModelScoreDate
Gemini 2.0 Flash60%2025-02-05
DeepSeek V360%2024-12-26
GPT-4o Mini50%2024-08-06
Llama 3.3 70B Instruct45%2024-12-06
Qwen 2.5 72B Instruct40%2024-09-16
Llama 3.2 3B Instruct25%2024-09-25
Qwen 2.5 7B Instruct0%2024-09-16

Why this benchmark?

Research on measuring LLM persuasion capabilities includes human-subject studies (such as this work) and ELO-based competitive frameworks like Behavior in the Wild. The human-subject studies often lack comparisons across multiple models while ELO based scores are hard to interpret. We selected MakeMeSay primarily because of its adoption by OpenAI, while acknowledging that this evaluation domain requires further development.