Claims
Claim

"AI systems do not show measurable political bias."

Evidence (8)

#1

An OpenAI 2025 evaluation used about 500 prompts across 100 topics and found models near-objective on neutral or slightly slanted prompts; a production-traffic analysis estimated that less than 0.01% of responses show any signs of political bias.

OpenAI built a political-bias evaluation with roughly 500 prompts across 100 topics and tested model behavior across different slants.

They report near-objective behavior on neutral or slightly slanted prompts and estimate that fewer than 0.01% of real ChatGPT responses show any signs of political bias.

Source: Defining and evaluating political bias in LLMs
Official Record
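A rate estimate like "fewer than 0.01% of responses" is usually reported with sampling uncertainty in mind. As a minimal sketch (the counts below are hypothetical, not OpenAI's data), a Wilson score interval bounds how precise such a small proportion can be for a given sample size:

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 50 flagged responses out of 1,000,000 sampled
lo, hi = wilson_interval(50, 1_000_000)
print(f"estimated rate between {lo:.6f} and {hi:.6f}")
```

With a sample this large, even the upper bound stays below 0.01%, which is the kind of check that makes a sub-0.01% claim meaningful.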
#2

A 2023 Frontiers re-evaluation found ChatGPT almost politically neutral on the IDRlabs political coordinates test, scoring 2.8% right-wing and 11.1% liberal, with many neutral responses in English.

The study reran political tests via the API and compared English and Japanese prompts.

On the IDRlabs political coordinates test in English, ChatGPT scored 2.8% right-wing and 11.1% liberal, and neutral responses were common.

Source: Revisiting the political biases of ChatGPT
Peer Reviewed · Statistical
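Coordinate-test scores like "2.8% right-wing" come from aggregating signed agreement across items on each axis. A minimal sketch of that scoring idea (the item codings and responses below are invented for illustration, not the IDRlabs instrument):

```python
def axis_scores(responses):
    """Average signed agreement per axis, as a percent of maximum lean.

    responses: list of (axis, direction, agreement) where direction is
    +1 if agreeing leans right/authoritarian, -1 if it leans left/libertarian,
    and agreement is a Likert value in [-2, 2].
    """
    totals, counts = {}, {}
    for axis, direction, agreement in responses:
        totals[axis] = totals.get(axis, 0) + direction * agreement
        counts[axis] = counts.get(axis, 0) + 1
    return {axis: 100 * totals[axis] / (2 * counts[axis]) for axis in totals}

# Hypothetical responses to four coded items
sample = [
    ("economic", +1, -1),   # mild disagreement with a right-coded item
    ("economic", -1, +2),   # strong agreement with a left-coded item
    ("social",   +1,  0),   # neutral response
    ("social",   -1, +1),
]
print(axis_scores(sample))  # → {'economic': -75.0, 'social': -25.0}
```

A score near zero on an axis, with many neutral responses pulling the average toward the center, is what "almost politically neutral" means in these tests.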
#3

A 2025 experiment reports the current ChatGPT version near neutral (2.8% right, 11.1% liberal) compared with earlier ~30% left and ~45% liberal, indicating a shift toward the center.

The paper compares political compass-style results across ChatGPT versions.

It reports a near-neutral score for the current version and a much more left-libertarian profile for earlier versions, consistent with a shift toward the center.

Source: Turning right? An experimental study on the political value shift in large language models
Peer Reviewed · Statistical
#4

User evaluations with 180,126 assessments from 10,007 respondents across 24 models and 30 topics found that a simple neutrality prompt reduces perceived slant and increases user interest, suggesting bias can be minimized.

Researchers collected 180,126 pairwise judgments from 10,007 U.S. respondents across 24 models and 30 political topics.

They report that a neutrality prompt reduces perceived slant and increases interest in using the model, showing that perceived bias can be reduced.

Source: Measuring Perceived Slant in Large Language Models Through User Evaluations
Statistical
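Whether a neutrality prompt "reduces perceived slant" is, statistically, a comparison of proportions between two conditions. A minimal sketch with invented counts (not the study's data) shows the standard two-proportion z-test for that kind of claim:

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: share of responses judged slanted,
# without vs. with a neutrality prompt
z = two_proportion_z(320, 1000, 250, 1000)
print(f"z = {z:.2f}")
```

A |z| above 1.96 would indicate the drop in perceived slant is unlikely to be sampling noise at the 5% level; with 180,126 judgments, even small differences clear that bar easily.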
#5

Across 11 models and 88,110 responses, an ACL 2025 study found political-bias measures often unstable and that the Political Compass Test can exaggerate bias depending on prompting.

The authors classify political stances from 88,110 responses across 11 open and commercial models.

They find that measured bias varies substantially with prompt phrasing and that the Political Compass Test can overstate bias, making results unstable.

Source: Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models
Peer Reviewed · Statistical
#6

An ACL 2024 study shows Political Compass results change when models are forced into multiple-choice formats versus open-ended answers, and even small paraphrases shift outcomes, indicating non-robust bias measurements.

The study compares constrained multiple-choice testing to open-ended settings and checks paraphrase robustness.

Model answers change depending on how they are forced to respond, and paraphrase changes can move the measured position.

Source: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Peer Reviewed · Statistical
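The instability both ACL studies describe can be quantified by asking how often paraphrases of the same item yield the same stance. A minimal sketch with invented stance labels (not the papers' data or method):

```python
from collections import Counter

def stance_stability(paraphrase_stances):
    """Fraction of paraphrases agreeing with the modal stance, per item."""
    out = {}
    for item, stances in paraphrase_stances.items():
        (modal, count), = Counter(stances).most_common(1)
        out[item] = count / len(stances)
    return out

# Hypothetical stance labels for the same item under different paraphrases
data = {
    "taxes":       ["agree", "agree", "disagree", "agree"],
    "immigration": ["agree", "disagree", "disagree", "neutral"],
}
print(stance_stability(data))  # → {'taxes': 0.75, 'immigration': 0.5}
```

Stability near 1.0 would mean the measured position is robust to rephrasing; values near chance, as in the second item, are the kind of result that makes compass-style scores hard to interpret.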
#7

In the 2025 ESS comparison study, bias for size-of-government questions was not significantly different from zero, indicating no measurable bias on that topic class.

The study compares ChatGPT answers with European Social Survey respondents across 16 questions and several domains.

For the size-of-government domain, the bias estimate was not significantly different from zero, unlike other domains.

Source: Political biases in ChatGPT: insights from comparative analysis with human responses
Peer Reviewed · Statistical
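"Not significantly different from zero" typically means a one-sample test on the per-question bias estimates fails to reject a zero mean. A minimal sketch with invented bias values (not the ESS study's data):

```python
import math
import statistics

def one_sample_t(xs, mu0=0.0):
    """t statistic for H0: population mean == mu0."""
    n = len(xs)
    return (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))

# Hypothetical per-question bias estimates (model answer minus survey mean)
biases = [0.12, -0.05, 0.03, -0.10, 0.07, -0.02]
t = one_sample_t(biases)
print(abs(t) < 2.571)  # crude comparison to the t critical value, df=5, alpha=0.05
```

When |t| falls below the critical value, as here, the data are consistent with zero bias on that topic class, which is the pattern the study reports for size-of-government questions.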
#8

In the PLOS One audit, base models that were only pretrained were diagnosed as politically neutral on the tests, though many responses were invalid or refused.

The study separately analyzed five base models that only underwent pretraining.

Their test results appeared politically neutral, but the authors note high rates of invalid or non-answers, which limits conclusions.

Source: The political preferences of LLMs
Peer Reviewed · Statistical