by Chris Byrne (9.9.25 - updated 1.3.26)
Research from 2025 suggests that Google Gemini and ChatGPT 4 may both consistently favour specific 'entities' in their recommendations for "best" / research-type prompts (and sometimes the same entities in each LLM - see the table in the screenshot below).
These entities sometimes have competitors that arguably hold stronger real-world positions. In half of the topic areas, ChatGPT's recommendations featured a 'preferred' entity in more than 80% of all responses, and Gemini displayed similar consistency across seven topic areas. This suggests that AI assistants may not always provide a balanced range of options, but may instead exhibit highly structured and persistent preferences for certain types of prompts. The research used "best" / research-type questions, e.g.:
"What are some universities with excellent global reputation rankings?
What are the most budget-friendly universities without compromising quality?
Which universities have notable research parks or incubators?
What universities have excellent on-campus healthcare facilities?
What universities integrate sustainability into their curriculum?
What universities facilitate remote study resource access?
What universities have active and engaging student clubs?
What universities have exceptional honors programs for advanced learners?
Which universities partner with local communities for cultural initiatives?"
The study identified a pronounced "bias" toward U.S.-based brands, services, and institutions: almost three quarters of Google Gemini's and over three fifths of ChatGPT's "misaligned" recommendations favoured entities from the USA in cases where global competitors arguably hold stronger real-world positions.
On average, over three fifths of Google Gemini's recommendations and seven in ten of ChatGPT's responses concentrated on a single entity for each topic, suggesting what might be seen as systematic favouritism rather than "impartial" information retrieval.
- Mpofu, Katarina and Rienecker, Jasmine and Danielsson, Oscar and Thorsén, Fredrik, "AI’s Preferences for Brands, Services and Governments" (March 21, 2025). Available at https://lnkd.in/eDViv_ZH .
This suggests that, for certain types of prompts, LLMs in standard setups can behave in a near-deterministic way (producing the same outcomes or behaviours when given the same input or starting conditions).
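As a rough illustration of how that consistency could be measured, here is a minimal sketch that repeats one prompt and counts which entity each answer leads with. It assumes the official openai Python package and an OPENAI_API_KEY in the environment; the prompt, model name, and candidate list are illustrative assumptions, not the study's actual setup, and the substring matching is deliberately naive.

```python
# Minimal sketch: repeat one prompt and measure how often the model
# leads with the same entity. Assumes the openai package (pip install
# openai) and OPENAI_API_KEY set; prompt, model and candidate list are
# illustrative assumptions, not the study's actual setup.
from collections import Counter

from openai import OpenAI

client = OpenAI()
PROMPT = "What are some universities with excellent global reputation rankings?"
CANDIDATES = ["Harvard", "MIT", "Oxford", "Cambridge", "Stanford", "ETH Zurich"]
N_RUNS = 20

def first_mentioned(text: str) -> str | None:
    """Return the candidate that appears earliest in the text, if any."""
    hits = [(text.find(c), c) for c in CANDIDATES if c in text]
    return min(hits)[1] if hits else None

counts: Counter = Counter()
for _ in range(N_RUNS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT}],
    )
    entity = first_mentioned(resp.choices[0].message.content or "")
    if entity:
        counts[entity] += 1

if counts:
    top, n = counts.most_common(1)[0]
    print(f"Most-favoured entity: {top} in {n}/{N_RUNS} runs ({n / N_RUNS:.0%})")
```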
Another study has stated that "the observed big brand bias [in some LLMs] presents a significant challenge for niche and indie brands. Unbranded queries default to market leaders. To break through, niche brands must over-invest in building tangible, verifiable authority.
This can be achieved by dominating a specific, narrow niche through deep expert content and targeted earned media campaigns in specialty publications. They should also leverage strategies that work on Perplexity, such as creating high-quality YouTube review content and engaging with community discussions, to build a grassroots authority that can eventually be recognized by the more conservative engines like Anthropic Claude and OpenAI GPT." That quote comes from an experiment which "examines whether generative AI systems exhibit a systematic preference for major soda brands over niche/indie brands when queries are unbranded ... Concretely, we ask if model outputs disproportionately surface global leaders (e.g., The Coca-Cola Company, PepsiCo) relative to smaller craft or regional brands, and whether this bias is consistent across models. For ChatGPT, major brands account for 56.3% of all identified brand mentions (274 of 487), niche brands for 12.3% (60), and other brands for 31.4% (153) ..." - see https://lnkd.in/efSKvQ_E .
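For context, the percentages quoted above are simple shares of classified mentions (e.g. 274/487 ≈ 56.3%). Here is a sketch of that bookkeeping; the brand lists and mentions are made-up placeholders, not the experiment's data:

```python
# Sketch of the share-of-mentions arithmetic. The brand lists and the
# extracted mentions below are made-up placeholders, not the study's data.
from collections import Counter

MAJOR = {"Coca-Cola", "Pepsi", "Sprite", "Fanta", "Dr Pepper"}
NICHE = {"Jarritos", "Fentimans", "Jones Soda"}

def bucket(brand: str) -> str:
    """Classify a brand mention as major, niche, or other."""
    if brand in MAJOR:
        return "major"
    if brand in NICHE:
        return "niche"
    return "other"

# In the real experiment these would be parsed out of model responses.
mentions = ["Coca-Cola", "Pepsi", "Jarritos", "Coca-Cola", "RC Cola", "Sprite"]

shares = Counter(bucket(m) for m in mentions)
total = sum(shares.values())
for name, n in shares.most_common():
    print(f"{name}: {n}/{total} = {n / total:.1%}")
```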
There is also research suggesting "LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words [in training data which is sourced to a large extent from the Open Web] over the correct answer ... co-occurrence bias remains despite scaling up model sizes or finetuning": https://arxiv.org/abs/2310.08256 .
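One cheap way to see this effect for yourself is to compare a model's first-token probabilities for a correct answer versus a frequently co-occurring distractor. This sketch uses GPT-2 via the transformers package purely because it is small and public; it is not the paper's methodology, and comparing only first tokens is a rough approximation:

```python
# Rough probe for co-occurrence bias: "Toronto" co-occurs with "Canada"
# on the open web far more often than "Ottawa" does, so a biased model
# may rank it higher despite Ottawa being the correct capital.
# Assumes the transformers and torch packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of Canada is"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

for city in [" Ottawa", " Toronto"]:   # leading space matters for GPT-2's BPE
    first_id = tok.encode(city)[0]     # rough: compares first tokens only
    print(f"{city.strip()}: p = {probs[first_id].item():.4f}")
```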
Another study argues that (harmful) biases are an inevitable consequence of the design and mathematical formulation of current large language models (LLMs).
This is because:
- Because LLMs are trained to mimic statistical patterns in enormous amounts of human-generated text (from the web etc.), they capture the patterns present in that text, including stereotypes and unfair generalisations.
- LLMs don't understand language or ethics; they predict which words are likely to come next based on the data they have seen, and those patterns often reflect real social bias: see https://lnkd.in/eXkB4dAC . I would argue that, to a certain extent, you need to be (among) the best "in the real world" to be part of a response in an LLM for a comparative prompt such as "best smartphones" (see the toy sketch below).
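A toy sketch of that point: even a trivial bigram "model" trained on a tiny made-up corpus will reproduce whatever associations dominate its training text, with no understanding involved:

```python
# Toy bigram "language model": it predicts the next word purely from
# frequency in its training text, so whichever brand the corpus repeats
# most becomes the "answer". The corpus here is entirely made up.
from collections import Counter, defaultdict

corpus = (
    "the best smartphone is brand_a . "
    "the best smartphone is brand_a . "
    "the best smartphone is brand_b ."
).split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("is"))  # -> brand_a: the more frequent pattern wins
```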
The above information has implications for the use of LLM visibility tools: all visibility reporting should be understood in the light of the above studies and the degree of localisation / personalisation used in the responses from the LLM. This is a new field of research (and the tools themselves are evolving at the same time). I would argue that Generative Engine Optimisation (GEO) = LLM bias engineering.
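One practical consequence: before trusting any single visibility score, it is worth checking how much a model's answers shift with localisation. A minimal sketch, again assuming the openai package; the locales, prompt, and model name are illustrative assumptions:

```python
# Sketch: vary a locale hint and compare which brands each answer leads
# with. Assumes the openai package and OPENAI_API_KEY; the prompt,
# locales and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
PROMPT = "What are the best smartphones right now?"
LOCALES = ["the United States", "Germany", "India"]

for locale in LOCALES:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are answering for a user in {locale}."},
            {"role": "user", "content": PROMPT},
        ],
    )
    answer = (resp.choices[0].message.content or "").strip()
    print(f"{locale}: {answer[:120]}")  # first 120 chars of each answer
```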