Research from 2025 suggests that Google Gemini and ChatGPT may both consistently favour specific 'entities' (sometimes the same entities - see the table in the screenshot below) in their recommendations for "best" / research-type prompts and questions.
These entities sometimes have competitors that arguably hold stronger "real-world positions". In half the topic areas, ChatGPT's recommendations featured a 'preferred' entity in more than 80% of all responses, while Gemini displayed similar consistency across 7 topic areas. This suggests that AI assistants may not always provide a balanced range of options but may instead exhibit highly structured and persistent preferences for certain types of prompts. The research used "best" / research-type questions, e.g.
"What are some universities with excellent global reputation rankings?
What are the most budget-friendly universities without compromising quality?
Which universities have notable research parks or incubators?
What universities have excellent on-campus healthcare facilities?
What universities integrate sustainability into their curriculum?
What universities facilitate remote study resource access?
What universities have active and engaging student clubs?
What universities have exceptional honors programs for advanced learners?
Which universities partner with local communities for cultural initiatives?"
The study identified a pronounced "bias" toward U.S.-based brands, services, and institutions: almost 3/4 of Google Gemini's and over 3/5 of ChatGPT's "misaligned" recommendations favoured entities from the USA in cases where global competitors arguably hold stronger real-world positions.
On average, over 3/5 of Google Gemini's recommendations and 7/10 of ChatGPT's responses concentrated on a single entity per topic, demonstrating what might be considered systematic favoritism rather than "impartial" information retrieval.
- Mpofu, Katarina and Rienecker, Jasmine and Danielsson, Oscar and Thorsén, Fredrik, AI's Preferences for Brands, Services and Governments (March 21, 2025). Available at https://lnkd.in/eDViv_ZH.
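As a rough illustration (this is not the authors' method, and the data and names below are hypothetical), the kind of "concentration on a single entity" figure quoted above can be approximated by counting how often the most frequently recommended entity appears across repeated responses to the same topic:

```python
from collections import Counter

def concentration_share(recommended_entities: list[str]) -> tuple[str, float]:
    """Return the most frequently recommended entity and the share of
    responses in which it appears (a rough 'favoritism' measure)."""
    counts = Counter(recommended_entities)
    top_entity, top_count = counts.most_common(1)[0]
    return top_entity, top_count / len(recommended_entities)

# Hypothetical example: entities extracted from 10 responses to one topic.
responses = ["Entity A"] * 7 + ["Entity B"] * 2 + ["Entity C"]
entity, share = concentration_share(responses)
print(f"{entity} appears in {share:.0%} of responses")  # Entity A appears in 70% of responses
```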
This shows that LLMs can be deterministic (the same input or starting conditions produce the same outcomes or behaviours) in standard setups for certain types of prompts.
In contrast, a 2024 study covering 27 diverse tasks (mathematics, commonsense reasoning, etc.) reported that "...an LLM rarely produces the same response ten times given the same input [for certain types of prompts]; however, the parsed answer is often more stable ... none of the LLMs consistently delivers repeatable accuracy across all tasks, much less identical output strings."
This systematic analysis of LLM determinism used hyper-parameters that should maximise determinism, on tasks randomly selected from two common benchmarks (BBH and MMLU). It shows that (in contrast to the previous study, which was different in nature) "LLMs can be very non-deterministic [the same input or starting conditions can produce multiple possible outcomes or behaviours] in standard setups".
"Non-Determinism of "Deterministic" LLM Settings" by Berk Atil, Sarp Aykent, Alexa Chittams, Lisheng Fu, Rebecca J. Passonneau, Evan Radcliffe, Guru Rajan Rajagopal, Adam Sloan, Tomasz Tudrej, Ferhan Ture, Zhe Wu, Lixinyu Xu, Breck Baldwin - view at https://lnkd.in/eA_rFsbi .
The above information has implications for the use of LLM visibility tools - all visibility reporting should be understood in the light of the above studies and of the degree of localisation / personalisation applied to the LLM's responses. This is a new field of research (and the tools themselves are evolving at the same time).