Case Study
How AI Search Engines Decide Who to Cite (and Who to Ignore)
AI answers feel like magic, but the citation decision behind them is reasonably mechanical. This article walks through what we know, what we suspect, and what brands can act on.
The most common misunderstanding about AI search is that the citation decision is mysterious. It is less mysterious than it looks. The technology is complicated, but the decision the technology is making, at the moment a brand gets named or ignored, can be described in fairly ordinary language.
This article walks through that description. The aim is not to give away anyone's secret sauce. The aim is to leave a working marketer with a useful mental model, so the next conversation about AEO or GEO can begin from something closer to first principles.
What an answer actually is
Every answer produced by a system like ChatGPT, Claude, Gemini, Perplexity, Grok or DeepSeek is the result of three loosely separable steps. The labels vary across systems, and the details are proprietary, but the shape is consistent.
The first step is interpreting the question. The system decides what the user is really asking, what kind of response would satisfy them, and what context already exists in the conversation.
The second step is gathering material. In some systems this is a retrieval step: the engine performs a search, pulls back a set of candidate pages, and reads them. In other systems the material is already encoded in the model's parameters, drawn from training, and supplemented by retrieval only when uncertainty is high.
The third step is composing the answer. The model writes a response that fits the question, the tone, and the surrounding conversation, choosing what to mention, what to omit, what to attribute and what to leave silent.
Citation, where it happens, lives mostly in steps two and three. A brand gets cited because it survives the gathering step and is chosen during the composing step. Both are necessary. Either can fail.
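To make the shape concrete, here is a toy sketch in Python. The corpus, the helper names and the scoring are invented for illustration; no engine works over a dictionary of two pages, and none exposes its pipeline this way. The point is only that a brand can fail at step two, at step three, or at both.

# A minimal sketch of the three-step shape described above, over a toy
# in-memory corpus. Everything here is illustrative, not any vendor's internals.

CORPUS = {
    "https://brand.example/super-fund-comparison": (
        "Industry super funds are run to profit members; retail super funds "
        "are run by financial institutions. Here is how fees and returns differ."
    ),
    "https://publication.example/general-guide": (
        "A general guide to superannuation in Australia."
    ),
}

def interpret(question: str) -> set[str]:
    """Step one: reduce the question to the terms the engine will look for."""
    return {w.lower().strip("?.,") for w in question.split() if len(w) > 3}

def gather(terms: set[str], corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Step two: pull back candidates and score them by crude term overlap."""
    scored = []
    for url, text in corpus.items():
        overlap = len(terms & {w.lower().strip("?.,;") for w in text.split()})
        if overlap:
            scored.append((url, overlap / len(terms)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def compose(candidates: list[tuple[str, float]]) -> str:
    """Step three: write the answer and decide what, if anything, to attribute."""
    if not candidates:
        return "Answer drawn from parametric memory; no citation."
    best_url, _score = candidates[0]
    return f"Answer grounded in {best_url} (cited)."

question = "What is the difference between an industry super fund and a retail super fund?"
print(compose(gather(interpret(question), CORPUS)))

Run against the toy corpus, the brand's page is retrieved and cited. Remove it from the corpus and the answer falls back to parametric memory with no citation at all, which is the failure mode the next section is about.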
Why brands get into the candidate pool
The gathering step is where most AEO work pays off. Roughly speaking, a brand enters the candidate pool for a given question if any of the following is true.
A page on the brand's own site answers the question clearly and is findable through the engine's retrieval layer, which usually means a credible search index sits underneath it.
A trusted third-party source on the open web mentions the brand in connection with the question. Wikipedia, well-edited industry registers, reputable publications, review platforms, and certain niche communities tend to carry disproportionate weight.
The brand is encoded in the model's parameters with enough strength to surface without retrieval. This is rare for any but the largest or oldest brands, and it tends to be brittle: the model knows something about the brand, but not necessarily the current thing.
A retrieval-augmented system finds the brand through one of its connected data sources, which can include the live web, a curated index, structured knowledge graphs, or, in some products, the user's own connected data.
Brands that never make it into the candidate pool are usually failing one of these conditions. Sometimes the home site is poorly structured. Sometimes the brand is barely visible on the open web outside its own domain. Sometimes the model has an older or simply mistaken view of the category.
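A small audit sketch can make the routes above checkable. Everything below is invented for illustration, including the threshold and the stand-ins for parametric recall and connected data sources; in practice each route needs real evidence, not a boolean.

# A rough audit of the four routes into the candidate pool, with toy stand-ins
# for each signal. Route names and the overlap threshold are illustrative only.

def candidate_pool_routes(
    question_terms: set[str],
    own_page_text: str,
    third_party_mentions: list[str],   # trusted domains mentioning the brand in this context
    parametric_recall: bool,           # stand-in for "the model already knows the brand"
    connected_sources: list[str],      # curated indexes, knowledge graphs, connected data
) -> list[str]:
    routes = []
    page_terms = {w.lower().strip("?.,:") for w in own_page_text.split()}
    if len(question_terms & page_terms) >= 3:
        routes.append("own site answers the question and is retrievable")
    if third_party_mentions:
        routes.append("trusted third-party sources mention the brand")
    if parametric_recall:
        routes.append("encoded in model parameters (rare, often stale)")
    if connected_sources:
        routes.append("reachable through a connected data source")
    return routes

print(candidate_pool_routes(
    question_terms={"industry", "retail", "super", "fund", "difference"},
    own_page_text="Industry super fund vs retail super fund: the key differences.",
    third_party_mentions=["en.wikipedia.org"],
    parametric_recall=False,
    connected_sources=[],
))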
Why some candidates win and others lose
Surviving the gathering step is necessary but not sufficient. The composing step then chooses among the candidates. Several factors influence that choice, and they vary across engines and across question types.
Specificity of fit. The candidate whose content most directly answers the user's question, in the format the question seems to invite, tends to win. A page that explicitly answers "what is the difference between an industry super fund and a retail super fund" beats a page that contains the same information but buries it under a generic homepage hero.
Apparent authority. Where multiple candidates match the question, the system tends to prefer the one that looks most authoritative on the topic. Authority is signalled by the domain itself, by how often the source is cited elsewhere, by the explicitness of its expertise, and by various softer signals that researchers continue to disagree about.
Recency. Some questions are time-sensitive, and the systems are increasingly aware of when a candidate was published or last updated. A page from 2018 about a regulatory environment that has changed in 2024 will tend to lose to a fresher equivalent.
Diversity. Many systems deliberately diversify the sources they cite within a single answer. Two strong candidates that say roughly the same thing may both lose to a third candidate that brings a different angle, particularly in comparison and recommendation queries.
Safety and refusal heuristics. Some categories are subject to stricter sourcing rules, particularly anything that touches health, law, finance, or politics. In those categories the system tilts heavily toward sources it has been trained to consider safe, which usually means established institutions and large publications.
The combined effect is that an excellent piece of content on a poorly trusted domain often loses to a merely good piece of content on a heavily trusted one. This is one of the harder facts of AEO, and it is the same hard fact that has shaped SEO for two decades.
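A back-of-envelope sketch of that trade-off, with weights and a diversity rule invented purely for illustration; real systems learn these preferences rather than hard-coding them.

# A sketch of how the composing step might weigh candidates. The weights,
# fields and diversity rule are illustrative, not anyone's actual formula.

from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    specificity: float   # 0..1, how directly the content answers the question
    authority: float     # 0..1, domain trust, citations elsewhere, explicit expertise
    recency: float       # 0..1, freshness relative to how time-sensitive the question is
    angle: str           # rough label for the perspective the source brings

def score(c: Candidate, sensitive_category: bool) -> float:
    # Sensitive categories (health, law, finance, politics) tilt heavily toward authority.
    w_authority = 0.6 if sensitive_category else 0.35
    return 0.4 * c.specificity + w_authority * c.authority + 0.2 * c.recency

def select(candidates: list[Candidate], k: int, sensitive: bool) -> list[Candidate]:
    chosen: list[Candidate] = []
    for cand in sorted(candidates, key=lambda x: score(x, sensitive), reverse=True):
        # Crude diversity rule: skip a candidate whose angle is already covered.
        if any(cand.angle == picked.angle for picked in chosen):
            continue
        chosen.append(cand)
        if len(chosen) == k:
            break
    return chosen

pool = [
    Candidate("https://big-publisher.example/overview", 0.6, 0.9, 0.7, "overview"),
    Candidate("https://brand.example/deep-dive", 0.95, 0.4, 0.9, "overview"),
    Candidate("https://forum.example/thread", 0.7, 0.3, 0.8, "practitioner view"),
]
for c in select(pool, k=2, sensitive=True):
    print(c.url)

In this toy pool the brand's deep-dive beats the big publisher on specificity and recency yet loses its slot: the publisher takes their shared angle on authority, and the forum thread survives on diversity. That is the hard fact above, reduced to arithmetic.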
Where the engines differ
Although the architecture is broadly similar, the citation behaviour varies in ways that matter.
ChatGPT tends to prefer a small number of strong sources and to integrate them into prose rather than list them. When sources are shown, they often skew toward well-known publications and reference sites.
Claude tends to cite less often than the others when not asked to, but cites with reasonable accuracy when it does. It is more cautious about overclaiming, which means it sometimes omits a citation it could have made.
Gemini is closely tied to Google's broader search infrastructure. Its citations tend to track the kinds of sources Google itself surfaces in classical results, although the ordering and emphasis are different.
Perplexity makes citations central. Its answers are typically built as a synthesis of named sources, and the citation count per answer tends to be higher than in the other engines. This makes Perplexity particularly diagnostic for AEO work: a brand that cannot get cited in Perplexity is rarely cited elsewhere.
Grok shows a stronger pull toward live social signals than the others, which can make it volatile but also revealing. It cites things the other engines miss, particularly around emerging conversations.
DeepSeek is the newest of the group for many Western audiences, and its citation behaviour is still being mapped. Early observation suggests a preference for sources that are richly structured and easy to parse, with less weight on classical authority signals.
These are tendencies, not laws. They shift across model versions and across question categories. The honest answer to "which engine should we prioritise" is: measure first, decide second.
What this means for the work
Three useful conclusions fall out of the picture above.
The first is that AEO is part technical and part editorial. The technical part is making sure the brand is reachable, parseable and structured cleanly. The editorial part is making sure the content actually answers questions a real customer would ask, and is fresh enough to be preferred over an older equivalent.
The second is that off-domain signals matter more than they did in classical SEO. The pages and registers and review platforms that describe the brand do real work. Brands that have ignored their off-domain footprint pay for that neglect in citation share.
The third is that measurement is the only honest ground for any of these decisions. The engines change. The behaviour changes. The categories where AI usage is rising change. Without measurement, AEO work is opinion. With it, AEO becomes a practice.
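What measurement looks like at its most basic: a fixed question set, run regularly across the engines, with the cited brands recorded each time. The records below are invented, and how they are collected (manual runs, panels, tooling) is a separate decision; the discipline is in keeping the question set stable enough that movement in the numbers means something.

# A minimal citation-share tally over toy observations. The observations are
# invented; only the bookkeeping is shown here.

from collections import Counter, defaultdict

# (engine, question, brands cited in the answer) -- toy records
observations = [
    ("perplexity", "best industry super fund", ["BrandA", "BrandB"]),
    ("chatgpt",    "best industry super fund", ["BrandB"]),
    ("gemini",     "best industry super fund", []),
    ("perplexity", "industry vs retail super", ["BrandA"]),
]

citations: dict[str, Counter] = defaultdict(Counter)
answers_per_engine: Counter = Counter()

for engine, _question, brands in observations:
    answers_per_engine[engine] += 1
    for brand in set(brands):
        citations[engine][brand] += 1

for engine, counts in citations.items():
    for brand, hits in counts.most_common():
        print(f"{engine}: {brand} cited in {hits}/{answers_per_engine[engine]} answers")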
Common questions
Can I make an AI engine cite my brand by adding a specific tag to my pages? No. There is no tag-level instruction that forces a citation. Structured data and clear authoring help, but they do not coerce.
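As a point of reference, this is roughly what "structured data helps" looks like in practice: a schema.org FAQPage block, shown here generated from Python with invented content. It makes a question-and-answer pair easy to parse; it does not instruct any engine to cite the page.

# A minimal schema.org FAQPage block, emitted as JSON-LD. Helpful for parsing,
# not a citation instruction.
import json

faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is the difference between an industry super fund and a retail super fund?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "Industry funds are run to benefit members; retail funds are run by financial institutions.",
        },
    }],
}

print('<script type="application/ld+json">')
print(json.dumps(faq, indent=2))
print("</script>")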
Do citation patterns change between desktop and mobile? Yes, in some products, although the differences are small. The bigger differences are between engines and between question categories.
How often do citation patterns change? Detectably, every few weeks. Materially, every few months. Underlying model upgrades and silent ranking changes can move citation share quickly, particularly for borderline brands.
Does the model know if it is wrong about my brand? Sometimes. Models can carry stale facts with high confidence. The most reliable corrective is improving the off-domain footprint, because the next training run or retrieval pass will see the updated record.

