Measurement

How to Track AI Citations

You cannot improve what you cannot see. This is the practical guide to measuring how often ChatGPT, Claude, Perplexity and Gemini are naming your brand, what it means, and what to do with the data.

Last updated: May 2026

The measurement problem

Traditional SEO tooling is built around a search engine results page that is the same for every user on a given query. Two analysts in two countries running the same Ahrefs report on a given keyword see roughly the same data, because the underlying source of truth, the SERP, is broadly stable. AI assistants are not like that.

When a user asks Claude or ChatGPT a question, the answer is generated on the fly, weighted by context, recency, prior conversation and a degree of non-determinism. Two identical prompts thirty seconds apart can return different recommendations. That makes measurement harder than it was in the SEO era, but not impossible. You measure by sampling at a steady cadence against a defined set of prompts, not by checking once and treating the answer as a permanent ranking.
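Because each answer is one draw from a non-deterministic process, a citation rate is better read as an estimate with a margin around it than as a fixed number. The sketch below is one way to express that, assuming you have counted how many of your sampled runs named the brand. The function name `citation_rate` and the choice of a Wilson score interval are illustrative assumptions, not something any assistant or tool reports.

```python
import math


def citation_rate(hits: int, runs: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, low, high) for `hits` citations out of `runs` samples.

    The bounds are a Wilson score interval; z=1.96 is roughly a 95% interval.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical figures: the brand was named in 6 of 20 sampled answers.
rate, low, high = citation_rate(hits=6, runs=20)
print(f"cited in {rate:.0%} of runs (interval {low:.0%} to {high:.0%})")
```

With only a few dozen samples the interval stays wide, which is one reason a single check tells you so little.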

Our guide to measuring GEO ROI covers the broader commercial framing. This page is about the practical mechanics underneath it.

What you are measuring

An AI citation is any moment the model names your brand in its answer. There are four shapes of citation worth distinguishing, because each has a different value:

Direct recommendation. The model lists your business inside a shortlist when the user asks for one. This is the highest-value citation because the user is in a buying mindset and the model is doing the qualifying work for you.

Worked example. The model uses your business as an illustration when explaining a category or concept, for example "Allbirds is a good example of a direct-to-consumer brand that built early authority through sustainability content". These citations build category-leader perception even when the user is not actively buying.

Quoted source. The model uses a statistic, definition or framing that comes from your published content. Perplexity makes this visible with citation footnotes; ChatGPT and Claude rarely do, but the pattern still shows up in answers.

Negative or neutral mention. The model names you in passing or as a point of comparison, for example "if you are not happy with X, alternatives include Y and Z". These can be useful or harmful depending on whether you are X or Y.
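If you log citations in a script or a sheet, it helps to record which of the four shapes each one is, so that they can be valued separately later. A minimal sketch of such a record follows. The class and field names are hypothetical, and the sample rows are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class CitationType(Enum):
    DIRECT_RECOMMENDATION = "direct_recommendation"
    WORKED_EXAMPLE = "worked_example"
    QUOTED_SOURCE = "quoted_source"
    NEGATIVE_OR_NEUTRAL = "negative_or_neutral"


@dataclass(frozen=True)
class Citation:
    prompt: str        # the panel prompt that produced the answer
    assistant: str     # e.g. "ChatGPT", "Claude", "Perplexity"
    brand: str         # the brand that was named
    kind: CitationType


def count_by_type(citations: list[Citation]) -> Counter:
    """Count how many logged citations fall into each of the four shapes."""
    return Counter(c.kind for c in citations)


# Made-up log entries for a hypothetical brand called "Acme".
log = [
    Citation("best CRM for consultancies", "ChatGPT", "Acme", CitationType.DIRECT_RECOMMENDATION),
    Citation("what is a CRM", "Claude", "Acme", CitationType.WORKED_EXAMPLE),
    Citation("best CRM for consultancies", "Perplexity", "Acme", CitationType.DIRECT_RECOMMENDATION),
]
print(count_by_type(log))
```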

Building a prompt panel

A prompt panel is a list of 20 to 50 questions, chosen to mirror what real customers ask AI assistants when they are looking for what you sell. It is the foundation of every credible GEO measurement programme. Most teams underbuild this and then complain that their tracking is noisy. Spend time getting it right.

A good panel mixes four kinds of prompt. Buying-intent prompts use phrases like "best", "top", "compare", "alternatives to". Use-case prompts describe the situation a real buyer is in, for example "we need a CRM for a five-person consultancy that integrates with Xero". Comparison prompts pit two named competitors against each other and watch whether the model brings you into the conversation. Awareness prompts ask broader category questions where being mentioned at all signals authority, even without an explicit recommendation.

Once the panel is built it should change slowly. Adding or rewording prompts every week breaks the trend line. Aim to refresh it once a quarter.
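A panel kept as structured data is easier to check than one kept in someone's head. The sketch below assumes each prompt is tagged with one of the four kinds described above and flags a panel that falls outside the 20 to 50 range or is missing a kind. The names `PanelPrompt`, `KINDS` and `check_panel` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

KINDS = ("buying_intent", "use_case", "comparison", "awareness")


@dataclass(frozen=True)
class PanelPrompt:
    text: str
    kind: str  # expected to be one of KINDS


def check_panel(prompts: list[PanelPrompt], min_size: int = 20, max_size: int = 50) -> list[str]:
    """Return a list of problems; an empty list means the panel passes these checks."""
    problems = []
    if not min_size <= len(prompts) <= max_size:
        problems.append(f"panel has {len(prompts)} prompts, expected {min_size} to {max_size}")
    counts = Counter(p.kind for p in prompts)
    for kind in KINDS:
        if counts[kind] == 0:
            problems.append(f"no {kind} prompts")
    for kind in sorted(set(counts) - set(KINDS)):
        problems.append(f"unknown kind: {kind}")
    return problems


# A deliberately tiny, made-up panel: it covers three kinds but is too small.
draft = [
    PanelPrompt("best CRM for small consultancies", "buying_intent"),
    PanelPrompt("we need a CRM for a five-person consultancy that integrates with Xero", "use_case"),
    PanelPrompt("what does a CRM actually do", "awareness"),
]
for problem in check_panel(draft):
    print(problem)
```

Running the same check before each quarterly refresh is a cheap way to confirm the mix has not drifted.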

Use the same prompts every week, on the same day if possible

Model output changes over time as training data updates and weights shift. The only way to detect that drift is to hold your prompts constant. A weekly cadence on a Monday morning, with the same panel each week, gives you a clean trend line over a quarter and is sensitive enough to spot a sudden drop.
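One way to make "sensitive enough to spot a sudden drop" concrete is to compare the latest week with the average of the weeks before it. The sketch below does that. The four-week window and the 50% threshold are arbitrary assumptions to tune against your own data, and `sudden_drop` is a hypothetical helper name.

```python
from statistics import mean


def sudden_drop(weekly_rates: list[float], window: int = 4, threshold: float = 0.5) -> bool:
    """True if the latest weekly citation rate is below `threshold` times
    the mean of the previous `window` weeks."""
    if len(weekly_rates) < window + 1:
        return False  # not enough history to judge
    *history, latest = weekly_rates
    baseline = mean(history[-window:])
    return baseline > 0 and latest < threshold * baseline


# Made-up weekly citation rates, oldest first.
steady = [0.40, 0.42, 0.38, 0.40, 0.35]
dropped = [0.40, 0.42, 0.38, 0.40, 0.15]
print(sudden_drop(steady), sudden_drop(dropped))
```

The comparison only means something if the panel was identical in every week it covers.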

Score citations in three bands, not as a yes or no

Three bands give you a far richer signal than a binary present-or-absent score. The bands that work in practice are: Position 1 (named first or prominently), Position 2 or 3 (named in a shortlist but not first), and Mentioned (referenced in passing). Over time the share of your citations that sit in Position 1 is one of the best leading indicators of revenue from AI channels.
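For scripted tracking, the bands need a rule a program can apply. The sketch below uses a crude proxy: it ranks your brand by the order in which tracked brand names first appear in the answer text. That is an assumption for illustration only. It cannot judge "prominently" or "in passing" the way a human reader can, so spot-check its output by hand. The function name and band labels are hypothetical.

```python
def score_band(answer: str, brand: str, tracked: list[str]) -> str:
    """Band `brand` by the order in which tracked names first appear in `answer`."""
    text = answer.lower()
    # Offset of the first appearance of each tracked name found in the answer.
    first_seen = {}
    for name in {*tracked, brand}:
        offset = text.find(name.lower())
        if offset >= 0:
            first_seen[name] = offset
    if brand not in first_seen:
        return "absent"
    # Rank 1 means the brand is the first tracked name to appear.
    rank = sorted(first_seen.values()).index(first_seen[brand]) + 1
    if rank == 1:
        return "position_1"
    if rank <= 3:
        return "position_2_3"
    return "mentioned"


# Made-up answer text and brand names.
answer = "For small teams, Acme is the usual pick, followed by Bolt and Cobalt. Delta also exists."
brands = ["Acme", "Bolt", "Cobalt", "Delta"]
for b in brands:
    print(b, score_band(answer, b, brands))
```

Plain substring matching will miss misspellings and catch brand names that are also ordinary words, so treat the result as a first pass.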

Track competitors at the same time

Your absolute citation rate matters less than your share against the two or three competitors you actually lose deals to. Run the same prompt panel and record which competitors are named in each answer. Over a quarter, watching your share grow while a specific competitor's share shrinks tells you that your GEO work is gaining real ground.
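Share against competitors is simple arithmetic once each answer has been reduced to the set of brands it named. A minimal sketch, assuming that reduction has already been done and using made-up brand names:

```python
from collections import Counter


def citation_share(runs: list[set[str]], brands: list[str]) -> dict[str, float]:
    """Each brand's share of all citations among `brands`.

    `runs` holds one set of named brands per sampled answer.
    """
    counts = Counter()
    for named in runs:
        for brand in brands:
            if brand in named:
                counts[brand] += 1
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


# Four sampled answers; "Cobalt" is named once but is not one of the tracked rivals.
runs = [{"Acme", "Bolt"}, {"Bolt"}, {"Acme"}, {"Bolt", "Cobalt"}]
print(citation_share(runs, ["Acme", "Bolt"]))
```

Here Acme holds two of the five tracked citations and Bolt three. Comparing that split quarter over quarter is what shows whether you are gaining on a specific competitor.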

Tools and workflows

There are three layers of tooling worth understanding. The first is dedicated GEO platforms that query AI models on a schedule and store the results, which is what the AI Visibility Audit on Agent Console HQ does. The second is general-purpose scripting where a team uses the official OpenAI, Anthropic and Perplexity APIs to run their own prompt panel weekly and log the results into a database or sheet. The third is manual sampling for teams just starting out, where one person spends thirty minutes a week running the panel by hand and recording the outcomes.

Manual sampling is fine to begin with, and it builds intuition. Most teams move to scripted or platform tracking once they realise they want the longer trend line and the time back. The choice between scripting and a platform usually comes down to whether the team has engineering capacity and whether the underlying data needs to feed other systems.
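For teams taking the scripting route, the core loop is small: ask each assistant each prompt, note which brands appear, and append a row to a log. The sketch below shows only that loop. It deliberately does not call any vendor API: each assistant is passed in as a plain function from prompt to answer text, and wiring that function to a real SDK is left to you. The names `run_panel` and `fake_assistant`, and the CSV column layout, are assumptions for illustration.

```python
import csv
import io
from datetime import date
from typing import Callable, Optional, TextIO


def run_panel(
    prompts: list[str],
    assistants: dict[str, Callable[[str], str]],
    brands: list[str],
    out: TextIO,
    run_date: Optional[date] = None,
) -> int:
    """Run every prompt against every assistant and write one CSV row per answer.

    Writes a header row first, so use a fresh file per run or drop that line
    when appending. Returns the number of data rows written.
    """
    day = (run_date or date.today()).isoformat()
    writer = csv.writer(out)
    writer.writerow(["date", "assistant", "prompt", "brands_named"])
    rows = 0
    for name, ask in assistants.items():
        for prompt in prompts:
            answer = ask(prompt)
            # Simple substring match; see the caveats on brand matching.
            named = [b for b in brands if b.lower() in answer.lower()]
            writer.writerow([day, name, prompt, ";".join(named)])
            rows += 1
    return rows


def fake_assistant(prompt: str) -> str:
    """Stand-in for a real API call, returning canned text."""
    return "Acme and Bolt are common choices."


buffer = io.StringIO()
run_panel(["best CRM for consultancies"], {"stub": fake_assistant}, ["Acme", "Cobalt"], buffer)
print(buffer.getvalue())
```

Keeping the assistant behind a plain function also makes it easy to replay stored answers when you change the scoring rules.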

What the data tells you

The first month of data tells you almost nothing. The second month tells you which prompts you should be cited on but are not, which is a content brief in itself. The third month starts to show whether actions you have taken (publishing a comparison page, fixing schema markup, building authority on a review platform) are moving citation share. By the sixth month you have a defensible measurement model that you can put in front of a board.

The single most useful question to ask of the data each month is: of the prompts where competitors are cited and we are not, which are we closest to winning? That short list, two or three prompts at a time, becomes the brief for the content and authority work that will lift citation share over the following quarter.
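"Closest to winning" needs a working definition before a script can rank prompts by it. The sketch below assumes one: the prompts where the best competitor's citation rate exceeds yours by the smallest margin. That definition is an assumption, not a standard metric, and you may prefer to weight by band or by commercial value. The function name `closest_gaps` is hypothetical.

```python
def closest_gaps(rates: dict[str, tuple[float, float]], top_n: int = 3) -> list[str]:
    """Return up to `top_n` prompts where a competitor leads by the smallest margin.

    `rates` maps each prompt to (our_rate, best_competitor_rate).
    """
    gaps = [
        (competitor - ours, prompt)
        for prompt, (ours, competitor) in rates.items()
        if competitor > ours  # only prompts we are currently losing
    ]
    return [prompt for _, prompt in sorted(gaps)[:top_n]]


# Made-up monthly rates per prompt: (ours, best competitor).
monthly = {
    "best CRM for consultancies": (0.1, 0.9),
    "CRM that integrates with Xero": (0.3, 0.4),
    "what is a CRM": (0.5, 0.2),
    "alternatives to Bolt": (0.0, 0.3),
}
print(closest_gaps(monthly, top_n=2))
```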

The teams that win at AI citation tracking are not the ones with the fanciest tooling. They are the ones who build a tight prompt panel, run it the same way every week, and act on two or three findings each month. Boring discipline beats clever tooling every quarter.

Where to start

Pick fifteen prompts that match what your real customers ask AI assistants before they buy. This is a starter set, smaller than the full 20 to 50 prompt panel described above. Run them through ChatGPT, Claude and Perplexity manually, this week. Score whether your business is named, where in the answer it appears, and which competitors are named. That single hour of work tells you more about your AI visibility than any agency deck will. From there you can decide whether to scale up to a scripted weekly run or use a tool. If you want the underlying mechanics, how GEO works and how AI recommends businesses are the most useful next reads.

Want a 15-prompt baseline in 30 seconds?

The free AI visibility check runs a starter panel for you and shows where you stand against your category.
