What is an AI citation and why does it matter?

An AI citation is any moment when ChatGPT, Claude, Perplexity, Gemini or Grok names your business in its answer to a user. It might be a recommendation in a shortlist, a worked example, a quoted statistic, or a link in a Perplexity-style citation footnote. It matters because for a growing share of consumer and B2B buyers, the AI answer is the only research they do. Being named in those answers now plays the role that search rankings used to play.

Can you track AI citations with Google Analytics?

Partially. Much AI-driven traffic arrives with no referrer and no UTM tags, so Google Analytics 4 files it under 'Direct'. You can infer it from growth in branded direct traffic and from branded search lift over time. When the user clicks a link inside the AI surface, the visit does show up in referrer reports for chatgpt.com, perplexity.ai, claude.ai and gemini.google.com. Dedicated GEO tooling fills the gap by querying AI models directly and recording when your brand is named.
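If you export raw referrer URLs from your analytics, a small helper can flag the visits that came from an AI surface. The sketch below is illustrative only: the host list is taken from the answer above, and the function name `is_ai_referrer` is an assumption, not part of any analytics product.

```python
from urllib.parse import urlparse

# Referrer hosts named in the text above; extend the set as your reports show new ones.
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "perplexity.ai",
    "claude.ai",
    "gemini.google.com",
}


def is_ai_referrer(referrer_url: str) -> bool:
    """Return True when the referrer host is, or is a subdomain of, a listed AI host."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == known or host.endswith("." + known) for known in AI_REFERRER_HOSTS)
```

Matching on the parsed hostname rather than on a substring avoids false positives from unrelated domains that merely contain one of these names.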

How often should you run AI citation checks?

Weekly is the sweet spot for active monitoring of a defined prompt set. AI model output is non-deterministic, so a single check on a single day will not give a reliable picture. Run the same prompts each week, score whether your brand appears, log the result, and look at the trend over a rolling four-week window. Monthly is the minimum cadence to detect drift before it costs you revenue.
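The rolling four-week window can be computed with a few lines of code. This is a minimal sketch, assuming you log two numbers per week (prompts run, prompts in which your brand was named); the function name and the figures in the usage note are made up for illustration.

```python
from collections import deque


def rolling_citation_rate(weekly_results, window=4):
    """Return one citation rate per week, pooled over the trailing `window` weeks.

    weekly_results: list of (prompts_run, prompts_citing_brand), oldest week first.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` weeks
    rates = []
    for prompts_run, prompts_cited in weekly_results:
        recent.append((prompts_run, prompts_cited))
        total_run = sum(run for run, _ in recent)
        total_cited = sum(cited for _, cited in recent)
        rates.append(total_cited / total_run if total_run else 0.0)
    return rates
```

For example, five weeks of a 20-prompt panel with 4, 6, 5, 5 and 10 citations give a final rolling rate of 26 out of 80, or 0.325. Pooling the counts smooths out single-week noise while still reacting to a sustained change.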

Which AI platforms should you track?

At minimum ChatGPT, Claude and Perplexity. ChatGPT dominates consumer use and is the default assistant in many workflows. Claude has strong adoption in B2B and professional services. Perplexity is the closest to a search-engine replacement because it always cites its sources. Gemini matters for anyone whose audience lives inside Google Workspace, and Grok is becoming relevant for X-heavy audiences. Five platforms is the practical maximum for most businesses; tracking more dilutes attention without adding signal.

How to Track AI Citations | Monitor ChatGPT and Claude Mentions of Your Brand

The measurement problem

Traditional SEO tooling is built around a search engine results page that is the same for every user on a given query. Two analysts in two countries running the same Ahrefs report on a given keyword see roughly the same data, because the underlying source of truth, the SERP, is broadly stable. AI assistants are not like that.

When a user asks Claude or ChatGPT a question, the answer is generated on the fly, weighted by context, recency, prior conversation and a degree of non-determinism. Two identical prompts thirty seconds apart can return different recommendations. That makes measurement harder than it was in the SEO era, but not impossible. You measure by sampling at a steady cadence against a defined set of prompts, not by checking once and treating the answer as a permanent ranking.

Our guide to measuring GEO ROI covers the broader commercial framing. This page is about the practical mechanics underneath it.

What you are measuring

An AI citation is any moment the model names your brand in its answer. There are four shapes of citation worth distinguishing, because each has a different value:

Direct recommendation. The model lists your business inside a shortlist when the user asks for one. This is the highest-value citation because the user is in a buying mindset and the model is doing the qualifying work for you.

Worked example. The model uses your business as an illustration when explaining a category or concept, for example "Allbirds is a good example of a direct-to-consumer brand that built early authority through sustainability content". These citations build category-leader perception even when the user is not actively buying.

Quoted source. The model uses a statistic, definition or framing that comes from your published content. Perplexity makes this visible with citation footnotes; ChatGPT and Claude rarely do, but the pattern still shows up in answers.

Negative or neutral mention. The model names you in passing or as a point of contrast, for example "if you are not happy with X, alternatives include Y and Z". These can be useful or harmful depending on whether you are X or Y.

Building a prompt panel

A prompt panel is a list of 20 to 50 questions, chosen to mirror what real customers ask AI assistants when they are looking for what you sell. It is the foundation of every credible GEO measurement programme. Most teams underbuild this and then complain that their tracking is noisy. Spend time getting it right.

A good panel mixes four kinds of prompt. Buying-intent prompts use phrases like "best", "top", "compare", "alternatives to". Use-case prompts describe the situation a real buyer is in, for example "we need a CRM for a five-person consultancy that integrates with Xero". Comparison prompts pit two named competitors against each other and watch whether the model brings you into the conversation. Awareness prompts ask broader category questions where being mentioned at all signals authority, even without an explicit recommendation.
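One way to keep the panel honest is to store each prompt with its kind and check that all four kinds are represented. The sketch below is an assumption about how you might structure it; the class names, the prompt IDs and the placeholder vendor names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PromptKind(Enum):
    BUYING_INTENT = "buying_intent"
    USE_CASE = "use_case"
    COMPARISON = "comparison"
    AWARENESS = "awareness"


@dataclass(frozen=True)
class PanelPrompt:
    prompt_id: str  # stable ID so the trend line survives small wording fixes
    kind: PromptKind
    text: str


# Illustrative entries only; "Vendor A" and "Vendor B" are placeholders.
PANEL = [
    PanelPrompt("p01", PromptKind.BUYING_INTENT, "What are the best CRMs for small consultancies?"),
    PanelPrompt("p02", PromptKind.USE_CASE, "We need a CRM for a five-person consultancy that integrates with Xero."),
    PanelPrompt("p03", PromptKind.COMPARISON, "How does Vendor A compare with Vendor B for a small consultancy?"),
    PanelPrompt("p04", PromptKind.AWARENESS, "How do small consultancies usually manage client relationships?"),
]


def missing_kinds(panel):
    """Return the prompt kinds that have no prompt in the panel yet."""
    present = {prompt.kind for prompt in panel}
    return [kind for kind in PromptKind if kind not in present]
```

Running `missing_kinds(PANEL)` before each quarterly refresh is a quick check that the mix has not drifted toward one kind of prompt.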

Once the panel is built it should change slowly. Adding or rewording prompts every week breaks the trend line. Aim to refresh it once a quarter.

Use the same prompts every week, on the same day of the week if possible

Model output changes over time as training data updates and weights shift. The only way to detect that drift is to hold your prompts constant. A weekly cadence on a Monday morning, with the same panel each week, gives you a clean trend line over a quarter and is sensitive enough to spot a sudden drop.

Score citations in three bands, not as a yes or no

Three bands give you a far richer signal than a binary present-or-absent score. The bands that work in practice are: Position 1 (named first or prominently), Position 2 or 3 (named in a shortlist but not lead), and Mentioned (referenced in passing). Over time the share of your citations that sit in Position 1 is one of the best leading indicators of revenue from AI channels.
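The three bands can be applied mechanically once someone (or an earlier parsing step) has listed the brands in the order the answer recommends them. This is a sketch under stated assumptions: the function names and band labels are hypothetical, and treating shortlist positions beyond third as "mentioned" is a choice made here, not something the bands above specify.

```python
def score_citation(brand, named_in_order, passing_mentions=()):
    """Return the band for `brand` in a single answer.

    named_in_order: brands in the order the answer recommends them.
    passing_mentions: brands referenced in passing, outside the shortlist.
    """
    target = brand.lower()
    shortlist = [name.lower() for name in named_in_order]
    if target in shortlist:
        position = shortlist.index(target) + 1
        if position == 1:
            return "position_1"
        if position <= 3:
            return "position_2_3"
        return "mentioned"  # assumption: positions beyond third count as a passing mention
    if target in {name.lower() for name in passing_mentions}:
        return "mentioned"
    return "absent"


def position_1_share(bands):
    """Share of citations (excluding 'absent') that landed in Position 1."""
    cited = [band for band in bands if band != "absent"]
    if not cited:
        return 0.0
    return sum(band == "position_1" for band in cited) / len(cited)
```

Logging the band per prompt per week lets you chart `position_1_share` alongside the plain citation rate.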

Track competitors at the same time

Your absolute citation rate matters less than your share against the two or three competitors you actually lose deals to. Run the same prompt panel and record which competitors are named in each answer. Over a quarter, watching your share grow while a specific competitor's share shrinks tells you that your GEO work is gaining real ground.
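Citation share against a fixed competitor set is simple arithmetic once each answer has been reduced to the set of brands it names. The following is a minimal sketch; the function name and the brand names in the usage note are placeholders.

```python
from collections import Counter


def citation_share(answers, brands):
    """Return each tracked brand's share of all tracked-brand citations.

    answers: iterable of sets, each holding the brand names one answer mentioned.
    brands: the brands you track (yours plus the competitors you lose deals to).
    """
    counts = Counter()
    for named in answers:
        for brand in brands:
            if brand in named:
                counts[brand] += 1
    total = sum(counts.values())
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}
```

With four answers naming {Us, Rival A}, {Rival A}, {Us, Rival B} and {Rival A, Rival B}, the shares come out at 2/7 for Us, 3/7 for Rival A and 2/7 for Rival B. Comparing these dictionaries quarter over quarter shows whose share is moving.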

Tools and workflows

There are three layers of tooling worth understanding. The first is dedicated GEO platforms that query AI models on a schedule and store the results, which is what the AI Visibility Audit on Agent Console HQ does. The second is general-purpose scripting where a team uses the official OpenAI, Anthropic and Perplexity APIs to run their own prompt panel weekly and log the results into a database or sheet. The third is manual sampling for teams just starting out, where one person spends thirty minutes a week running the panel by hand and recording the outcomes.
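For the scripted layer, the core loop is short: run every prompt on every platform, check which brands appear, and write one row per result. The sketch below deliberately leaves the API calls out. Each platform is passed in as a plain function that takes a prompt and returns answer text, so you can wrap whichever official client you use. All names here (`run_panel`, the column names, the stub) are assumptions, and the substring match is naive.

```python
import csv
import io
from datetime import date


def run_panel(prompts, platforms, brands, out, run_date=None):
    """Run the panel and write one CSV row per (platform, prompt, brand).

    prompts: list of (prompt_id, prompt_text) pairs.
    platforms: dict mapping platform name to a function(prompt_text) -> answer text.
    brands: brand names to look for in each answer.
    out: writable text file object.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["run_date", "platform", "prompt_id", "brand", "named"])
    stamp = (run_date or date.today()).isoformat()
    rows = 0
    for platform, ask in platforms.items():
        for prompt_id, text in prompts:
            answer = ask(text).lower()
            for brand in brands:
                # Naive case-insensitive substring match; real brand names may need stricter rules.
                writer.writerow([stamp, platform, prompt_id, brand, int(brand.lower() in answer)])
                rows += 1
    return rows


# Usage with a stub standing in for a real API call.
buffer = io.StringIO()
row_count = run_panel(
    prompts=[("p01", "What is the best CRM for a small consultancy?")],
    platforms={"stub": lambda text: "Popular options include Acme and Beta."},
    brands=["Acme", "Gamma"],
    out=buffer,
    run_date=date(2025, 1, 6),
)
```

Because the output is a flat CSV, the same file can feed a spreadsheet early on and a database later without changing the loop.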

Manual sampling is fine to begin with, and it builds intuition. Most teams move to scripted or platform tracking once they realise they want the longer trend line and the time back. The choice between scripting and a platform usually comes down to whether the team has engineering capacity and whether the underlying data needs to feed other systems.

What the data tells you

The first month of data tells you almost nothing. The second month tells you which prompts you should be cited on but are not, which is a content brief in itself. The third month starts to show whether actions you have taken (publishing a comparison page, fixing schema markup, building authority on a review platform) are moving citation share. By the sixth month you have a defensible measurement model that you can put in front of a board.

The single most useful question to ask of the data each month is: of the prompts where competitors are cited and we are not, which are we closest to winning? That short list, two or three prompts at a time, becomes the brief for the content and authority work that will lift citation share over the following quarter.
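That monthly question can be turned into a small query over the logged results. The text does not define "closest to winning", so the ranking here is an assumption: among prompts where rivals are named more often than you, those where you already appear in some runs are ranked first. The function name and data shape are hypothetical.

```python
def gap_prompts(results, us, top_n=3):
    """Return up to `top_n` prompt IDs where rivals are named more often than `us`.

    results: dict mapping prompt_id to a list of sets, one set of named brands per run.
    Ordering (an assumption): highest own citation rate first, then prompt ID.
    """
    candidates = []
    for prompt_id, runs in results.items():
        if not runs:
            continue
        our_rate = sum(us in named for named in runs) / len(runs)
        rival_rate = sum(bool(named - {us}) for named in runs) / len(runs)
        if rival_rate > our_rate:
            candidates.append((our_rate, prompt_id))
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [prompt_id for _, prompt_id in candidates[:top_n]]
```

A prompt where you are named in half the runs is usually a cheaper win than one where you never appear, which is the reasoning behind this ordering.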

The teams that win at AI citation tracking are not the ones with the fanciest tooling. They are the ones who build a tight prompt panel, run it the same way every week, and act on two or three findings each month. Boring discipline beats clever tooling every quarter.

Where to start

Pick fifteen prompts that match what your real customers ask AI before they buy; this is a starter set that you can grow into the full panel later. Run them through ChatGPT, Claude and Perplexity manually, this week. Score whether your business is named, where in the answer it appears, and which competitors are named. That single hour of work tells you more about your AI visibility than any agency deck will. From there you can decide whether to scale up to a scripted weekly run or use a tool. If you want the underlying mechanics, our guides to how GEO works and how AI recommends businesses are the most useful next reads.
