- ChatGPT does not cite every page it retrieves. It uses a short list of clues (title, snippet, URL, source type) to decide what to open and what to credit.
- If you want more citations from ChatGPT, you need to rank in search, match its hidden sub-queries, and write clear, natural page titles and slugs.
- Reddit, YouTube, and news are pulled heavily, but they do not all get credit at the same rate. Some sources act more like training wheels than final references.
- Fresh content helps, but relevance and authority still win. Older, well-aligned pages often beat new ones inside the same result set.
If you just want the quick answer: ChatGPT tends to cite pages that rank well in search, match the exact language of its internal sub-queries, have clear titles and human-readable URLs, and come from its preferred source types. It may retrieve dozens of URLs for a single prompt, but only the ones that look the most relevant and trustworthy on the surface ever get opened, read, and credited.
Why some pages get cited by ChatGPT and others do not
Let me start with the thing most people miss: ChatGPT does not read the entire web every time you ask a question. It starts with a tiny snapshot of candidate URLs and judges those pages based on a few quick fields before it even sees your full content.
Think of it as a two-step funnel. First, the retrieval system grabs URLs and lightweight metadata. Second, a ranking step decides which ones are worth opening and which of those deserve a citation. If you fail in step one, your amazing content never even gets into the conversation.
I know that sounds a bit harsh, but it matches what I keep seeing when I test this with clients. Pages that look great to humans often look vague or off topic when you only look at the title, snippet, and URL. And that is exactly what the model is seeing at first glance.

How ChatGPT actually gathers and filters sources
The hidden gatekeeper: retrieval data
Before ChatGPT cites a URL, it needs a quick summary of that page without reading the full content. That summary is what researchers often call retrieval data.
In practice, each candidate result arrives with four core elements: a title, a short snippet, the URL, and some kind of internal ID or metadata tag. The system uses this bundle to decide what is worth a deeper look.
ChatGPT does not start by reading your article. It starts by judging your title, your snippet, and your URL.
That means your on-page SEO is only half the story. The other half sits in those small pieces of text that models see first, long before they read your H2s or your examples.
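If it helps to picture that first glance, here is a minimal Python sketch of a screen that only looks at the surface fields. The `Candidate` fields, the word-overlap score, and the 0.5 threshold are my own hypothetical stand-ins, not how OpenAI actually scores results. The point is only that nothing below the title, snippet, and URL is consulted.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical shape of one retrieved result: only the surface fields."""
    result_id: str
    title: str
    snippet: str
    url: str


def tokens(text):
    # Lowercase alphanumeric words; URL paths split naturally on "/", "-", "?" and "=".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def surface_overlap(query, candidate):
    """Share of query words found anywhere in the title, snippet, or URL."""
    q = tokens(query)
    if not q:
        return 0.0
    surface = tokens(candidate.title) | tokens(candidate.snippet) | tokens(candidate.url)
    return len(q & surface) / len(q)


def shortlist(query, candidates, threshold=0.5):
    """Keep only candidates whose surface fields cover enough of the query."""
    return [c for c in candidates if surface_overlap(query, c) >= threshold]
```

A branded page whose title, snippet, and URL never use the searcher's words drops out at this stage, no matter how good the body copy is.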
Retrieval channels: not every URL enters through the same door
This part can be a bit confusing, but it matters a lot. ChatGPT does not treat every source the same. It pulls content through different internal channels, each labeled with something like a ref_type.
You will usually see at least these buckets in research and reverse engineering work:
- Search: general web search results, similar to what you see on a search engine
- News: recent articles from news publishers and large blogs
- Forums and Reddit-style content: social discussions pulled from large communities
- Video platforms: data tied to platforms such as YouTube
- Academic or reference: papers, standards, reference docs
Different studies disagree on the exact numbers, and I think anyone who pretends to know the exact mix for every model version is stretching the truth a bit. But across tests, one pattern holds up pretty well.
The general search index supplies most of the URLs that get cited. If you want steady citations, you need to win in regular search first.
That is not very glamorous. It means the boring SEO work still matters in an AI world. Rankings, crawlability, internal links, all of it.
Why some channels are used but rarely credited
Here is where things get interesting. When you profile traffic at scale, some sources show up heavily in the retrieved set but barely appear in the citations.
Forum content and big discussion platforms are a good example. The model seems to pull a lot of them to figure out common opinions, real language, and edge cases, then it ends up citing a more traditional article or documentation page instead.
In a few of my own tests, I saw something slightly different from some public studies. For technical queries, YouTube channels with strong transcripts and clear titles were cited more often than I expected. For general product questions, video URLs almost never appeared as citations, even though I could see those videos in the network logs.
I do not think there is a single truth about which ref_type wins every time. The better reading is this: ChatGPT prefers whatever channel the system trusts most for that topic and format. For a question like how to code a function, video might matter more. For legal details, it tends to pick reference docs.
The volume trap: why global numbers can mislead you
When you look at millions of prompts, it is easy to mix up volume and importance. A ref_type can look dominant in the data because it has a high count, not because it wins many actual citations.
Reddit-style content is a great example. You can see a flood of those URLs in the raw retrieval logs, but most of them never show up in the final answers. They shape the answer, they just do not get credit on the screen.
From a brand point of view, that means your name can influence AI responses long before it ever appears in a citation.
If your product is mentioned a lot in public threads, you might see ChatGPT recommend it without linking you. It is a strange kind of visibility. Good for demand, less good for direct traffic.
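If you have your own logs of retrieved versus cited URLs, the fix for the volume trap is to report two numbers per channel instead of one. This is a minimal sketch assuming a hypothetical log of `(ref_type, was_cited)` pairs. The sample rows are illustrative, not real data.

```python
from collections import Counter


def channel_stats(log):
    """Summarize a retrieval log per channel.

    log: iterable of (ref_type, was_cited) pairs, one per retrieved URL.
    Returns {ref_type: (retrieved_count, citation_rate)}.
    """
    retrieved = Counter()
    cited = Counter()
    for ref_type, was_cited in log:
        retrieved[ref_type] += 1
        if was_cited:
            cited[ref_type] += 1
    return {rt: (n, cited[rt] / n) for rt, n in retrieved.items()}


# Illustrative rows only: forums dominate retrieval but rarely get credit.
sample_log = [("forum", False)] * 5 + [("forum", True)] + [("search", True)] * 2 + [("search", False)]
for ref_type, (count, rate) in channel_stats(sample_log).items():
    print(f"{ref_type}: retrieved {count}, cited {rate:.0%}")
```

In this toy log the forum channel has twice the retrieval volume of search but a far lower citation rate, which is exactly the gap a single count hides.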
What this means for your content strategy
So where does this leave you if you care about organic traffic from AI tools, not just traditional search engines? I think you need to think in layers.
- Layer 1: pages that rank in general search and look citation-ready
- Layer 2: discussions, reviews, and community content that feed the model’s understanding
- Layer 3: news and timely updates that help you win on recency for time-sensitive topics
You cannot control every ref_type, and that is fine. But you can decide which layers you want to compete in and how you present your content so that the retrieval gatekeeper likes you more than it likes the next result.

Semantic similarity: how matching language drives citations
From your prompt to ChatGPT’s internal questions
When you type a query like “best CRM tools for a 5 person sales team”, ChatGPT does not run a single web search for that exact string. It breaks your request into several sub-questions, often called fanout queries.
Those fanout queries are much more specific. For example, it might search for “CRM for small sales teams pricing comparison”, “pipeline tracking tool for B2B startup”, or “top rated simple CRM for beginners”.
Each of those sub-queries grabs its own list of URLs. The model then has to decide which of those URLs to open and which ones to cite when it answers your original question.
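A rough way to picture this step: each sub-query returns its own ranked list, and the lists are merged into one candidate pool. The sketch below assumes you have those lists as plain URLs per query (the queries and URLs are made up). Tracking which sub-queries surfaced each URL is my own addition. A page that shows up for several of them plausibly has a better shot, but that is an assumption, not a documented rule.

```python
from collections import defaultdict


def pool_candidates(results_by_query):
    """Merge per-sub-query result lists into one deduplicated candidate pool.

    results_by_query: {fanout_query: [url, ...]} in rank order (hypothetical input).
    Returns {url: [fanout queries that retrieved it]}, in first-seen URL order.
    """
    pool = defaultdict(list)
    for query, urls in results_by_query.items():
        for url in urls:
            if query not in pool[url]:
                pool[url].append(query)
    return dict(pool)


results = {
    "crm small team pricing": ["a.com/crm", "b.com/pricing"],
    "simple crm for beginners": ["a.com/crm", "c.com/guide"],
}
for url, queries in pool_candidates(results).items():
    print(url, len(queries))
```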
Why title relevance matters more than people think
From what we see across different studies, the semantic closeness between the page title and those fanout queries is a strong predictor of whether a page will be cited. Not perfect, but strong.
That means the old advice about putting your keyword in the title is still correct, but it is not enough. The title also has to read like something a user would search and like something an AI would generate as a sub-question.
If your title could double as a natural language search, your chance of being cited goes up.
For example, compare these two titles for a B2B CRM guide:
| Title | Likely AI reaction |
|---|---|
| “SynergyPro: A Comprehensive Overview of Our Sales Enablement Suite” | Looks branded and vague. Hard to match to a specific fanout query. |
| “Simple CRM tools for 3 to 10 person B2B sales teams” | Matches common sub-queries for team size and use case quite well. |
The second title may feel less “marketing friendly” to some teams, but the model will understand it faster. It also lines up much better with how real users describe their problem.
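To sanity check a title against a set of fanout queries, you can score word overlap. This sketch uses a simple bag-of-words cosine similarity as a crude, assumed stand-in for the embedding-based similarity a real retrieval system more likely uses, and the fanout queries are invented for the example.

```python
import math
import re
from collections import Counter


def bag(text):
    # Word counts after lowercasing; punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag(a), bag(b)
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(title, fanout_queries):
    """Highest similarity between a title and any single fanout query."""
    return max((cosine(title, q) for q in fanout_queries), default=0.0)


fanout = ["crm for small sales teams pricing comparison", "simple crm for b2b sales teams"]
branded = "SynergyPro: A Comprehensive Overview of Our Sales Enablement Suite"
plain = "Simple CRM tools for 3 to 10 person B2B sales teams"
print(round(best_match(branded, fanout), 2), round(best_match(plain, fanout), 2))
```

With these two queries, the plain title from the table scores far above the branded one, mostly because it shares most of its words with the second sub-query.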
Fanout queries vs the original prompt
One subtle point many people miss: being close to the original user prompt is helpful, but being close to the internal fanout queries seems even more predictive of citations.
I have seen pages that match the seed query quite well, yet still lose out because they do not answer the narrower sub-questions the model cares about: pricing, region, platform, or risk.
So instead of asking only “what keyword does my page target?”, ask “what smaller questions would an AI break this into, and do I cover those clearly in my title, headings, and URL?”
Human readable URLs and citation likelihood
On top of titles, the format of your URL also plays a role. Studies that looked at large prompt sets found that URLs with clear, readable slugs tend to get cited more often than opaque ones with random IDs or long tracking strings.
Think of the difference between these two:
| URL type | Example | AI friendly? |
|---|---|---|
| Opaque | https://site.com/blog/article?id=73928&ref=hp | Harder to relate to a topic, looks less clean in a citation. |
| Natural language | https://site.com/crm-tools/small-sales-team | Signals topic clearly, fits well with semantic scoring. |
Is this the biggest factor? No. But when titles, snippets, and URLs all point in the same semantic direction, the model has less doubt. And that lack of doubt can push your page into the cited group instead of the ignored group.
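You can run a quick readability check on your own URLs. This is a rough heuristic of my own, not a known ranking signal: it flags query strings and ID-like path tokens, ignores generic folder names like “blog”, and calls a URL readable when at least two topic words remain.

```python
import re
from urllib.parse import urlparse

# Folder names that say nothing about the topic (my own list, adjust freely).
GENERIC = {"blog", "post", "posts", "article", "page", "resources", "index"}


def slug_report(url):
    """Rough, hypothetical readability check for a URL."""
    parts = urlparse(url)
    segments = [s for s in parts.path.lower().split("/") if s]
    words = [w for s in segments for w in re.split(r"[-_]", s) if w]
    topic_words = [w for w in words if w.isalpha() and w not in GENERIC]
    # Pure numbers ("73928") or words glued to long digit runs ("guide2026") look like IDs.
    id_like = [w for w in words if w.isdigit() or re.fullmatch(r"[a-z]+\d{3,}\w*", w)]
    return {
        "has_query_string": bool(parts.query),
        "topic_words": topic_words,
        "id_like": id_like,
        "readable": not parts.query and not id_like and len(topic_words) >= 2,
    }


print(slug_report("https://site.com/blog/article?id=73928&ref=hp")["readable"])
print(slug_report("https://site.com/crm-tools/small-sales-team")["topic_words"])
```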
How to see fanout queries yourself
One thing I like about working with developers is that they usually discover small tricks before the rest of the SEO world does. Inspecting network calls in ChatGPT is one of those tricks.
You can open the browser dev tools, run a prompt, filter requests, and look for payloads that include a list of “queries” or something similar. Those strings are your fanout queries. They show you the actual search phrases the model used behind the scenes.
The process changes sometimes when the interface updates, so I will not pretend the exact steps are always the same. But the core idea holds: watch the network traffic, grab the queries, and map them to your content.
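Once you have copied a response body out of the Network panel and saved it as JSON, a short script can pull the query strings out. The payload shape here is entirely assumed: the sketch simply walks the whole JSON tree and collects any list of strings stored under a key named `queries`, because the real field names are undocumented and change between interface versions. If what you copied is a stream of separate events rather than one JSON document, you would need to split it into individual events first.

```python
import json


def find_queries(node, key="queries"):
    """Recursively collect strings from any list stored under `key` in a JSON tree."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, list):
                found.extend(item for item in v if isinstance(item, str))
            else:
                found.extend(find_queries(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_queries(item, key))
    return found


def queries_from_capture(raw_json, key="queries"):
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(find_queries(json.loads(raw_json), key)))


# A made-up payload; real responses use different, undocumented structures.
capture = """{"message": {"metadata": {"search": {"queries":
  ["crm small team pricing", "simple crm for beginners"]}}},
  "events": [{"queries": ["crm small team pricing"]}]}"""
print(queries_from_capture(capture))
```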
Once you see the fanout queries, you stop guessing what AI wants and start mirroring the language it already uses.
Here is a simple way to use that data without getting lost in it:
- Take one AI answer where your competitor is cited but you are not
- Extract the internal queries for that answer
- Group them by intent: how to, comparison, troubleshooting, buyer questions
- Check which groups your page covers and which it barely mentions
- Update your headings, FAQ, and examples to address the missing groups in plain language
I know this sounds almost too straightforward, but I have seen pages go from zero citations to regular mentions in a few weeks just from aligning titles, slugs, and subheadings with those internal queries.
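If you want to script the grouping and gap check from the steps above, a keyword-rule version is enough to start. The intent cue words, the four-letter minimum for “meaningful” words, and the sample queries and headings are all hypothetical, so swap in rules that fit your niche.

```python
# Hypothetical cue words per intent group; the first matching group wins.
INTENT_RULES = {
    "how to": ("how to", "guide", "steps", "setup"),
    "comparison": (" vs ", "versus", "compare", "comparison", "alternatives", "best"),
    "troubleshooting": ("error", "fix", "not working", "problem"),
    "buyer questions": ("pricing", "price", "cost", "review", "worth"),
}


def intent_of(query):
    q = f" {query.lower()} "  # pad so " vs " also matches at the edges
    for intent, cues in INTENT_RULES.items():
        if any(cue in q for cue in cues):
            return intent
    return "other"


def group_by_intent(queries):
    groups = {}
    for q in queries:
        groups.setdefault(intent_of(q), []).append(q)
    return groups


def coverage_gaps(queries, headings):
    """Intent groups where no heading shares a word longer than three letters with any query."""
    heading_words = {w for h in headings for w in h.lower().split() if len(w) > 3}
    gaps = []
    for intent, qs in group_by_intent(queries).items():
        query_words = {w for q in qs for w in q.lower().split() if len(w) > 3}
        if not query_words & heading_words:
            gaps.append(intent)
    return gaps


queries = ["how to set up crm pipeline", "hubspot vs pipedrive for small teams",
           "crm pricing for startups", "crm sync error fix"]
headings = ["How to set up your CRM pipeline", "CRM pricing explained"]
print(coverage_gaps(queries, headings))
```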

Freshness vs authority: why age behaves strangely in AI citations
Yes, fresh content helps, but not in a simple way
There is a common belief right now that AI models always prefer the newest article. I do not fully agree with that. The data I have seen, and what some public studies report, suggest a more mixed picture.
Across many queries, cited pages do tend to be younger than what you see in classic search results. So if you publish nothing for three or four years, you are likely missing some opportunities. But that is only one side of the story.
Inside one prompt: older pages often win
When you zoom in on a single prompt and look at all the URLs retrieved, a different pattern sometimes appears. The newer pages are present in the pool, but the pages that get cited are often the more established ones that have existed for a few years.
That seems odd at first. Why would a model that favors fresh content globally still pick older content locally within a single result set?
My reading is that freshness is a tiebreaker, not the main driver, in most verticals. Relevance and authority do the heavy lifting. Only when two pages look equally relevant and trustworthy does recency decide the winner.
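One way to make the tiebreaker idea concrete is a sort key: compare rounded relevance first, then rounded authority, and only then publish date. The scores, the rounding precision, and the pages are hypothetical. This is a mental model of the pattern, not ChatGPT's actual ranking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    relevance: float  # hypothetical 0..1 score
    authority: float  # hypothetical 0..1 score
    published: date


def rank(pages, precision=1):
    """Sort by rounded relevance, then rounded authority, then recency (newest first).

    Rounding models "looks equally relevant": pages only fall through to the
    date comparison when their rounded scores tie.
    """
    return sorted(
        pages,
        key=lambda p: (round(p.relevance, precision), round(p.authority, precision), p.published),
        reverse=True,
    )


old_strong = Page("a.com/guide", 0.91, 0.90, date(2021, 3, 1))
fresh_weaker = Page("b.com/news", 0.62, 0.50, date(2025, 1, 10))
fresh_tied = Page("c.com/update", 0.63, 0.52, date(2025, 2, 1))
print([p.url for p in rank([fresh_weaker, old_strong, fresh_tied])])
```

The older, stronger guide stays on top, and the date only separates the two fresher pages that tie on rounded scores. Passing a coarser `precision` makes more pages tie, which is one way to picture the news case below, where recency decides more often.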
When news behaves differently
Time-sensitive topics behave in a clearer way. For news queries, the difference in semantic relevance between cited and non-cited pages is often quite small. The model sees many pages that look equally on topic, so it has to pick another variable to separate them.
In those cases, age becomes a bigger factor. A news piece from this morning will push out a similar article from last week. Not always, but often enough that the pattern shows up in larger datasets.
So if your niche is tied strongly to breaking updates, you cannot rely on evergreen guides alone. You still need those, but you also need a rhythm of timely posts that match news queries and event driven searches.
What this means for content planning
If you want to show up in AI answers more often, you might structure your content library into three buckets.
- Evergreen explainers: high relevance, broad coverage, updated every year or so
- Fresh takes: commentary or breakdowns that respond to new releases, policy changes, or trends
- Reference pages: documentation, product pages, or detailed FAQs that rarely change but build authority
You do not need a huge volume in each bucket. A few strong evergreen pages can carry a lot of weight if they map tightly to fanout queries. But if you never publish fresh pieces, you are limiting your chances in categories where recency is the deciding factor.
Relevance gets you into the short list. Authority keeps you there. Freshness helps when everything else looks equal.
This is not that different from how I used to think about Google. The only real shift is that now your content is being judged both as a search result and as a building block for synthetic answers.

How to make your pages more “citable” for ChatGPT
1. Start with search visibility
I know this sounds a bit boring, but you cannot skip it. If your pages do not appear in the general search index for the queries AI tools use, they will rarely show up in retrieval sets, which means no citations.
So your first job is still classic SEO work:
- Cover topics your buyers care about, not only brand terms
- Use clear, descriptive titles and meta descriptions
- Fix crawl issues and broken internal links
- Earn links from relevant sites in your space
There is no shortcut around this. Some people hope that if they just mention “ChatGPT” in their content, they will bypass ranking. That does not happen in any reliable way.
2. Match titles, headings, and slugs to fanout queries
Once you are in the search pool, you can focus on alignment. The idea is simple: you want your page to look like a perfect answer to the internal queries the model asks behind the scenes.
Here is a basic workflow you can follow without any special tools:
- Ask ChatGPT a question where you want to be cited
- Look at the answer and list the sub-topics it covers
- Guess the likely fanout queries from those sub-topics using normal language
- Turn the strongest sub-topics into H2 or H3 headings on your page
- Adjust your title and URL to match the main fanout query more closely
If you want to be a bit more precise, you can inspect network calls or use third-party tools that try to surface those internal queries. I do not think they are perfect, but they give you a good starting point.
3. Rewrite your titles in natural language
Many SaaS and B2B sites still write titles for internal teams, not for users or AI tools. They use generic labels like “Solutions” or “Platform overview” that tell you almost nothing about the content.
Instead, try titles like these:
- “Pricing analytics for consumer subscription apps”
- “Email warmup guide for small sales teams”
- “Security checklist for finance startups using cloud tools”
Each one of these is something you could imagine typing into a search box. They also map neatly to the kind of fanout queries models generate, such as “email warmup small team” or “security checklist cloud finance”.
4. Use readable slugs and clean URLs
This part is quick but helpful. When you ship a new piece, check the URL slug and ask a simple question: if I only saw this slug, would I know the topic?
Compare:
| Bad slug | Better slug |
|---|---|
| /blog/post-17 | /blog/b2b-email-warmup |
| /resources/guide2026_final | /resources/startup-security-checklist |
There is nothing magical about this. You are just making it easier for both humans and models to understand the topic from a small piece of text.
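If your CMS does not generate slugs for you, turning a descriptive title into a slug takes a few lines of Python. This is a minimal sketch that assumes plain ASCII English titles (accented characters would be dropped). The optional stopword list is my own choice and is only there to shorten long slugs.

```python
import re

# Optional, hypothetical stopword list for shortening slugs.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with", "using"}


def slugify(title, drop_stopwords=False):
    """Turn a page title into a lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return "-".join(words)


print(slugify("Email warmup guide for small sales teams"))
print(slugify("Security checklist for finance startups using cloud tools", drop_stopwords=True))
```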
5. Cover the questions AI actually cares about
Even if your titles and URLs look good, you will lose citations if your content skips the angles AI is using to answer prompts. That often means missing sections on pricing, limitations, risk, or comparisons.
One way to fix this is to create a simple checklist for every important page:
- Basics: what is the topic, who is it for
- Benefits and tradeoffs: when this works, when it does not
- Steps: how to do it, with clear bullet lists
- Comparisons: how it differs from common alternatives
- Proof: small examples, data points, or screenshots
This structure mirrors the shape of typical AI answers. Pages that follow it are easier for models to mine and cite because they map directly to the layout of a helpful response.
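To audit a page against this checklist quickly, you can scan its headings for cue words. The cue lists below are hypothetical and English-only, and heading keywords are a crude proxy for whether the body actually covers a section, so treat the output as a prompt for a manual read.

```python
# Hypothetical cue words for each checklist section.
CHECKLIST = {
    "Basics": ("what is", "who is it for", "overview"),
    "Benefits and tradeoffs": ("benefit", "tradeoff", "limitation", "when it works", "when not"),
    "Steps": ("how to", "step", "setup"),
    "Comparisons": (" vs ", "versus", "compared", "alternative"),
    "Proof": ("example", "case study", "data", "results"),
}


def missing_sections(headings):
    """Return checklist sections with no cue word in any heading."""
    text = " " + " ".join(h.lower() for h in headings) + " "
    return [section for section, cues in CHECKLIST.items() if not any(c in text for c in cues)]


headings = ["What is email warmup?", "How to warm up a new domain in 5 steps", "Email warmup vs cold sending"]
print(missing_sections(headings))
```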
6. Build pages that sell and still get cited
One thing I like in several of the published examples is the focus on product and landing pages. I agree that those can perform very well in AI citations when they are built with user intent in mind instead of just brand copy.
Where I disagree a bit is with the idea that a course or template alone is enough to fix this. Templates help, but you still need to think carefully about the actual search intents you target.
Good “citable” sales pages usually have:
- Intent-focused titles like “SEO reporting tool for agencies”
- Sections that answer how it works, who it is for, and what it costs
- Simple, consistent language that matches search queries
- Examples of real use cases instead of abstract pitches
That way the page can appear both as a sales asset and as a reference in AI answers for topics such as “how agencies track SEO results” or “tools for client SEO reporting”.
7. Use data and experiments instead of assumptions
One risk right now is that people anchor too hard on any single study. The Ahrefs research that much of this discussion draws on is helpful, but it is based on one timeframe, one model version, and one sampling method.
I would treat it as a strong clue, not as a final map. The way I handle this is to copy the high level patterns into client tests, then check if the results match in their niche.
Do not just copy rules from one study. Use them as hypotheses, then see how your own content responds across a few dozen prompts.
This takes more time, but it keeps you from overfitting to someone else’s dataset that may already be slightly out of date.

Practical workflow: from zero data to real ChatGPT citations
Step 1: Pick a small set of target prompts
Start with 5 to 10 prompts where you really care about visibility. These should mirror how your best customers think, not just how you wish they searched.
For example:
- “tools to measure product adoption in SaaS”
- “SEO for local service businesses with small budgets”
- “how to onboard new sales reps in 30 days”
You can refine these as you learn more, but you need a starting point.
Step 2: Observe which pages ChatGPT already cites
Run those prompts through ChatGPT with web access on. Note the URLs it cites right now. Do this several times, because answers can vary slightly between runs.
Then, for each cited page, write down:
- Exact page title
- Slug and URL structure
- Type of content (blog, docs, product page, comparison, forum)
- Key sections and headings
You are not trying to copy these pages. You are trying to understand why the model trusts them for this topic.
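A plain CSV is enough to keep these notes consistent across runs. The sketch below assumes field names of my own choosing that mirror the list above, plus the prompt and run number so you can compare runs later. The example row uses a placeholder domain.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CitedPage:
    prompt: str
    run: int           # which repeat of the prompt this observation came from
    url: str           # full URL, so the slug structure is kept too
    title: str
    content_type: str  # blog, docs, product page, comparison, forum
    headings: str      # key H2/H3s, joined with " | "


def to_csv(rows):
    """Serialize observations to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CitedPage)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


rows = [CitedPage("tools to measure product adoption in SaaS", 1,
                  "https://example.com/product-adoption-metrics",
                  "Product adoption metrics for SaaS", "blog",
                  "What is adoption | Metrics to track")]
print(to_csv(rows))
```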
Step 3: Compare those pages against your own
Now line up your pages that target similar intents. Ask a few honest questions:
- Is your title as clear as the ones that are cited?
- Does your page cover as many of the sub-questions as theirs does?
- Is your URL readable and on topic?
- Do you go into the same or greater depth on practical steps?
If the answer is no in several places, that gives you a roadmap to fix things. You do not need a fancy content score to see obvious gaps.
Step 4: Update one page at a time
Instead of rewriting your whole site, pick one high-intent page and run a focused experiment. Clean up the title, slug, and headings. Add missing sections that match the sub-questions you saw in AI answers.
Then watch what happens over the next few weeks:
- Does your organic ranking for related search terms improve or stay the same?
- Does ChatGPT start to include your page as a citation on some runs?
- Do you see more branded searches or referral traffic from AI interfaces?
You may not see instant changes, but patterns do show up over time, especially if you log results carefully.
Step 5: Expand to product and landing pages
Once you know your tweaks work on informational content, extend them to your product and landing pages. This is where the real business impact often lives.
Here you might:
- Rename vague pages like “/solutions” to something like “/solutions/b2b-saas-analytics”
- Add FAQ sections that answer common buyer questions in plain language
- Include comparison blocks like “[Product] vs spreadsheets” or “[Product] vs generic tools”
- Make sure your titles read like what users search for, not internal project names
In my experience, this feels a little uncomfortable at first. Marketing teams worry that titles will sound too plain. But again and again, those plain, direct titles are the ones that earn both search traffic and AI citations.
Step 6: Keep testing, because models change
One last thought. Models change, and so do their citation habits. What works well this quarter might fade a bit next year. That is not a flaw, it is just how the space evolves.
The way to handle that is not to chase every rumor, but to keep a simple, repeatable process:
- Check your key prompts every month or two
- Log which URLs are cited and how often yours appears
- Look for new patterns in titles, formats, and content types
- Adjust a small set of pages, then measure again
The goal is not to outsmart ChatGPT. The goal is to keep your content aligned with what real users ask and what the model actually surfaces.
If you do that consistently, you move from guessing about “AI SEO” to running a clear, testable system. It is not perfect, and it will never be finished, but that is how real marketing usually works.
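To turn the monthly log into one trend line per prompt, compute the share of runs in which your domain was cited at least once. This sketch assumes a hypothetical log of `(prompt, run_id, cited_url)` rows, with a `None` URL for runs that cited nothing, and uses `example.com` as a placeholder for your domain.

```python
from collections import defaultdict
from urllib.parse import urlparse


def citation_share(log, your_domain):
    """Fraction of runs per prompt in which at least one cited URL is on your_domain.

    log: iterable of (prompt, run_id, cited_url) rows; cited_url may be None.
    Subdomains such as www.your_domain count as yours.
    """
    runs = defaultdict(set)
    hits = defaultdict(set)
    for prompt, run_id, url in log:
        runs[prompt].add(run_id)  # record the run even if it cited nothing
        host = (urlparse(url).hostname or "") if url else ""
        if host == your_domain or host.endswith("." + your_domain):
            hits[prompt].add(run_id)
    return {p: len(hits[p]) / len(r) for p, r in runs.items()}


log = [
    ("crm for small teams", 1, "https://www.example.com/crm-small-teams"),
    ("crm for small teams", 1, "https://other.com/x"),
    ("crm for small teams", 2, "https://other.com/y"),
    ("crm for small teams", 3, None),
    ("onboard sales reps", 1, "https://example.com/onboarding"),
]
print(citation_share(log, "example.com"))
```

Comparing this number month over month, before and after you change a page, is a simple way to check whether an edit moved anything.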
Where to focus next
If you feel a bit overwhelmed, I would not blame you. There is a lot here: retrieval channels, semantic similarity, page age, fanout queries, and all the rest.
So if you want a simple starting point, here is the order I would pick:
- Fix your titles and slugs so they read like natural search queries.
- Rewrite one or two key pages to cover the sub-questions ChatGPT already answers.
- Pay special attention to product and landing pages, not only blog posts.
- Check your target prompts monthly and keep a lightweight log of citations.
You will not control everything that happens inside these models. No one does, no matter how confident they sound. But you can make your content much easier to retrieve, interpret, and cite. And in a world where AI tools act like the new front page, that edge compounds over time.