Everyone is talking about ranking in AI answers. Almost nobody is measuring it. So we built a tool that asks the major LLMs the same questions, thousands of times, and records exactly which brands they name back.
The premise was simple. If a customer types "best WordPress SEO agency" into ChatGPT instead of Google, the only thing that matters is whether your name comes out the other side. That's not a keyword you can track in Search Console — it's a probability distribution living inside a model. To manage it, we first had to measure it.
Polling LLMs like you're running a survey
An LLM answer isn't a fact lookup — it's a sample. Ask the same question twice and you can get two different lists of brands. Treat one response as truth and you're reading tea leaves. So we stopped thinking like SEOs and started thinking like pollsters.
For every query we care about, we send the same prompt to each model 30 to 50 times, then aggregate the brands named. A brand mentioned in 4 out of 50 runs has an 8% "share of voice" for that question on that model.
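Here is a minimal sketch of that tally, assuming each run has already been reduced to a list of brand names. The names `shareOfVoice` and `wilson` are illustrative, not part of the tool described here. The 95% Wilson interval is an addition in the pollster spirit, to show how wide the uncertainty is at these sample sizes.

```typescript
// Each run is the list of brand names one sampled answer mentioned.
type Run = string[];

interface BrandShare {
  brand: string;
  mentions: number; // runs that named the brand at least once
  share: number;    // mentions divided by total runs
  low: number;      // lower bound of the 95% Wilson interval
  high: number;     // upper bound of the 95% Wilson interval
}

// 95% Wilson score interval for `hits` successes out of `n` trials.
function wilson(hits: number, n: number, z = 1.96): [number, number] {
  if (n === 0) return [0, 1];
  const p = hits / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [Math.max(0, center - margin), Math.min(1, center + margin)];
}

function shareOfVoice(runs: Run[]): BrandShare[] {
  const counts = new Map<string, number>();
  for (const run of runs) {
    // Count a brand once per run, ignoring case and surrounding spaces.
    const seen = new Set<string>();
    for (const raw of run) {
      const brand = raw.trim().toLowerCase();
      if (brand === "" || seen.has(brand)) continue;
      seen.add(brand);
      counts.set(brand, (counts.get(brand) ?? 0) + 1);
    }
  }
  const result: BrandShare[] = [];
  counts.forEach((mentions, brand) => {
    const [low, high] = wilson(mentions, runs.length);
    result.push({ brand, mentions, share: mentions / runs.length, low, high });
  });
  // Most-mentioned brands first.
  return result.sort((a, b) => b.share - a.share);
}
```

For the 4-in-50 example, the share is 0.08 and the interval runs from roughly 3% to 19%, which is why a single run, or even a handful, says very little.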
The stack that made it work
We wanted one interface to call every provider. A thin SDK layer let us swap models with a single string and keep the polling logic identical across providers.
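To make the "single string" idea concrete, here is a rough sketch of what such a layer might look like. Everything in it is hypothetical: the `Provider` type, the `providers` registry, the placeholder model names and `pollManySketch` are stand-ins rather than the SDK the tool actually uses, and the adapters just echo their input instead of calling a vendor API.

```typescript
// A provider takes a prompt and resolves to the raw text of one answer.
type Provider = (prompt: string) => Promise<string>;

// Hypothetical registry: one adapter per model name. Real adapters would
// wrap each vendor's SDK; these stand-ins only echo their input.
const providers: Record<string, Provider> = {
  "model-a": async (prompt) => `A says: ${prompt}`,
  "model-b": async (prompt) => `B says: ${prompt}`,
};

// Send the same prompt `n` times to one model, at most `concurrency` at once.
async function pollManySketch(
  model: string,
  prompt: string,
  n: number,
  concurrency = 5,
): Promise<string[]> {
  const provider = providers[model];
  if (!provider) throw new Error(`Unknown model: ${model}`);

  const results = new Array<string>(n);
  let next = 0;
  // Each worker claims the next unclaimed sample index until none remain.
  const worker = async (): Promise<void> => {
    while (next < n) {
      const i = next++;
      results[i] = await provider(prompt);
    }
  };

  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(concurrency, n); w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Because the model is looked up by name, adding a provider means adding one registry entry; the polling code does not change. The loop that follows calls a helper of this general shape, with a sample count and a schema.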
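```typescript
const models = ["gpt-4o", "claude-sonnet", "gemini-pro"];

// queries, pollMany, BrandList, record and tallyBrands are defined elsewhere in the tool.
for (const q of queries) {
  for (const model of models) {
    // Same prompt, sampled repeatedly against one model.
    const runs = await pollMany(model, q.prompt, {
      n: 50, // samples per query
      schema: BrandList, // force structured output
    });
    // Store per-query, per-model brand counts.
    record(q.id, model, tallyBrands(runs));
  }
}
```

The accuracy problem of structured output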
We needed every answer as a clean list of brand names, not prose. So we asked each model to return JSON. That solved parsing — and quietly introduced a new bias.
Structure your prompts for the answer you want to measure, not the answer that's easiest to parse.
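For the JSON answers this section describes, a small validator along these lines can do the parsing. It is a sketch under assumptions: the response shape `{"brands": [...]}` and the name `parseBrandList` are invented for illustration, since the actual `BrandList` schema is not shown.

```typescript
// Assumed response shape: {"brands": ["Name One", "Name Two"]}.
// Returns the cleaned list, or null when the answer does not match the shape.
function parseBrandList(raw: string): string[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all, e.g. the model answered in prose
  }
  if (typeof data !== "object" || data === null) return null;

  const brands = (data as { brands?: unknown }).brands;
  if (!Array.isArray(brands)) return null;

  const seen = new Set<string>();
  const out: string[] = [];
  for (const item of brands) {
    if (typeof item !== "string") return null; // reject mixed-type lists
    const name = item.trim();
    const key = name.toLowerCase();
    // Skip blanks and case-insensitive duplicates within one answer.
    if (name === "" || seen.has(key)) continue;
    seen.add(key);
    out.push(name);
  }
  return out;
}
```

Returning `null` instead of repairing a malformed answer keeps failures countable, so you can see how often a model drifts from the requested format.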
The economics: tokens aren't free
Fifty samples across 2,000 queries and four models is 400,000 calls per full run.
We cut the bill three ways: caching identical prompts within a run, dropping sample counts on stable queries, and reserving the expensive extraction pass for answers where the cheap parser disagreed with itself.
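The first two savings can be estimated before a run starts. The sketch below rests on assumptions: the `stable` flag and the reduced count of 30 are illustrative, and "caching identical prompts" is read here as polling a prompt once per model even when several query IDs share it. It does not cover the extraction pass.

```typescript
interface PlannedQuery {
  id: string;
  prompt: string;
  stable: boolean; // assumed flag: results barely moved in earlier runs
}

interface CallPlan {
  baselineCalls: number; // every query at the full sample count
  plannedCalls: number;  // after de-duplication and reduced sampling
}

function planCalls(
  queries: PlannedQuery[],
  models: string[],
  fullN = 50,
  reducedN = 30,
): CallPlan {
  // Group queries that share an identical prompt so each is polled once.
  // A shared prompt counts as stable only if every query using it is stable.
  const stableByPrompt = new Map<string, boolean>();
  for (const q of queries) {
    const prev = stableByPrompt.get(q.prompt);
    stableByPrompt.set(q.prompt, prev === undefined ? q.stable : prev && q.stable);
  }

  let samplesPerModel = 0;
  stableByPrompt.forEach((stable) => {
    samplesPerModel += stable ? reducedN : fullN;
  });

  return {
    baselineCalls: queries.length * fullN * models.length,
    plannedCalls: samplesPerModel * models.length,
  };
}
```

With 2,000 distinct prompts, four models and nothing marked stable, both figures come out at 400,000, matching the number above. Marking half the queries stable at 30 samples each brings the plan down to 320,000 calls.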
What this means for your site
- Be quotable. Write pages a model can lift a sentence from without hedging — concrete claims, clear definitions, real numbers.
- Earn third-party mentions. Models trust what other sources already say about you. Off-site citations are the new backlinks.
- Measure share of voice, not rank. Track how often you're named for your money queries, and watch the trend.
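Watching the trend, as the last point suggests, comes down to comparing share of voice between polling runs. Here is a minimal sketch; the `RunResult` shape and `shareTrend` are illustrative names, not part of any particular tool.

```typescript
interface RunResult {
  date: string;     // ISO date of the polling run, e.g. "2025-01-06"
  mentions: number; // samples that named your brand
  samples: number;  // total samples for the query in that run
}

interface TrendPoint {
  date: string;
  share: number;         // mentions divided by samples
  change: number | null; // difference from the previous run, null for the first
}

function shareTrend(history: RunResult[]): TrendPoint[] {
  // ISO dates sort correctly as plain strings; copy so the input stays untouched.
  const sorted = history.slice().sort((a, b) => a.date.localeCompare(b.date));
  let previous: number | null = null;
  return sorted.map((run) => {
    const share = run.samples > 0 ? run.mentions / run.samples : 0;
    const change = previous === null ? null : share - previous;
    previous = share;
    return { date: run.date, share, change };
  });
}
```

At 30 to 50 samples per query, a move of a few points between two runs is well within noise, so the direction over several runs matters more than any single change.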
AI search isn't a different game so much as the same game with the scoreboard hidden. Build the scoreboard, and the strategy gets obvious fast.