Foundational public methodology for the first open public ranking of brand visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).

This release establishes the framework; no rankings have been computed or published yet. First scan cycle: late May 2026 (private validation). First public ranking publication target: August 2026, after 3 validation cycles.

Includes:
- methodology.json: machine-readable formulas, weights, policies
- README.md: human-readable overview + open/closed boundary
- CHANGELOG.md: versioning policy + v1.0.0 release notes
- taxonomy.md: tier system + 11 PL pilot categories
- LICENSE: MIT
- .gitignore: closed operational data (exact prompts, anti-gaming thresholds)
- prompts/README.md: 6-stage prompt curation process
- prompts/example-swiece-sojowe-pl.md: illustrative framework for first category

Strategic principles:
- Algorithm-first, no advisory board
- Open methodology + closed exact prompts (Goodhart's Law defense)
- No retroactive changes (FIDE 2024 lesson)
- No pay-to-play, hard rule (Moody's / Forbes 30 Under 30 lessons)
- Subjective opinion disclaimer (Gartner v. NetScout 2020 First Amendment shield)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Prompt Curation Process

> How Citee Index builds and validates the prompt pool per category. The 6-stage process that prevents the "garbage in, garbage out" failure mode.

---

## Why this matters

If the prompt pool is junk (for example "dyfuzory do włosów ranking", roughly "hair diffusers ranking", or "wąski do samochodu", roughly "narrow for a car"), the ranking is junk. Prompt quality is the single most important upstream input to ranking integrity.

This process exists to ensure every prompt in the active pool passes two tests:

1. **Real buyer test**: would an actual buyer in this category type this query into ChatGPT or Perplexity?
2. **Reality check**: does this query appear in actual search or discussion data (Google Trends, Reddit, Quora)?

Prompts failing either test are excluded.

## The 6 stages

```
Stage 1: Persona Generator (AI)
    ↓ 5–10 buyer personas per category
Stage 2: Prompt Brainstormer (AI per persona)
    ↓ 200–300 raw prompts
Stage 3: Reality Check (Google Trends / Reddit / Quora / AnswerThePublic)
    ↓ ~150 prompts with verified search demand
Stage 4: Multi-agent Validation (3 critic agents in parallel)
    ↓ ~120 prompts after critique
Stage 5: Pilot Test Run (10-prompt sample × 3 models)
    ↓ ~110 prompts that produce stable, sensible AI outputs
Stage 6: Human Approval (founder + category expert)
    ↓ FINAL POOL: 100 prompts
```

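The funnel can also be read as a series of retention rates. The sketch below is only an illustration: it assumes the upper bound of ~300 raw prompts as the starting pool, and `FUNNEL` and `retention_rates` are hypothetical names, not part of any Citee Index code.

```python
# Approximate pool size after each stage, taken from the funnel diagram.
# Assumption: Stage 2 is entered at its upper bound (~300 raw prompts).
FUNNEL = [
    ("Stage 2: Prompt Brainstormer", 300),
    ("Stage 3: Reality Check", 150),
    ("Stage 4: Multi-agent Validation", 120),
    ("Stage 5: Pilot Test Run", 110),
    ("Stage 6: Human Approval", 100),
]


def retention_rates(funnel):
    """Return (stage, share of the incoming pool that survives that stage)."""
    rates = []
    for (_, before), (stage, after) in zip(funnel, funnel[1:]):
        rates.append((stage, after / before))
    return rates


for stage, rate in retention_rates(FUNNEL):
    print(f"{stage}: keeps {rate:.0%} of the incoming prompts")
```

Under these assumed sizes, the Reality Check is by far the most selective step, removing about half of the raw prompts.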
### Stage 1 — Persona Generator

Claude generates 5–10 buyer personas per category. Each persona has:

- Demographics (age, location, income bracket)
- Pain points (what they're trying to solve)
- Decision factors (price, ingredients, brand, reviews, certifications)
- Vocabulary (how they actually talk: formal vs. colloquial, technical vs. lay)

Example for Świece sojowe PL (soy candles, Poland):

- "Woman 30+ buying a gift for her mother"
- "Self-care millennial, 25–35, after work"
- "Interior designer, minimalist apartment"
- "Man buying a Valentine's Day gift"
- "Mother of small children looking for a safe scent"

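One way to hold a persona in code is a small record with one field per attribute listed in this stage. This is a minimal sketch: the `Persona` class, its field names, and every example value are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass
class Persona:
    """One buyer persona; fields mirror the four attributes listed for Stage 1."""
    label: str
    demographics: dict
    pain_points: list
    decision_factors: list
    vocabulary: str


def is_complete(persona):
    """A persona is usable only if every attribute is filled in."""
    return all(asdict(persona).values())


# All values below are invented for illustration.
gift_buyer = Persona(
    label="Woman 30+ buying a gift for her mother",
    demographics={"age": "30+", "location": "PL", "income_bracket": "middle"},
    pain_points=["needs a gift that feels personal"],
    decision_factors=["price", "reviews", "ingredients"],
    vocabulary="colloquial, non-technical",
)

print(is_complete(gift_buyer))
```

A completeness check like this is useful before Stage 2, since a persona without a vocabulary description gives the brainstormer nothing to imitate.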
### Stage 2 — Prompt Brainstormer

For each persona, Claude generates 30–50 prompts in the voice of that persona: "how would I phrase this question to ChatGPT?" Total per category: ~200–300 raw prompts.

Distribution target by type (enforced at this stage):

- Buying intent (weight 2.0): 30%
- Comparison (weight 1.5): 25%
- Specific need (weight 1.5): 20%
- Informational (weight 0.3): 15%
- Brand-direct (weight 0.3): 10%

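The distribution target translates into per-type prompt counts for any pool size. The sketch below assumes largest-remainder rounding so the counts always sum to the pool size; that rounding rule, the `TYPE_TARGETS` table layout, and `target_counts` are illustrative choices, not something the methodology specifies. The weights are carried along as data only and are not used here.

```python
# type: (weight, target share in percent), copied from the list in Stage 2.
TYPE_TARGETS = {
    "buying_intent": (2.0, 30),
    "comparison": (1.5, 25),
    "specific_need": (1.5, 20),
    "informational": (0.3, 15),
    "brand_direct": (0.3, 10),
}


def target_counts(pool_size):
    """Split pool_size across prompt types using largest-remainder rounding."""
    # Integer arithmetic avoids floating-point surprises with percentages.
    scaled = {t: pct * pool_size for t, (_, pct) in TYPE_TARGETS.items()}
    counts = {t: value // 100 for t, value in scaled.items()}
    leftover = pool_size - sum(counts.values())
    # Hand the remaining slots to the types with the largest fractional part.
    by_remainder = sorted(scaled, key=lambda t: scaled[t] % 100, reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


print(target_counts(100))
print(target_counts(250))
```

For the final pool of 100 the percentages map one-to-one to counts; for raw pools such as 250 the rounding step matters because 25% and 15% do not divide evenly.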
### Stage 3 — Reality Check

Each prompt is cross-referenced against real-world data:

| Source | Method | Threshold |
|---|---|---|
| **Google Trends API** | PL queries past 12 months | minimum search volume present |
| **Google Search Console** (where available) | Real search queries to brand sites we have access to | inspirational source for vocabulary |
| **Reddit search** | r/Polska_Marka, niche subreddits | actual user phrasing |
| **Quora PL** | Questions asked in category | real curiosity patterns |
| **AnswerThePublic** | Public scraping of "people also ask" | discovery of long-tail patterns |
| **People Also Ask (Google)** | For top category queries | semantic neighbors |

Prompts with zero/marginal real-world signal are removed. ~300 → ~150.

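The filtering step can be sketched as a function over pre-collected signals. This is a hedged illustration only: it makes no real API calls, the `signals` structure and the `min_sources` cutoff are assumptions (the text only says prompts with zero or marginal signal are removed), and the hit counts are invented.

```python
def reality_check(prompts, signals, min_sources=1):
    """Keep prompts that show a real-world signal in at least `min_sources` sources.

    `signals` maps prompt -> {source name: hit count}. Hypothetical structure;
    the counts would come from the sources in the Stage 3 table.
    """
    kept = []
    for prompt in prompts:
        hits = signals.get(prompt, {})
        sources_with_signal = [src for src, count in hits.items() if count > 0]
        if len(sources_with_signal) >= min_sources:
            kept.append(prompt)
    return kept


# Invented example data.
candidates = [
    "best soy candles for a gift",
    "are soy candles safe around children",
    "hair diffusers ranking",
]
collected = {
    "best soy candles for a gift": {"google_trends": 40, "reddit": 3},
    "are soy candles safe around children": {"google_trends": 0, "quora": 2},
    "hair diffusers ranking": {"google_trends": 0, "reddit": 0},
}

print(reality_check(candidates, collected))
print(reality_check(candidates, collected, min_sources=2))
```

Raising `min_sources` is one possible way to treat single-source hits as "marginal"; where that line actually sits is not stated in this document.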
### Stage 4 — Multi-agent Validation

Three AI critic agents review the list in parallel:

**Agent A: "Real buyer critique"**
Persona-grounded review. Each persona "reads" the prompts and flags ones that don't sound natural for that persona. Prompts marked unnatural by 2+ personas are removed.

**Agent B: "Methodology critic"**
Statistical and structural review. Checks:
- Prompt type distribution stays within ±5% of target
- No subcategory over/under-represented
- Vocabulary diversity (we're not repeating the same phrasing)
- Length distribution reasonable (no 50-word prompts, no 2-word prompts)

**Agent C: "Vendor exploit hunter"**
Anti-gaming review. Identifies prompts that are too easy to game by content marketing fluff:
- Generic informational queries that any vendor can write a blog post for
- Prompts where AI answer is dominated by Wikipedia (vendor can edit Wikipedia)
- Prompts where answer comes from one Reddit post (vendor can write that post)

Each agent produces a list of flagged prompts. Anything flagged by 2+ agents is removed. ~150 → ~120.

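The two mechanical rules in this stage, removal on 2+ agent flags and the ±5% distribution check, can be sketched as follows. Function names and example data are hypothetical, and reading "±5%" as percentage points is an assumption.

```python
from collections import Counter


def prompts_to_remove(flags_by_agent, min_agents=2):
    """Return prompts flagged by at least `min_agents` distinct critic agents."""
    votes = Counter()
    for flagged in flags_by_agent.values():
        votes.update(set(flagged))  # set(): at most one vote per agent per prompt
    return {prompt for prompt, n in votes.items() if n >= min_agents}


def within_tolerance(type_counts, target_pct, tolerance_pp=5):
    """Agent B style check: every type's share is within tolerance_pp points of target."""
    total = sum(type_counts.values())
    return all(
        abs(100 * type_counts.get(t, 0) / total - pct) <= tolerance_pp
        for t, pct in target_pct.items()
    )


# Invented flag lists, one per critic agent.
flags = {
    "real_buyer": ["prompt 3", "prompt 7"],
    "methodology": ["prompt 7", "prompt 9"],
    "exploit_hunter": ["prompt 9", "prompt 3", "prompt 12"],
}
print(sorted(prompts_to_remove(flags)))
```

Counting each agent at most once per prompt keeps a single verbose critic from removing a prompt on its own.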
### Stage 5 — Pilot Test Run

The ~120 candidate prompts get a sample test:

- Pick 10 prompts (stratified across types)
- Run on ChatGPT-search, Perplexity Sonar, Gemini Pro
- Each prompt × 3 models = 30 outputs

**Reject criteria:**

- AI returns "I don't know" or "this depends on your preferences" (no actionable brand mentions)
- Outputs across 3 models have zero overlap (prompt produces incoherent/random results)
- AI returns a list of countries/categories instead of brands (prompt was misinterpreted)

Prompts failing pilot are flagged for revision or removal. ~120 → ~110.

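A minimal sketch of the stratified pick and the zero-overlap reject rule is given here. It assumes round-robin selection across types and interprets "zero overlap" as "no brand is named by two or more models"; both are assumptions, and the model calls themselves are left out entirely. Brand and prompt names are placeholders.

```python
import random


def stratified_sample(prompts_by_type, k=10, seed=0):
    """Pick k prompts, cycling through the types so each one is represented."""
    rng = random.Random(seed)
    # Shuffle a copy of each type's prompts, then draw round-robin.
    pools = {t: rng.sample(ps, len(ps)) for t, ps in prompts_by_type.items()}
    picked = []
    while len(picked) < k and any(pools.values()):
        for t in pools:
            if pools[t] and len(picked) < k:
                picked.append(pools[t].pop())
    return picked


def has_zero_overlap(brands_by_model):
    """True if no brand appears in the outputs of two or more models."""
    seen = [set(brands) for brands in brands_by_model.values()]
    for i in range(len(seen)):
        for j in range(i + 1, len(seen)):
            if seen[i] & seen[j]:
                return False
    return True


# Placeholder data: 5 prompt types with 3 candidate prompts each.
pool = {t: [f"{t} prompt {i}" for i in range(3)] for t in "ABCDE"}
sample = stratified_sample(pool)
print(len(sample))

outputs = {
    "chatgpt": ["Brand A", "Brand B"],
    "perplexity": ["Brand B", "Brand C"],
    "gemini": ["Brand D"],
}
print(has_zero_overlap(outputs))
```

With five types and a sample of ten, round-robin selection yields two prompts per type, which matches the "stratified across types" requirement.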
### Stage 6 — Human Approval

The founder and a category expert review the final ~110 candidates and select the production 100.

**Founder always reviews.** For categories outside the founder's domain knowledge, a paid expert reviewer (1–2 hours, $50–100) is engaged:

| Category | Expert profile |
|---|---|
| Kosmetyki naturalne (natural cosmetics) | Beauty product manager / freelance marketer |
| Suplementy / nutricosmetyki (supplements / nutricosmetics) | Nutritionist / DTC supplement marketer |
| Diety pudełkowe (boxed diet catering) | Fitness coach / dietitian |
| Premium pet food | Pet specialty store owner / dog trainer |
| Kawa specialty (specialty coffee) | Coffee blogger / barista trainer |
| Czekolada rzemieślnicza (craft chocolate) | Food blogger / chocolate-focused content creator |
| Kursy programowania (programming courses) | Bootcamp graduate / hiring manager |
| Kliniki estetyczne (aesthetic clinics) | Dermatologist or aesthetic medicine consultant |
| Fitness studios | Personal trainer / gym manager |
| Kosmetyki męskie (men's cosmetics) | Men's grooming influencer / DTC marketer |
| Świece sojowe (soy candles) | Founder + JAKULO customer service data |

The final 100 prompts are committed to the closed `prompts/{slug}/` directory (gitignored). A public example framework is committed to `prompts/example-{slug}.md` (this repo), showing the structure and 5–10 illustrative examples per type, but **not the exact production strings**.

## Quarterly refresh — 20% rotation

Every quarter, the curation pipeline runs in refresh mode:

1. **Trend check**: Google Trends API. Which prompts have lost relative search volume?
2. **New patterns**: Reddit/Quora scrape. What new question patterns have emerged?
3. **New entrants**: scan model outputs from the past quarter. What brands appeared in answers but aren't in our brand catalog?
4. **Generate replacements**: Stages 1–5 for the rotation set
5. **Human approval**: founder reviews the proposed 20 swaps in 5–10 minutes

This is the defense against Goodhart's Law: as the prompt pool becomes known to vendors (through reverse-engineering or leaks), rotating 20% of it per quarter ensures vendors can't permanently optimize against our exact queries.

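Selecting the rotation set can be sketched as picking the 20% of prompts with the largest drop in relative search volume. That selection rule is an assumption (the steps above only say the trend check looks for lost volume), and `rotation_set` and the input mapping are hypothetical.

```python
def rotation_set(pool, volume_change, share_pct=20):
    """Return the share_pct% of prompts with the largest decline in search volume.

    `volume_change` maps prompt -> relative change over the quarter
    (negative means lost volume). Prompts with no data are treated as unchanged.
    """
    n = len(pool) * share_pct // 100
    ranked = sorted(pool, key=lambda prompt: volume_change.get(prompt, 0.0))
    return ranked[:n]


# Invented data: 100 prompts, where p0 lost the most volume and p99 gained the most.
pool = [f"p{i}" for i in range(100)]
change = {f"p{i}": (i - 50) / 100 for i in range(100)}

to_swap = rotation_set(pool, change)
print(len(to_swap))
```

For a pool of 100 this yields the 20 swaps that the founder reviews in step 5.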
## Cost per category

| Stage | API cost | Human cost |
|---|---|---|
| 1: Persona Generator | ~$0.50 (Claude) | none |
| 2: Prompt Brainstormer | ~$1.50 (Claude) | none |
| 3: Reality Check | $0 (free APIs) | none |
| 4: Multi-agent Validation | ~$3 (Claude × 3 critics) | none |
| 5: Pilot Test Run | ~$5 (10 prompts × 3 models = 30 outputs) | none |
| 6: Human Approval | none | ~30 min founder + 1–2h expert ($50–100 for non-founder categories) |
| **Total per category** | **~$10** | **~30 min + $50–100 for expert categories** |

For 11 pilot categories: ~$110 API + ~5 hours founder time + ~$500 expert reviewers.

## Quarterly refresh cost

Per category per quarter: ~$3 API + 5 minutes founder review.

For 11 categories: ~$35 API + 1 hour founder time per quarter.

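The totals in the two cost sections follow from the per-stage figures. The sketch below redoes the arithmetic; it assumes 10 of the 11 pilot categories need a paid expert (all except the founder's own soy candles category), and the variable names are illustrative.

```python
# Per-category API cost in USD, from the cost table.
API_COST_USD = {
    "persona_generator": 0.50,
    "prompt_brainstormer": 1.50,
    "reality_check": 0.00,
    "multi_agent_validation": 3.00,
    "pilot_test_run": 5.00,
}
CATEGORIES = 11
EXPERT_CATEGORIES = 10  # assumption: every category except the founder's own

per_category_api = sum(API_COST_USD.values())
initial_api = per_category_api * CATEGORIES
founder_hours = 30 * CATEGORIES / 60            # ~30 min per category
expert_low = 50 * EXPERT_CATEGORIES             # $50 per expert review
expert_high = 100 * EXPERT_CATEGORIES           # $100 per expert review

refresh_api = 3 * CATEGORIES                    # ~$3 per category per quarter
refresh_minutes = 5 * CATEGORIES                # 5 min founder review each

print(f"Initial build: ${initial_api:.0f} API, {founder_hours} h founder, "
      f"${expert_low}-{expert_high} experts")
print(f"Quarterly refresh: ${refresh_api} API, {refresh_minutes} min founder")
```

Under these assumptions the stated "~5 hours", "~$35", and "1 hour" are roundings of 5.5 hours, $33, and 55 minutes, and "~$500" for expert reviewers corresponds to the low end of the $50–100 range.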
## Why this is published openly

We publish the **process** because the integrity of the ranking depends on the integrity of the prompts, and external review of the process is the strongest defense against the "your prompts are garbage" attack.

We do NOT publish the **exact strings** because of Goodhart's Law: known prompts get optimized against and stop measuring organic AI search behavior.

The boundary between "open process" and "closed strings" is itself documented openly.