Open methodology for Citee Index — first public ranking of brand visibility in AI search


Jacek Kubas · 03a397343e · Phase 1 (Faza 1): brand catalog (soy candles, PL) + prompt curation pipeline

DATA: public reference datasets for the methodology
- data/README.md: schema and format definitions for brand catalogs
- data/swiece-sojowe-pl/brand_catalog.json: 35 tracked brands (33 manufacturers + 2 importers) + 5 excluded marketplaces/resellers
- data/swiece-sojowe-pl/brand_catalog.md: human-readable companion
- data/swiece-sojowe-pl/market_metadata.json: GMV estimate, personas, seasonality, expected dynamics

TOOLS: 6-stage prompt curation pipeline (Python 3.12+)
- tools/prompt_curation/README.md: process documentation + cost estimates
- tools/prompt_curation/config.py: tunable parameters per stage
- tools/prompt_curation/.env.example: template for the required API keys
- tools/prompt_curation/requirements.txt: dependencies
- tools/prompt_curation/1_persona_generator.py: Claude generates 7 buyer personas
- tools/prompt_curation/2_prompt_brainstormer.py: 30 prompts per persona, written in the persona's voice
- tools/prompt_curation/3_reality_checker.py: cross-check against Google Trends + Reddit
- tools/prompt_curation/4_validation_agents.py: 3 critic agents run asynchronously (real_buyer, methodology, exploit_hunter)
- tools/prompt_curation/5_pilot_test_runner.py: pre-flight run of a prompt sample against 3 LLMs
- tools/prompt_curation/6_human_review_export.py: CSV export for founder approval
- tools/prompt_curation/7_finalize.py: after approval, writes the closed prompts/{cat}/v{N}.json
- tools/prompt_curation/pipeline.py: orchestrator (stages 1–6, then human review, then stage 7)

GITIGNORE: fixed the .env.* exclusion so that .env.example is tracked.

This commit completes Phase 1. Stage outputs (data/{cat}/personas.json, raw_prompts.json, validated_prompts.json, critic_review.json, pilot_test_results.json, for_human_review.csv) are runtime artifacts: public once committed, since they are derived from the public methodology and the public brand catalog. The final approved prompt strings in prompts/{cat}/v{N}.json remain CLOSED (gitignored, to counter Goodhart's Law).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:40:12 +02:00
data	Phase 1: brand catalog (soy candles, PL) + prompt curation pipeline	2026-05-03 18:40:12 +02:00
prompts	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00
tools/prompt_curation	Phase 1: brand catalog (soy candles, PL) + prompt curation pipeline	2026-05-03 18:40:12 +02:00
.gitignore	Phase 1: brand catalog (soy candles, PL) + prompt curation pipeline	2026-05-03 18:40:12 +02:00
CHANGELOG.md	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00
LICENSE	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00
methodology.json	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00
README.md	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00
taxonomy.md	v1.0.0 — initial Citee Index Methodology release	2026-05-03 17:25:56 +02:00


Citee Index Methodology

Open methodology for the first public ranking of brand visibility in AI search results.

citee.ai · Methodology page · Forgejo · GitHub mirror

What this is

Citee Index measures how brands appear in AI-generated answers across major LLM-powered search systems (ChatGPT with web search, Perplexity, Gemini, Claude). The ranking is published quarterly per category and country.

This repository contains the complete public methodology — formulas, model weights, prompt-type distribution, cross-signal definitions, and the prompt curation process. Every change is committed publicly with rationale.

This is NOT:

A SaaS dashboard (that's Citee Pro, a separate product)
A list of paid placements (zero pay-to-play, hard rule in methodology.json)
A static document — methodology evolves through versioned releases (see CHANGELOG.md)

Why open

Three reasons:

Reproducibility. Anyone can audit our scoring against the public raw query log.
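Cryptographic timestamping. Git history is hash-chained: each commit ID covers the commit's content and its parent's ID, so rewriting any past commit changes every later commit ID, and anyone holding a clone or the public mirror would see the divergence. We cannot retroactively edit the methodology to hide a bug without leaving that trace.
Subjective opinion shield. An open formula plus public versioning establishes that scores are "expressions of opinion based on observed AI model outputs," not factual claims (legal precedent: NetScout v. Gartner, Connecticut Supreme Court, 2020).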
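To make the tamper-evidence point checkable by hand, here is a minimal sketch of how Git derives a file's object ID, assuming the repository uses Git's default SHA-1 object format. `git hash-object methodology.json` prints the same value for a local file, and `git ls-tree HEAD methodology.json` shows the ID recorded in a commit. The helper names below are hypothetical.

```python
import hashlib
from pathlib import Path


def git_blob_sha1(data: bytes) -> str:
    """Compute the object ID Git assigns to a file's contents (SHA-1 format).

    Git hashes the header b"blob <size>\\0" followed by the raw bytes, so any
    edit to the file, however small, produces a different ID.
    """
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def verify_file(path: Path, expected_id: str) -> bool:
    """Check a local copy (e.g. methodology.json) against a published blob ID."""
    return git_blob_sha1(path.read_bytes()) == expected_id
```

Because each commit ID in turn covers the tree of these blob IDs plus the parent commit's ID, a mismatch in any file surfaces all the way up the history.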

What's in this repo

File	Purpose
`methodology.json`	Machine-readable methodology — formulas, weights, thresholds, policies
`CHANGELOG.md`	Version history with rationale for each change
`taxonomy.md`	Category list, tier system, scan cadence per tier
`prompts/README.md`	Prompt curation process (6 stages, multi-agent validation)
`prompts/example-*.md`	Example prompt frameworks per category (illustrative — exact strings remain closed to prevent Goodhart's Law)
`tools/prompt_curation/`	Code for the multi-agent prompt curation pipeline
`LICENSE`	MIT
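The `tools/prompt_curation/` entry above holds the 6-stage curation pipeline. Per the Phase 1 commit notes, `pipeline.py` runs stages 1 to 6, pauses for human review, and only then runs `7_finalize.py`; with 7 personas and 30 prompts each, stage 2 yields 7 × 30 = 210 raw candidates before filtering. The sketch below models only that control flow. `PipelineState`, `run_pipeline`, and the `run_stage` callback are hypothetical names, not the repository's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage names mirror the script filenames listed in the Phase 1 commit.
AUTOMATED_STAGES = [
    "1_persona_generator",
    "2_prompt_brainstormer",
    "3_reality_checker",
    "4_validation_agents",
    "5_pilot_test_runner",
    "6_human_review_export",
]
FINAL_STAGE = "7_finalize"


@dataclass
class PipelineState:
    completed: list[str] = field(default_factory=list)
    human_approved: bool = False  # set only after the founder approves the CSV


def run_pipeline(state: PipelineState, run_stage: Callable[[str], None]) -> str:
    """Run pending stages 1-6, stop at the human gate, then finalize once approved."""
    for stage in AUTOMATED_STAGES:
        if stage not in state.completed:  # resumable: skip stages already done
            run_stage(stage)
            state.completed.append(stage)
    if not state.human_approved:
        return "awaiting human review"
    if FINAL_STAGE not in state.completed:
        run_stage(FINAL_STAGE)
        state.completed.append(FINAL_STAGE)
    return "finalized"
```

Calling `run_pipeline` a second time after approval runs only the final stage, so the automated stages are never repeated against an already-reviewed export.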

What's NOT here (and why)

Some operational details remain closed:

Exact prompt strings — disclosing the exact 100 prompts per category would let vendors optimize their pages specifically against our queries (Goodhart's Law). We publish the distribution by type (40% buying intent, 25% comparison, 20% specific need, 15% informational, 10% brand-direct) and example patterns, not exact strings. 20% of the prompt pool rotates quarterly.
Anti-gaming thresholds — specific burst-detection cutoffs, sock-puppet karma thresholds, and review-bombing pattern signatures are closed. We publish the categories and a few headline thresholds (rank-jump flag at more than 30 ranks, Wikidata entries created less than 90 days ago excluded, etc.), but not the exact detection cutoffs.
Honeypot brand details — disclosure would defeat the purpose. The honeypot is documented as existing in methodology.json for transparency.
First-party telemetry from Free Checker — aggregated weights from this telemetry feed into model weighting, but raw user data remains closed (GDPR).
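A minimal sketch of the quarterly rotation described under "Exact prompt strings", assuming a 100-prompt pool per category, so 20 prompts are swapped out each quarter. The function name, the seeded-random choice of which prompts retire, and the candidate list are all hypothetical; the real selection criteria are part of the closed operational details.

```python
import random


def rotate_pool(
    current: list[str],
    candidates: list[str],
    quarter: str,
    share: float = 0.20,
) -> list[str]:
    """Retire `share` of the active prompts and replace them with fresh candidates.

    Seeding the RNG with the quarter label (an illustrative assumption) makes
    the rotation reproducible for internal audits.
    """
    n_rotate = round(len(current) * share)
    if len(candidates) < n_rotate:
        raise ValueError("not enough candidate prompts to rotate in")
    rng = random.Random(quarter)
    retired = set(rng.sample(range(len(current)), n_rotate))
    kept = [p for i, p in enumerate(current) if i not in retired]
    return kept + rng.sample(candidates, n_rotate)
```

Rotating a fixed share rather than the whole pool keeps most of each quarter's scores comparable with the previous cycle while still limiting how long any single prompt stays exposed to optimization.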

These categories of closed information are explicitly listed in methodology.json so the boundary between open and closed is itself transparent.

Versioning policy

No retroactive changes. Methodology updates apply to future cycles only. If we change the model weighting formula in v1.1, scores for cycles published before v1.1 are not retroactively recomputed (lesson from FIDE 2024 backlash, "stealing rating points").
Quarterly major reviews + ad-hoc minor patches. Major reviews happen at the start of each quarter. Minor patches (typos, clarifications, additional examples) anytime — versioned as v1.0.1, v1.0.2, etc.
Every change has a public commit with rationale. No silent edits.
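The no-retroactive-changes rule amounts to pinning each published cycle to the methodology version it was scored with. A minimal sketch, assuming cycles are keyed by labels such as "2026-Q2" and versions follow the vMAJOR.MINOR.PATCH pattern used above (v1.0.0, v1.0.1); all names are hypothetical.

```python
def parse_version(v: str) -> tuple[int, int, int]:
    """Parse 'v1.0.1' or '1.0.1' into (major, minor, patch)."""
    major, minor, patch = (int(x) for x in v.removeprefix("v").split("."))
    return major, minor, patch


def publish_cycle(cycle: str, published: dict[str, str], version: str) -> None:
    """Record the version used for a cycle; refuse to overwrite an earlier record."""
    if cycle in published and published[cycle] != version:
        raise ValueError(f"{cycle} already published under {published[cycle]}")
    published[cycle] = version


def scoring_version(cycle: str, published: dict[str, str], latest: str) -> str:
    """Published cycles keep their pinned version; only new cycles use `latest`."""
    return published.get(cycle, latest)


def is_patch_only(old: str, new: str) -> bool:
    """True when `new` bumps only the patch number (typos, clarifications, examples)."""
    o, n = parse_version(old), parse_version(new)
    return o[:2] == n[:2] and n[2] > o[2]
```

Any recomputation of an old cycle then looks up its pinned version instead of the current one, so a v1.1 formula change cannot silently alter scores published under v1.0.0.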

Citation

If you cite Citee Index methodology in academic work, journalism, or business reports:

Citee Index Methodology v1.0.0 (2026-05-03).
LMW Commerce / Citee. https://github.com/lmwcommerce/citee-methodology

Contributing

Issues welcome — open one if you spot:

Methodological flaws or statistical issues
Errors in formulas or definitions
Missing edge cases in anti-gaming
Documentation typos or unclear sections

Pull requests are considered for documentation, code in tools/, and example frameworks. Methodology changes themselves are decided internally, based on the quarterly review and community feedback. Every accepted methodology change is credited in CHANGELOG.md.

License

MIT. See LICENSE.

You're free to use this methodology, fork it, build on it, replicate it, or criticize it. We ask only one thing: if you publish a competing ranking, don't claim it's reproduced from Citee data unless you have run the formulas yourself. The methodology is open; our raw query log is the source of truth.

Maintained by: LMW Commerce · Jacek Kubas
Contact: hello@citee.ai