Jacek Kubas f76cf2858b v1.0.0 — initial Citee Index Methodology release

Foundational public methodology for the first open public ranking of brand
visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).

This release establishes the framework — no rankings have been computed
or published yet. First scan cycle: late May 2026 (private validation).
First public ranking publication target: August 2026, after 3 validation
cycles.

Includes:
- methodology.json: machine-readable formulas, weights, policies
- README.md: human-readable overview + open/closed boundary
- CHANGELOG.md: versioning policy + v1.0.0 release notes
- taxonomy.md: tier system + 11 PL pilot categories
- LICENSE: MIT
- .gitignore: closed operational data (exact prompts, anti-gaming thresholds)
- prompts/README.md: 6-stage prompt curation process
- prompts/example-swiece-sojowe-pl.md: illustrative framework for first category

Strategic principles:
- Algorithm-first, no advisory board
- Open methodology + closed exact prompts (Goodhart's Law defense)
- No retroactive changes (FIDE 2024 lesson)
- No pay-to-play, hard rule (Moody's / Forbes 30 Under 30 lessons)
- Subjective opinion disclaimer (NetScout v. Gartner 2020 First Amendment shield)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 17:25:56 +02:00


Citee Index Methodology

Open methodology for the first public ranking of brand visibility in AI search results.

citee.ai · Methodology page · Forgejo · GitHub mirror

What this is

Citee Index measures how brands appear in AI-generated answers across major LLM-powered search systems (ChatGPT with web search, Perplexity, Gemini, Claude). The ranking is published quarterly per category and country.

This repository contains the complete public methodology — formulas, model weights, prompt-type distribution, cross-signal definitions, and the prompt curation process. Every change is committed publicly with rationale.

This is NOT:

- A SaaS dashboard (that's Citee Pro, separate product)
- A list of paid placements (zero pay-to-play, hard rule in methodology.json)
- A static document: the methodology evolves through versioned releases (see CHANGELOG.md)

Why open

Three reasons:

- Reproducibility. Anyone can audit our scoring against the public raw query log.
- Tamper-evident history. Every Git commit hash covers its content and its parent commit, so rewriting an earlier version of the methodology changes all later hashes and is visible to anyone holding a clone or the public mirror. We cannot quietly edit the methodology after the fact to hide a bug.
- Subjective opinion shield. Open formula + public versioning establishes that scores are "expressions of opinion based on observed AI model outputs," not factual claims (legal precedent: NetScout Systems, Inc. v. Gartner, Inc., Connecticut Supreme Court, 2020).
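
The point about Git history can be illustrated with a toy hash chain. This is a simplified sketch, not Git's actual object format: a real commit hashes a tree, parent ids, author data, and a message, while the hypothetical `commit_id` below hashes only a parent id and a string. It shows that editing an early revision changes every later id.

```python
import hashlib


def commit_id(parent_id: str, content: str) -> str:
    """Hash a revision together with its parent's id (simplified stand-in for a Git commit)."""
    return hashlib.sha256(f"{parent_id}\n{content}".encode("utf-8")).hexdigest()


def chain_ids(revisions: list[str]) -> list[str]:
    """Return the id of every revision in a linear history, oldest first."""
    ids, parent = [], ""
    for content in revisions:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


# Illustrative revision contents, not real commits from this repository.
history = ["weights v1.0.0", "fix typo", "weights v1.1.0"]
original = chain_ids(history)

# Quietly edit the first revision and rebuild the chain.
tampered = chain_ids(["weights v1.0.0 (edited)"] + history[1:])

# Every id from the edited revision onward differs from the original.
changed = [a != b for a, b in zip(original, tampered)]
print(changed)  # [True, True, True]
```

Anyone who recorded the original ids, for example through a clone or the mirror, can detect the rewrite by comparing the latest id alone.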

What's in this repo

| File | Purpose |
| --- | --- |
| `methodology.json` | Machine-readable methodology: formulas, weights, thresholds, policies |
| `CHANGELOG.md` | Version history with rationale for each change |
| `taxonomy.md` | Category list, tier system, scan cadence per tier |
| `prompts/README.md` | Prompt curation process (6 stages, multi-agent validation) |
| `prompts/example-*.md` | Example prompt frameworks per category (illustrative; exact strings remain closed to guard against Goodhart's Law) |
| `tools/prompt_curation/` | Code for the multi-agent prompt curation pipeline |
| `LICENSE` | MIT |

What's NOT here (and why)

Some operational details remain closed:

- Exact prompt strings: disclosing the exact 100 prompts per category would let vendors optimize their pages specifically against our queries (Goodhart's Law). We publish the distribution by type (40% buying intent, 25% comparison, 20% specific need, 15% informational, 10% brand-direct) and example patterns, not exact strings. 20% of the prompt pool rotates quarterly.
- Anti-gaming thresholds: specific burst-detection cutoffs, sock puppet karma thresholds, and review-bombing pattern signatures are closed. We publish the categories of checks and a few headline rules (rank-jump flag at >30 ranks, fresh Wikidata entries excluded when younger than 90 days, etc.), but not the remaining cutoffs.
- Honeypot brand details: disclosure would defeat the purpose. For transparency, methodology.json documents that a honeypot exists.
- First-party telemetry from Free Checker: aggregated weights derived from this telemetry feed into model weighting, but raw user data remains closed (GDPR).
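
The quarterly rotation of 20% of a 100-prompt pool can be sketched as follows. How prompts are actually chosen for retirement is not published, so the seeded random selection, the function name `rotate_pool`, and the prompt ids here are assumptions made purely for illustration.

```python
import random


def rotate_pool(pool: list[str], fresh: list[str], share: float = 0.20, seed: int = 0) -> list[str]:
    """Retire `share` of the pool and replace it with prompts taken from `fresh`.

    Assumption: retirement is a seeded random draw; the real selection rule is closed.
    """
    n_out = round(len(pool) * share)
    if n_out > len(fresh):
        raise ValueError("not enough fresh prompts to rotate in")
    rng = random.Random(seed)
    retired = set(rng.sample(pool, n_out))
    kept = [p for p in pool if p not in retired]
    return kept + fresh[:n_out]


# Hypothetical prompt ids standing in for the closed prompt strings.
pool = [f"prompt-{i:03d}" for i in range(100)]
fresh = [f"new-{i:03d}" for i in range(30)]

next_pool = rotate_pool(pool, fresh, seed=2026)
print(len(next_pool), len(set(next_pool) & set(pool)))  # 100 80
```

With these numbers, 80 prompts carry over between quarters and 20 are new, so the pool size stays at 100.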

These categories of closed information are explicitly listed in methodology.json so the boundary between open and closed is itself transparent.
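
The two anti-gaming rules that are stated publicly above (a rank jump of more than 30 ranks, and Wikidata entries younger than 90 days) could be checked roughly like this. It is a sketch under assumptions: the function names are hypothetical, the jump is measured between two consecutive cycles in either direction, and entry age is counted in whole days. None of this is confirmed by methodology.json.

```python
from datetime import date

RANK_JUMP_LIMIT = 30        # published rule: flag moves of more than 30 ranks
MIN_WIKIDATA_AGE_DAYS = 90  # published rule: entries younger than 90 days are excluded


def rank_jump_flag(previous_rank: int, current_rank: int) -> bool:
    """Flag a brand whose rank moved by more than the limit (assumed: either direction)."""
    return abs(current_rank - previous_rank) > RANK_JUMP_LIMIT


def wikidata_counts(created: date, scan_day: date) -> bool:
    """Return True when a Wikidata entry is old enough to be counted as a signal."""
    return (scan_day - created).days >= MIN_WIKIDATA_AGE_DAYS


print(rank_jump_flag(50, 15))                                 # True (moved 35 ranks)
print(rank_jump_flag(50, 20))                                 # False (moved exactly 30)
print(wikidata_counts(date(2026, 3, 1), date(2026, 5, 30)))   # True (90 days old)
```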

Versioning policy

- No retroactive changes. Methodology updates apply to future cycles only. If we change the model weighting formula in v1.1, scores for cycles published before v1.1 are not recomputed (lesson from the FIDE 2024 backlash over "stealing rating points").
- Quarterly major reviews + ad-hoc minor patches. Major reviews happen at the start of each quarter. Minor patches (typos, clarifications, additional examples) can land at any time and are versioned as v1.0.1, v1.0.2, etc.
- Every change has a public commit with rationale. No silent edits.
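
One way to read the no-retroactive-changes rule is that each cycle is scored with whichever methodology version was in force on its publication date, and that pairing never changes afterwards. A minimal sketch, assuming a simple release table: only v1.0.0 and its date come from this repository, while the v1.1.0 entry and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Release table, oldest first. Only v1.0.0 (2026-05-03) is real.
RELEASES = [
    (date(2026, 5, 3), "v1.0.0"),
    (date(2026, 10, 1), "v1.1.0"),  # hypothetical future release
]


def version_for_cycle(published: date) -> str:
    """Return the methodology version in force on a cycle's publication date."""
    release_dates = [d for d, _ in RELEASES]
    i = bisect_right(release_dates, published)
    if i == 0:
        raise ValueError("cycle predates the first methodology release")
    return RELEASES[i - 1][1]


print(version_for_cycle(date(2026, 8, 15)))  # v1.0.0
print(version_for_cycle(date(2026, 11, 1)))  # v1.1.0
```

Under this reading, a cycle published in August 2026 keeps its v1.0.0 scores even after a later release changes the formula.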

Citation

If you cite Citee Index methodology in academic work, journalism, or business reports:

Citee Index Methodology v1.0.0 (2026-05-03).
LMW Commerce / Citee. https://github.com/lmwcommerce/citee-methodology

Contributing

Issues welcome — open one if you spot:

- Methodological flaws or statistical issues
- Errors in formulas or definitions
- Missing edge cases in anti-gaming
- Documentation typos or unclear sections

Pull requests considered for documentation, code in tools/, and example frameworks. Methodology changes themselves are decided internally based on quarterly review + community feedback. Every accepted methodology change is credited in CHANGELOG.md.

License

MIT. See LICENSE.

You're free to use this methodology, fork it, build on it, replicate it, criticize it. We only ask: if you publish a competing ranking, don't claim it's reproduced from Citee data without running the formulas yourself. Methodology is open; our raw query log is the source of truth.

Maintained by: LMW Commerce · Jacek Kubas
Contact: hello@citee.ai
