# Citee Index Methodology

> Open methodology for the first public ranking of brand visibility in AI search results.

[**citee.ai**](https://citee.ai) · [**Methodology page**](https://citee.ai/methodology) · [Forgejo](https://git.lmwcommerce.com/citee/citee-methodology) · [GitHub mirror](https://github.com/lmwcommerce/citee-methodology)

---

## What this is

Citee Index measures how brands appear in AI-generated answers across major LLM-powered search systems (ChatGPT with web search, Perplexity, Gemini, Claude). The ranking is published quarterly per category and country.

This repository contains the **complete public methodology** — formulas, model weights, prompt-type distribution, cross-signal definitions, and the prompt curation process. Every change is committed publicly with rationale.

**This is NOT:**
- A SaaS dashboard (that's [Citee Pro](https://citee.ai/pro), a separate product)
- A list of paid placements (zero pay-to-play is a hard rule in [`methodology.json`](./methodology.json))
- A static document — methodology evolves through versioned releases (see [`CHANGELOG.md`](./CHANGELOG.md))

## Why open

Three reasons:

1. **Reproducibility.** Anyone can audit our scoring against the public raw query log.
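2. **Cryptographic tamper-evidence.** Git history can technically be rewritten, but every commit hash covers its content and its parent commits, so a retroactive edit changes all later hashes and is visible to anyone holding a clone or comparing the public mirror. We cannot quietly edit the methodology to hide a bug.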
3. **Subjective opinion shield.** Open formula + public versioning establishes that scores are "expressions of opinion based on observed AI model outputs," not factual claims (legal precedent: *NetScout Systems, Inc. v. Gartner, Inc.*, Connecticut Supreme Court, 2020).
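
To make the Git-history point concrete, here is a minimal Python sketch of how Git derives a blob's object ID from file content. It assumes a repository using Git's default SHA-1 object format, and `git_blob_id` is a hypothetical helper name, not code from this repo. Because the ID is a hash of the content, any edit to a file such as `methodology.json` yields a different ID, which in turn changes every tree and commit hash that references it.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content (SHA-1 format)."""
    # Git hashes the header "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Illustrative content only, not the real methodology.json.
original = b'{"version": "1.0.0"}\n'
tampered = b'{"version": "1.0.1"}\n'

# Changing the content changes the object ID.
print(git_blob_id(original) != git_blob_id(tampered))
```

The value returned for a file's bytes should match what `git hash-object <file>` prints for the same file, so a reader can compare a local copy against a commit they pinned earlier.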

## What's in this repo

| File | Purpose |
|---|---|
| [`methodology.json`](./methodology.json) | Machine-readable methodology — formulas, weights, thresholds, policies |
| [`CHANGELOG.md`](./CHANGELOG.md) | Version history with rationale for each change |
| [`taxonomy.md`](./taxonomy.md) | Category list, tier system, scan cadence per tier |
| [`prompts/README.md`](./prompts/README.md) | Prompt curation process (6 stages, multi-agent validation) |
| [`prompts/example-*.md`](./prompts/) | Example prompt frameworks per category (illustrative — exact strings remain closed to prevent Goodhart's Law) |
| [`tools/prompt_curation/`](./tools/prompt_curation/) | Code for the multi-agent prompt curation pipeline |
| [`LICENSE`](./LICENSE) | MIT |

## What's NOT here (and why)

Some operational details remain closed:

- **Exact prompt strings** — disclosing the exact 100 prompts per category would let vendors optimize their pages specifically against our queries (Goodhart's Law). We publish the **distribution by type** (40% buying intent, 25% comparison, 20% specific need, 15% informational, 10% brand-direct) and **example patterns**, not exact strings. 20% of the prompt pool rotates quarterly.
- **Anti-gaming thresholds.** Specific burst-detection cutoffs, sock puppet karma thresholds, and review-bombing pattern signatures are closed. We publish the rule categories and a few headline rules (a rank-jump flag at >30 ranks, Wikidata entries younger than 90 days excluded, etc.), but not the remaining cutoffs.
- **Honeypot brand details** — disclosure would defeat the purpose. The honeypot is documented as existing in [`methodology.json`](./methodology.json) for transparency.
- **First-party telemetry from Free Checker** — aggregated weights from this telemetry feed into model weighting, but raw user data remains closed (GDPR).
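
The quarterly rotation of 20% of the prompt pool mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: the function name `rotate_pool`, the uniform random choice of which prompts to retire, and the placeholder prompt strings are all hypothetical. The README only states the 20% share and the 100-prompt pool size.

```python
import random


def rotate_pool(pool, candidates, fraction=0.20, seed=None):
    """Replace `fraction` of `pool` with unused prompts drawn from `candidates`."""
    rng = random.Random(seed)
    n_rotate = round(len(pool) * fraction)

    # Only candidates that are not already in the pool count as fresh.
    in_pool = set(pool)
    fresh = [c for c in candidates if c not in in_pool]
    if len(fresh) < n_rotate:
        raise ValueError("not enough fresh candidates to rotate")

    # Assumption: retired prompts are picked uniformly at random.
    retired = set(rng.sample(range(len(pool)), n_rotate))
    kept = [p for i, p in enumerate(pool) if i not in retired]
    return kept + rng.sample(fresh, n_rotate)


# Placeholder identifiers standing in for real prompt strings.
pool = [f"prompt-{i}" for i in range(100)]
candidates = [f"candidate-{i}" for i in range(40)]

next_pool = rotate_pool(pool, candidates, seed=2026)
print(len(next_pool), len(set(next_pool) - set(pool)))  # 100 20
```

Passing a fixed `seed` keeps a rotation reproducible for internal audit while the resulting strings themselves stay unpublished.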

These categories of closed information are explicitly listed in [`methodology.json`](./methodology.json) so the boundary between open and closed is itself transparent.

## Versioning policy

- **No retroactive changes.** Methodology updates apply to **future cycles only**. If we change the model weighting formula in v1.1, scores for cycles published before v1.1 are not retroactively recomputed (a lesson from the 2024 FIDE backlash over "stealing rating points").
- **Quarterly major reviews + ad-hoc minor patches.** Major reviews happen at the start of each quarter. Minor patches (typos, clarifications, additional examples) can ship at any time and are versioned as v1.0.1, v1.0.2, etc.
- **Every change has a public commit with rationale.** No silent edits.
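
The "future cycles only" rule amounts to looking up which methodology version was in force when a cycle was published. A minimal sketch, with assumptions labeled: the v1.0.0 date is taken from the citation in this README, the v1.1.0 entry is a made-up future release, `version_for_cycle` is a hypothetical helper, and treating a release as effective for cycles published on the same day is a guess rather than stated policy.

```python
from bisect import bisect_right
from datetime import date

# (effective date, version), sorted by date.
# Only v1.0.0 comes from this README; v1.1.0 is hypothetical.
RELEASES = [
    (date(2026, 5, 3), "1.0.0"),
    (date(2026, 10, 1), "1.1.0"),
]


def version_for_cycle(published: date, releases=RELEASES) -> str:
    """Return the methodology version in force when a cycle was published."""
    dates = [d for d, _ in releases]
    # Index of the first release dated strictly after the cycle.
    idx = bisect_right(dates, published)
    if idx == 0:
        raise ValueError("cycle predates the first methodology release")
    return releases[idx - 1][1]


print(version_for_cycle(date(2026, 7, 1)))   # 1.0.0
print(version_for_cycle(date(2026, 10, 1)))  # 1.1.0
```

A cycle published in July keeps its v1.0.0 score even after v1.1.0 ships, because the lookup depends only on the cycle's own publication date.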

## Citation

If you cite Citee Index methodology in academic work, journalism, or business reports:

```
Citee Index Methodology v1.0.0 (2026-05-03).
LMW Commerce / Citee. https://github.com/lmwcommerce/citee-methodology
```

## Contributing

Issues welcome — open one if you spot:
- Methodological flaws or statistical issues
- Errors in formulas or definitions
- Missing edge cases in anti-gaming
- Documentation typos or unclear sections

Pull requests are considered for documentation, code in `tools/`, and example frameworks. **Methodology changes themselves are decided internally** based on the quarterly review and community feedback. Every accepted methodology change is credited in `CHANGELOG.md`.

## License

MIT. See [`LICENSE`](./LICENSE).

You're free to use this methodology, fork it, build on it, replicate it, criticize it. We only ask: if you publish a competing ranking, **don't claim it's reproduced from Citee data without running the formulas yourself.** Methodology is open; our raw query log is the source of truth.

---

**Maintained by:** [LMW Commerce](https://lmwcommerce.com) · Jacek Kubas · **Contact:** hello@citee.ai