commit f76cf2858be4dffa27e3b0ae21a3d37acc8b89d8
Author: Jacek Kubas <jacek@lmwcommerce.com>
Date:   Sun May 3 17:25:56 2026 +0200

    v1.0.0 — initial Citee Index Methodology release
    
    Foundational public methodology for the first open public ranking of brand
    visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).
    
    This release establishes the framework — no rankings have been computed
    or published yet. First scan cycle: late May 2026 (private validation).
    First public ranking publication target: August 2026, after 3 validation
    cycles.
    
    Includes:
    - methodology.json: machine-readable formulas, weights, policies
    - README.md: human-readable overview + open/closed boundary
    - CHANGELOG.md: versioning policy + v1.0.0 release notes
    - taxonomy.md: tier system + 11 PL pilot categories
    - LICENSE: MIT
    - .gitignore: closed operational data (exact prompts, anti-gaming thresholds)
    - prompts/README.md: 6-stage prompt curation process
    - prompts/example-swiece-sojowe-pl.md: illustrative framework for first category
    
    Strategic principles:
    - Algorithm-first, no advisory board
    - Open methodology + closed exact prompts (Goodhart's Law defense)
    - No retroactive changes (FIDE 2024 lesson)
    - No pay-to-play, hard rule (Moody's / Forbes 30 Under 30 lessons)
    - Subjective opinion disclaimer (NetScout v. Gartner 2020 First Amendment shield)
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..93ddee8
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,54 @@
+# OS / editor cruft
+.DS_Store
+Thumbs.db
+*.swp
+*.swo
+*~
+.vscode/
+.idea/
+
+# OneDrive sync conflicts (just in case repo ends up under OneDrive accidentally)
+*-Bob.*
+*conflict*
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+.venv/
+venv/
+env/
+*.egg-info/
+.pytest_cache/
+
+# Closed operational data — exact prompt strings remain CLOSED to prevent
+# Goodhart's Law (when a measure becomes a target, it ceases to be a measure).
+# Public examples and frameworks live in prompts/ at the repo root.
+prompts/swiece-sojowe-pl/
+prompts/kosmetyki-naturalne-pl/
+prompts/suplementy-nutricosmetyki-pl/
+prompts/diety-pudelkowe-pl/
+prompts/premium-pet-food-pl/
+prompts/kawa-specialty-pl/
+prompts/czekolada-rzemieslnicza-pl/
+prompts/kursy-programowania-bootcampy-pl/
+prompts/kliniki-estetyczne-dermo-pl/
+prompts/fitness-studios-premium-pl/
+prompts/kosmetyki-meskie-pl/
+
+# Closed anti-gaming thresholds (private values, public categories documented)
+anti_gaming/private_thresholds.json
+anti_gaming/honeypot_brand.json
+
+# First-party telemetry from Free Checker (GDPR — raw user data closed)
+telemetry/raw/
+
+# Output of scan cycles (raw query logs are public via API but not in repo)
+output/
+scans/
+
+# Secrets
+.env
+.env.*
+*.key
+secrets.json
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..021106b
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,61 @@
+# Changelog
+
+All notable changes to Citee Index Methodology are documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/), and versioning follows [Semantic Versioning](https://semver.org/) adapted for methodology:
+
+- **MAJOR** (`2.0.0`) — fundamental scoring formula change, weight rebalance, definition of categories
+- **MINOR** (`1.1.0`) — new prompt types, new cross-signals, new model added, anti-gaming rule additions
+- **PATCH** (`1.0.1`) — documentation fixes, clarifications, additional examples, typos
+
+**Important:** No retroactive changes. Methodology updates apply to FUTURE cycles only. Cycles published before a version bump are not recomputed.
+
+---
+
+## [1.0.0] — 2026-05-03
+
+Initial public release. Foundational methodology. **No public ranking yet** — first publication scheduled August 2026 after a 3-month validation period.
+
+### Added
+
+- **Scoring formula:** `CiteeScore = sum(mention_score_per_model * model_weight) * (1 + cross_signal_bonus)`, normalized to 0-100 per category
+- **Model weighting** for the PL market: ChatGPT 0.45, Perplexity 0.25, Gemini 0.20, Claude 0.10 (Claude is not part of the 3-model pilot; its addition is planned for Q4 2026, see `methodology.json` for rationale)
+- **Mention score per model:** position (0.4) + prominence (0.3) + sentiment (0.15) + citation depth (0.15)
+- **5 prompt types** with weights:
+  - Buying intent (2.0) — 30% of pool
+  - Comparison (1.5) — 25%
+  - Specific need (1.5) — 20%
+  - Informational (0.3) — 15%
+  - Brand-direct (0.3) — 10%
+- **4 cross-signals** with maximum total bonus +20%:
+  - Wikidata entry (≥90 days, ≥5 triples): +5%
+  - Trustpilot/Opineo (>50 reviews, ≥4.0 average, no review bombing): +5%
+  - Reddit organic mentions (>10 in niche subreddit, account age + karma weighted): +5%
+  - Google AI Overviews presence (verified via SerpAPI): +5%
+- **Anti-gaming protections:** rank-jump flag (>30), fresh Wikidata exclusion (<90 days), review bombing exclusion, sock puppet detection (Reddit), prompt injection scrape filters (CSS hidden text, off-screen content, font-size:0)
+- **Honeypot brand** mechanism for detecting AI training data circular logic and unauthorized scraping
+- **Statistical methodology:** 95% confidence intervals via bootstrap resampling, overlapping CIs reported as tied (no false precision), 100 prompts × 3 models × 2 repetitions = 600 queries per category per cycle in pilot
+- **Tier system:**
+  - Tier 1 — large markets (>1000 brands, >100M PLN GMV) — monthly scan
+  - Tier 2 — medium markets (100-1000 brands, 10-100M PLN GMV) — quarterly scan
+  - Tier 3 — niche markets (<100 brands, <10M PLN GMV) — semi-annual scan
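+- **11 pilot categories (PL, all Tier 2):** kosmetyki naturalne (natural cosmetics), suplementy / nutricosmetyki (supplements / nutricosmetics), diety pudełkowe (boxed diet meal delivery), premium pet food, kawa specialty (specialty coffee), czekolada rzemieślnicza (craft chocolate), kursy programowania / IT bootcampy (coding courses / IT bootcamps), kliniki estetyczne / dermatologia (aesthetic clinics / dermatology), fitness studios premium (premium fitness studios), kosmetyki dla mężczyzn (men's cosmetics), świece sojowe (soy candles)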
+- **Publication policy:** 3-month validation period before the first public ranking. Hybrid format — Top 10 as public HTML (SEO indexed), full 100-brand ranking as a PDF behind an email gate. `robots.txt` disallow for GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended on full-data endpoints.
+- **Right to reply:** each brand profile page includes a "Brand response" section, moderated for factual accuracy, with a 30-day response window per cycle
+- **Monetization policy:** ranked brands NEVER pay Citee directly (hard rule). Revenue from Citee Pro SaaS (paid by shops optimizing visibility, not ranked brands), Industry Reports (paid by agencies/media), and Sponsored Custom Research (commissioned for category research, not brand-specific)
+- **Prompt curation process** (6 stages): persona generator → prompt brainstormer → reality check (Google Trends, Reddit, Quora) → multi-agent validation (3 critics) → pilot test run → human approval
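+
+The confidence-interval and tie rules above can be illustrated with a minimal Python sketch. It assumes each brand's score is the mean of per-query contributions and uses a plain percentile bootstrap; the function names, the 2,000-resample default, and the choice to compare each brand only with the brand ranked directly above it are illustrative assumptions, not the production implementation.
+
+```python
+import random
+from statistics import mean
+
+
+def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
+    """Percentile bootstrap CI for the mean of per-query score contributions."""
+    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
+    n = len(samples)
+    # Resample with replacement and record the mean of each resample.
+    means = sorted(mean(rng.choices(samples, k=n)) for _ in range(n_resamples))
+    lower = means[int(alpha / 2 * n_resamples)]
+    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
+    return lower, upper
+
+
+def overlaps(a, b):
+    """True if intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
+    return a[0] <= b[1] and b[0] <= a[1]
+
+
+def rank_with_ties(scores, cis):
+    """Rank brands by score; a brand whose CI overlaps the CI of the brand
+    directly above it shares that brand's rank (shown with an '=' marker)."""
+    order = sorted(scores, key=scores.get, reverse=True)
+    ranks = {}
+    for i, brand in enumerate(order):
+        if i > 0 and overlaps(cis[brand], cis[order[i - 1]]):
+            ranks[brand] = ranks[order[i - 1]]
+        else:
+            ranks[brand] = i + 1
+    return ranks
+
+
+if __name__ == "__main__":
+    ranks = rank_with_ties(
+        {"A": 0.9, "B": 0.85, "C": 0.5},
+        {"A": (0.8, 1.0), "B": (0.75, 0.95), "C": (0.4, 0.6)},
+    )
+    print(ranks)  # {'A': 1, 'B': 1, 'C': 3}
+```
+
+Comparing only adjacent brands means ties can chain: if A overlaps B and B overlaps C, all three share a rank even when A and C do not overlap. A stricter variant would require pairwise overlap across the whole tied group.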
+
+### Notes
+
+This is **v1.0.0 — methodology release only**. No ranking has been computed or published. Foundational document establishing the framework.
+
+First scan cycle planned: late May 2026 (private validation).
+First public ranking publication target: August 2026 (after 3 validation cycles).
+
+---
+
+## Pre-history
+
+Project began as "AIO Visibility" module within LMW Pulse SaaS in March 2026. Pivoted to standalone product `citee.ai` in May 2026 after market analysis showed no global competitor publishing public AI visibility rankings (27+ tracked SaaS dashboards but zero public rankings).
+
+Strategic shift from advisory-board-driven model (Gartner / Forbes 30 Under 30 pattern) to algorithm-first model (Glassdoor / Trustpilot / FIDE / PageRank pattern) decided 2026-05-03 based on principle: "the tool must defend itself, not by authority."
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..26236f1
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,40 @@
+MIT License
+
+Copyright (c) 2026 LMW Commerce / Jacek Kubas
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Note on Citee Index data:
+
+While this methodology is MIT-licensed and freely usable, the Citee Index
+itself (the published rankings, raw query logs, and brand-level scores) is
+provided under a separate data license described at
+https://citee.ai/data-license. The methodology being open does not imply
+that derived datasets from Citee scans are public domain.
+
+Disclaimer regarding scoring:
+
+Citee Index scores represent expressions of opinion based on observed AI
+model outputs at specific points in time. They are not factual claims about
+the relative quality, popularity, or merit of any brand. The methodology is
+a framework for converting observed AI outputs into a comparable index;
+reasonable people could construct alternative methodologies that produce
+different rankings.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5c837ba
--- /dev/null
+++ b/README.md
@@ -0,0 +1,85 @@
+# Citee Index Methodology
+
+> Open methodology for the first public ranking of brand visibility in AI search results.
+
+[**citee.ai**](https://citee.ai) · [**Methodology page**](https://citee.ai/methodology) · [Forgejo](https://git.lmwcommerce.com/citee/citee-methodology) · [GitHub mirror](https://github.com/lmwcommerce/citee-methodology)
+
+---
+
+## What this is
+
+Citee Index measures how brands appear in AI-generated answers across major LLM-powered search systems (ChatGPT with web search, Perplexity, Gemini, Claude). Rankings are published per category and country on a tier-dependent cadence (monthly, quarterly, or semi-annual); all pilot categories are quarterly.
+
+This repository contains the **complete public methodology** — formulas, model weights, prompt-type distribution, cross-signal definitions, and the prompt curation process. Every change is committed publicly with rationale.
+
+**This is NOT:**
+- A SaaS dashboard (that's [Citee Pro](https://citee.ai/pro), separate product)
+- A list of paid placements (zero pay-to-play, hard rule in [`methodology.json`](./methodology.json))
+- A static document — methodology evolves through versioned releases (see [`CHANGELOG.md`](./CHANGELOG.md))
+
+## Why open
+
+Three reasons:
+
+1. **Reproducibility.** Anyone can audit our scoring against the public raw query log.
+2. **Tamper-evident history.** Git commits are hash-chained and the repository is mirrored publicly, so a retroactive edit to the methodology (for example, to hide a bug) would change the commit hashes from that point on and be detectable.
+3. **Subjective opinion shield.** An open formula and public versioning support the position that scores are "expressions of opinion based on observed AI model outputs," not factual claims (legal precedent: *NetScout Systems v. Gartner*, Connecticut Supreme Court, 2020).
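+
+To make an audit concrete, here is a minimal Python sketch of the v1.0.0 scoring chain using the weights published in `methodology.json`. It assumes per-model mention scores have already been aggregated over the prompt pool (that aggregation step is not shown), treats an unmentioned brand as scoring 0 on every component, and gives no position credit below rank 10; all function names are hypothetical.
+
+```python
+# Hypothetical sketch of the v1.0.0 CiteeScore chain (values from methodology.json).
+POSITION = {1: 1.0, 2: 0.7, 3: 0.5}  # ranks 4-10 score 0.3
+PROMINENCE = {"passing_mention": 0.3, "listed_with_description": 0.6, "actively_recommended": 1.0}
+SENTIMENT = {"positive": 0.2, "neutral": 0.0, "negative_or_caveated": -0.3}
+CITATION = {"direct_link_to_brand_site": 1.0, "mention_only_no_link": 0.5}
+MODEL_WEIGHTS_PL = {"chatgpt": 0.45, "perplexity": 0.25, "gemini": 0.20, "claude": 0.10}
+BONUS_PER_SIGNAL = 0.05
+MAX_TOTAL_BONUS = 0.20
+
+
+def position_score(rank):
+    """Map a list position to the published position scale."""
+    if rank in POSITION:
+        return POSITION[rank]
+    return 0.3 if 4 <= rank <= 10 else 0.0  # assumption: no credit below rank 10
+
+
+def mention_score(rank, prominence, sentiment, citation):
+    """Score one brand in one model answer; rank=None means not mentioned.
+    Best case with these scales: 0.4 + 0.3 + 0.2 * 0.15 + 0.15 = 0.88."""
+    if rank is None:
+        return 0.0  # assumption: every component is 0 when the brand is absent
+    return (position_score(rank) * 0.4
+            + PROMINENCE[prominence] * 0.3
+            + SENTIMENT[sentiment] * 0.15
+            + CITATION[citation] * 0.15)
+
+
+def raw_citee_score(per_model, cross_signals):
+    """Weighted sum over the queried models, times the capped cross-signal bonus.
+    Scaled by 100 so the raw range matches the documented 0-120."""
+    base = sum(score * MODEL_WEIGHTS_PL[model] for model, score in per_model.items())
+    bonus = min(BONUS_PER_SIGNAL * len(cross_signals), MAX_TOTAL_BONUS)
+    return 100 * base * (1 + bonus)
+
+
+def normalize(raw_by_brand):
+    """Top brand in the category = 100, others proportional (top must be > 0)."""
+    top = max(raw_by_brand.values())
+    return {brand: 100 * raw / top for brand, raw in raw_by_brand.items()}
+```
+
+For example, a brand with per-model scores `{"chatgpt": 0.8, "perplexity": 0.6, "gemini": 0.5}` and two qualifying cross-signals gets a raw score of 100 × 0.61 × 1.10 = 67.1. During the three-model pilot the weights present sum to 0.90, so raw scores stay below the nominal maximum; because every brand in a category is scored against the same models, per-category normalization leaves the ranking unaffected.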
+
+## What's in this repo
+
+| File | Purpose |
+|---|---|
+| [`methodology.json`](./methodology.json) | Machine-readable methodology — formulas, weights, thresholds, policies |
+| [`CHANGELOG.md`](./CHANGELOG.md) | Version history with rationale for each change |
+| [`taxonomy.md`](./taxonomy.md) | Category list, tier system, scan cadence per tier |
+| [`prompts/README.md`](./prompts/README.md) | Prompt curation process (6 stages, multi-agent validation) |
+| [`prompts/example-*.md`](./prompts/) | Example prompt frameworks per category (illustrative — exact strings remain closed to prevent Goodhart's Law) |
+| [`tools/prompt_curation/`](./tools/prompt_curation/) | Code for the multi-agent prompt curation pipeline |
+| [`LICENSE`](./LICENSE) | MIT |
+
+## What's NOT here (and why)
+
+Some operational details remain closed:
+
+- **Exact prompt strings** — disclosing the exact 100 prompts per category would let vendors optimize their pages specifically against our queries (Goodhart's Law). We publish the **distribution by type** (30% buying intent, 25% comparison, 20% specific need, 15% informational, 10% brand-direct) and **example patterns**, not exact strings. 20% of the prompt pool rotates quarterly.
+- **Anti-gaming thresholds** — specific burst-detection cutoffs, sock puppet karma thresholds, and review-bombing pattern signatures are closed. We publish the rule categories and coarse public thresholds (rank-jump flag at >30 ranks, Wikidata entries younger than 90 days excluded, etc.), but not the exact private cutoffs.
+- **Honeypot brand details** — disclosure would defeat the purpose. The honeypot is documented as existing in [`methodology.json`](./methodology.json) for transparency.
+- **First-party telemetry from Free Checker** — aggregated weights from this telemetry feed into model weighting, but raw user data remains closed (GDPR).
+
+These categories of closed information are explicitly listed in [`methodology.json`](./methodology.json) so the boundary between open and closed is itself transparent.
+
+## Versioning policy
+
+- **No retroactive changes.** Methodology updates apply to **future cycles only**. If we change the model weighting formula in v1.1, scores for cycles published before v1.1 are not retroactively recomputed (lesson from FIDE 2024 backlash, "stealing rating points").
+- **Quarterly reviews + ad-hoc patches.** Methodology reviews happen at the start of each quarter and are where MAJOR and MINOR changes are decided. PATCH releases (typos, clarifications, additional examples) can ship anytime, versioned v1.0.1, v1.0.2, etc.
+- **Every change has a public commit with rationale.** No silent edits.
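+
+The versioning rules in `CHANGELOG.md` can be expressed as a small lookup. This is a hypothetical Python helper, not tooling shipped in this repo; the change-type labels are illustrative names for the categories listed there.
+
+```python
+# Hypothetical helper: map a change type to the version bump defined in CHANGELOG.md.
+BUMP_RULES = {
+    "major": {"scoring_formula", "weight_rebalance", "category_definition"},
+    "minor": {"prompt_type", "cross_signal", "model_added", "anti_gaming_rule"},
+    "patch": {"documentation", "clarification", "example", "typo"},
+}
+
+
+def next_version(current: str, change: str) -> str:
+    """Return the next methodology version for a given change type."""
+    major, minor, patch = (int(part) for part in current.split("."))
+    if change in BUMP_RULES["major"]:
+        return f"{major + 1}.0.0"
+    if change in BUMP_RULES["minor"]:
+        return f"{major}.{minor + 1}.0"
+    if change in BUMP_RULES["patch"]:
+        return f"{major}.{minor}.{patch + 1}"
+    raise ValueError(f"unknown change type: {change!r}")
+```
+
+Because updates apply to future cycles only, each published cycle would keep the version it was computed under (for example `1.0.0`); a later bump never triggers recomputation of that cycle.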
+
+## Citation
+
+If you cite Citee Index methodology in academic work, journalism, or business reports:
+
+```
+Citee Index Methodology v1.0.0 (2026-05-03).
+LMW Commerce / Citee. https://github.com/lmwcommerce/citee-methodology
+```
+
+## Contributing
+
+Issues welcome — open one if you spot:
+- Methodological flaws or statistical issues
+- Errors in formulas or definitions
+- Missing edge cases in anti-gaming
+- Documentation typos or unclear sections
+
+Pull requests considered for documentation, code in `tools/`, and example frameworks. **Methodology changes themselves are decided internally** based on quarterly review + community feedback. Every accepted methodology change is credited in `CHANGELOG.md`.
+
+## License
+
+MIT. See [`LICENSE`](./LICENSE).
+
+You're free to use this methodology, fork it, build on it, replicate it, criticize it. We only ask: if you publish a competing ranking, **don't claim it's reproduced from Citee data without running the formulas yourself.** Methodology is open; our raw query log is the source of truth.
+
+---
+
+**Maintained by:** [LMW Commerce](https://lmwcommerce.com) · Jacek Kubas
+**Contact:** hello@citee.ai
diff --git a/methodology.json b/methodology.json
new file mode 100644
index 0000000..10dfe2f
--- /dev/null
+++ b/methodology.json
@@ -0,0 +1,270 @@
+{
+  "version": "1.0.0",
+  "released": "2026-05-03",
+  "name": "Citee Index Methodology",
+  "description": "Public methodology for Citee Index — the first open public ranking of brand visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).",
+  "license": "MIT",
+  "repository": "https://git.lmwcommerce.com/citee/citee-methodology",
+  "mirror": "https://github.com/lmwcommerce/citee-methodology",
+  "homepage": "https://citee.ai/methodology",
+
+  "philosophy": {
+    "approach": "algorithm-first",
+    "principles": [
+      "Open methodology, public versioning (every change committed publicly)",
+      "Reproducibility — anyone can replicate scores from raw query log",
+      "No pay-to-play — ranked brands never pay Citee directly. Hard rule in ToS.",
+      "Subjective opinion disclaimer — scores are expressions of opinion based on observed AI model outputs (First Amendment shield, NetScout v. Gartner 2020)",
+      "No retroactive changes — methodology updates apply to FUTURE cycles only (FIDE 2024 backlash lesson)",
+      "Confidence intervals — overlapping CIs reported as 'tied', no false precision",
+      "Annual transparency report — manipulation patterns detected, anti-gaming actions taken"
+    ]
+  },
+
+  "scoring": {
+    "formula": "CiteeScore(brand, category, country, month) = sum(mention_score_per_model * model_weight) * (1 + cross_signal_bonus)",
+    "normalization": "Raw score 0-120 normalized to 0-100 per category (top brand = 100, others proportional)",
+    "ranking": "Sort by CiteeScore descending. Brands with overlapping confidence intervals reported as tied."
+  },
+
+  "models": {
+    "weighting_basis": "Each model weighted by its share of AI search traffic per region. Weights revised quarterly using 3 public data sources (OpenRouter rankings, Similarweb free tier, Statcounter/IAB Polska/Mobirank reports) plus first-party Free Checker telemetry.",
+    "weights": {
+      "PL": {
+        "chatgpt": {
+          "weight": 0.45,
+          "model_version": "gpt-4o-search-2026-04",
+          "rationale": "Largest user share PL based on OpenRouter + Similarweb data"
+        },
+        "perplexity": {
+          "weight": 0.25,
+          "model_version": "sonar-pro-2026-03",
+          "rationale": "Growing power user segment, search-native architecture"
+        },
+        "gemini": {
+          "weight": 0.20,
+          "model_version": "gemini-2.0-pro",
+          "rationale": "Google embed + AI Overviews coverage"
+        },
+        "claude": {
+          "weight": 0.10,
+          "model_version": "claude-sonnet-2026-q1",
+          "rationale": "Niche but growing; addition planned for Q4 2026, not part of the 3-model pilot",
+          "status": "added_q4_2026"
+        }
+      }
+    },
+    "pilot_models": ["chatgpt", "perplexity", "gemini"],
+    "claude_addition_planned": "2026-Q4"
+  },
+
+  "mention_score_per_model": {
+    "formula": "mention_score = (position * 0.4) + (prominence * 0.3) + (sentiment * 0.15) + (citation_depth * 0.15)",
+    "range": "0.0 - 1.0",
+    "components": {
+      "position": {
+        "weight": 0.4,
+        "scale": {
+          "rank_1": 1.0,
+          "rank_2": 0.7,
+          "rank_3": 0.5,
+          "rank_4_to_10": 0.3,
+          "not_mentioned": 0.0
+        }
+      },
+      "prominence": {
+        "weight": 0.3,
+        "scale": {
+          "passing_mention": 0.3,
+          "listed_with_description": 0.6,
+          "actively_recommended": 1.0
+        }
+      },
+      "sentiment": {
+        "weight": 0.15,
+        "scale": {
+          "positive": 0.2,
+          "neutral": 0.0,
+          "negative_or_caveated": -0.3
+        }
+      },
+      "citation_depth": {
+        "weight": 0.15,
+        "scale": {
+          "direct_link_to_brand_site": 1.0,
+          "mention_only_no_link": 0.5
+        }
+      }
+    }
+  },
+
+  "prompt_types": {
+    "rationale": "Different prompt types reflect different stages of buyer funnel. Buying intent prompts weighted higher because they correlate with revenue impact.",
+    "weights": {
+      "buying": {
+        "weight": 2.0,
+        "examples_pattern": "Where to buy [category] premium / Best place to buy [category]",
+        "share_of_pool": "30%"
+      },
+      "comparison": {
+        "weight": 1.5,
+        "examples_pattern": "Best [category] / Top [category] handmade / [Brand A] vs [Brand B]",
+        "share_of_pool": "25%"
+      },
+      "specific_need": {
+        "weight": 1.5,
+        "examples_pattern": "[Category] with [specific attribute] / [Category] for [specific use case]",
+        "share_of_pool": "20%"
+      },
+      "informational": {
+        "weight": 0.3,
+        "examples_pattern": "What is [category] / How does [category] work",
+        "share_of_pool": "15%"
+      },
+      "brand_direct": {
+        "weight": 0.3,
+        "examples_pattern": "[Brand X] reviews / Opinions about [Brand X]",
+        "share_of_pool": "10%"
+      }
+    },
+    "pool_size_per_category": 100,
+    "pool_rotation": "20% of prompts rotate quarterly. Distribution by type published. Exact strings remain CLOSED to prevent Goodhart's Law (when a measure becomes a target, it ceases to be a measure)."
+  },
+
+  "cross_signals": {
+    "rationale": "Cross-signals provide a reality check: does the brand exist outside AI training data? A brand with a high AI score but zero cross-signals may be a content spam farm rather than a real entity.",
+    "max_total_bonus": 0.20,
+    "signals": {
+      "wikidata_entry": {
+        "bonus": 0.05,
+        "criteria": "Brand has Wikidata entry, minimum 5 triples (instance_of, country, founder OR founded_date, official_website, ISNI), entry age >= 90 days",
+        "anti_gaming": "Entries < 90 days old excluded to prevent rapid-deployment manipulation"
+      },
+      "trustpilot_or_opineo": {
+        "bonus": 0.05,
+        "criteria": "Reviews count > 50, average rating >= 4.0, no review bombing detected (review burst > 50 in 30 days = excluded)"
+      },
+      "reddit_organic_mentions": {
+        "bonus": 0.05,
+        "criteria": "Organic mentions in niche subreddit > 10, account_age + karma weighted, sock puppet detection applied (new accounts < 30 days excluded)"
+      },
+      "google_ai_overviews_presence": {
+        "bonus": 0.05,
+        "criteria": "Brand cited in Google AI Overviews response for at least one tracked prompt in category, verified via SerpAPI"
+      }
+    }
+  },
+
+  "anti_gaming": {
+    "public_thresholds": {
+      "rank_jump_flag": "Brand jumping > 30 ranks in single cycle triggers anomaly review and one-cycle score freeze",
+      "fresh_wikidata_excluded": "< 90 days",
+      "review_bombing_excluded": "> 50 reviews in 30 days from new accounts",
+      "sock_puppet_excluded": "Reddit accounts < 30 days old or karma < threshold"
+    },
+    "private_thresholds": {
+      "rationale": "Specific burst detection thresholds, sock puppet karma cutoffs, and pattern matching rules remain CLOSED to prevent gaming. Available to legal/regulatory authorities upon request.",
+      "categories": [
+        "burst_detection_thresholds",
+        "sock_puppet_karma_cutoffs",
+        "review_bombing_pattern_signatures",
+        "prompt_injection_detection_signatures"
+      ]
+    },
+    "honeypot_brand": {
+      "active": true,
+      "rationale": "Fictional brand inserted at a predetermined ranking position to detect AI training data circular logic and unauthorized scraping. If a model cites the honeypot brand, that is evidence of training on Citee data without attribution.",
+      "details": "CLOSED — disclosure would defeat purpose"
+    },
+    "prompt_injection_defense": {
+      "scrape_filters": [
+        "Strip CSS hidden text (display:none, visibility:hidden, text colored to match its background, e.g. white-on-white)",
+        "Strip off-screen positioned content (left:-9999px, etc.)",
+        "Strip font-size:0 and opacity:0 elements",
+        "Detect and exclude content in noscript that contradicts visible content"
+      ],
+      "consequence": "Brands using prompt injection excluded from current cycle + publicly named in annual transparency report"
+    }
+  },
+
+  "statistical_methodology": {
+    "queries_per_cycle": {
+      "prompts_per_category": 100,
+      "models": "3 in pilot (ChatGPT, Perplexity, Gemini), 4 from Q4 2026 (+ Claude)",
+      "repetitions_per_prompt": 2,
+      "total_per_category_per_cycle": "100 * 3 * 2 = 600 (pilot), 100 * 4 * 2 = 800 (post Q4 2026)"
+    },
+    "confidence_intervals": "95% CI computed via bootstrap resampling. Brands with overlapping CIs reported as tied — no false precision.",
+    "minimum_brands_per_category": 20,
+    "tied_score_handling": "If CI(A) overlaps CI(B), both reported at same rank with '=' indicator"
+  },
+
+  "scan_cadence": {
+    "tier_1_large_markets": {
+      "frequency": "monthly",
+      "criteria": ">1000 brands visible, >100M PLN GMV"
+    },
+    "tier_2_medium_markets": {
+      "frequency": "quarterly",
+      "criteria": "100-1000 brands, 10-100M PLN GMV"
+    },
+    "tier_3_niche_markets": {
+      "frequency": "semi-annually",
+      "criteria": "<100 brands, <10M PLN GMV"
+    },
+    "current_pilot_tier": "all categories in pilot are Tier 2 (quarterly)"
+  },
+
+  "publication_policy": {
+    "validation_period_before_first_publication": "3 months / 3 cycles minimum",
+    "first_public_ranking": "August 2026 (target)",
+    "format": "Hybrid — Top 10 public HTML (SEO indexed), full ranking 100 brands as PDF behind email gate",
+    "ai_crawler_policy": {
+      "robots_txt_disallow": ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"],
+      "endpoints_protected": ["/api/ranking-full", "/index/*/full.pdf"],
+      "rationale": "Prevents AI training data circular logic. Hybrid approach (top 10 public, long tail protected) balances SEO with measurement integrity."
+    },
+    "right_to_reply": "Each brand profile page includes 'Brand response' section. Brands can submit response (moderated for factual accuracy) within 30 days of cycle publication."
+  },
+
+  "monetization_policy": {
+    "ranked_brands_pay_zero": true,
+    "rationale": "Issuer-pays model fundamentally compromises ranking credibility (Moody's $864M settlement, Forbes 30 Under 30 fraud roundup). Citee Index revenue comes from indirect channels only.",
+    "approved_revenue_sources": [
+      "Citee Pro SaaS (199-449 PLN/mo) — paid by shops optimizing their visibility, NOT by ranked brands",
+      "Industry Reports (999-2999 PLN/quarter) — paid by agencies, media, market research firms",
+      "Sponsored Custom Research (9990-29990 PLN) — commissioned by media/agency for category research, NOT brand-specific"
+    ],
+    "prohibited": [
+      "Brand profile upgrades (paid premium listing)",
+      "Verified badges (annual fee for ranking participation)",
+      "Awards sponsored by ranked brands",
+      "Any direct payment from ranked entity to Citee"
+    ]
+  },
+
+  "categories_pilot_2026": {
+    "country": "PL",
+    "tier": "Tier 2 (quarterly scan)",
+    "list": [
+      "kosmetyki-naturalne",
+      "suplementy-nutricosmetyki",
+      "diety-pudelkowe",
+      "premium-pet-food",
+      "kawa-specialty",
+      "czekolada-rzemieslnicza",
+      "kursy-programowania-bootcampy",
+      "kliniki-estetyczne-dermo",
+      "fitness-studios-premium",
+      "kosmetyki-meskie",
+      "swiece-sojowe"
+    ],
+    "expansion_plan": {
+      "Q3_2026": "Add Tier 1 PL categories (kosmetyki ogólne / general cosmetics, odzież dziecięca / children's clothing, dom & ogród / home & garden, elektronika audio / audio electronics, biuro / office)",
+      "Q4_2026": "DACH expansion — pilot 5 categories DE",
+      "2027_Q1": "CEE expansion (CZ, SK, HU, RO)"
+    }
+  },
+
+  "changelog_reference": "See CHANGELOG.md for version history. Methodology evolves through public commits with rationale. NO retroactive changes — modifications apply to FUTURE cycles only."
+}
diff --git a/prompts/README.md b/prompts/README.md
new file mode 100644
index 0000000..80399c4
--- /dev/null
+++ b/prompts/README.md
@@ -0,0 +1,172 @@
+# Prompt Curation Process
+
+> How Citee Index builds and validates the prompt pool per category. The 6-stage process that prevents the "garbage in, garbage out" failure mode.
+
+---
+
+## Why this matters
+
+If the prompt pool is junk ("dyfuzory do włosów ranking", i.e. "hair diffusers ranking"; "wąski do samochodu", a malformed query that reads literally as "narrow for the car"), the ranking is junk. Prompt quality is the single most important upstream input to ranking integrity.
+
+This process exists to ensure every prompt in the active pool meets two tests:
+
+1. **Real buyer test** — would an actual buyer of this category type this query into ChatGPT/Perplexity?
+2. **Reality check** — does this query appear in actual search/discussion data (Google Trends, Reddit, Quora)?
+
+Prompts failing either test are excluded.
+
+## The 6 stages
+
+```
+Stage 1: Persona Generator       (AI)
+   ↓ 5–10 buyer personas per category
+Stage 2: Prompt Brainstormer     (AI per persona)
+   ↓ 200–300 raw prompts
+Stage 3: Reality Check           (Google Trends / Reddit / Quora / AnswerThePublic)
+   ↓ ~150 prompts with verified search demand
+Stage 4: Multi-agent Validation  (3 critic agents in parallel)
+   ↓ ~120 prompts after critique
+Stage 5: Pilot Test Run          (10-prompt sample × 3 models)
+   ↓ ~110 prompts that produce stable, sensible AI outputs
+Stage 6: Human Approval          (founder + category expert)
+   ↓ FINAL POOL: 100 prompts
+```
+
+### Stage 1 — Persona Generator
+
+Claude generates 5–10 buyer personas per category. Each persona has:
+- Demographics (age, location, income bracket)
+- Pain points (what they're trying to solve)
+- Decision factors (price, ingredients, brand, reviews, certifications)
+- Vocabulary (how they actually talk — formal vs colloquial, technical vs lay)
+
+Example for Świece sojowe (soy candles), PL:
+- "Woman 30+ buying a gift for her mother"
+- "Self-care millennial, 25–35, unwinding after work"
+- "Interior enthusiast with a minimalist apartment"
+- "Man buying a Valentine's Day gift"
+- "Mother of young children looking for a safe scent"
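+
+A persona record with the four attributes listed above could look like the following Python sketch. The class, field names, and example values are hypothetical; the production persona format is not published.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Persona:
+    """Hypothetical buyer persona used to seed Stage 2 prompt generation."""
+    label: str
+    demographics: dict  # e.g. age band, location, income bracket
+    pain_points: list[str] = field(default_factory=list)
+    decision_factors: list[str] = field(default_factory=list)
+    vocabulary: str = "colloquial"  # formal vs colloquial, technical vs lay
+
+
+parent = Persona(
+    label="Mother of young children looking for a safe scent",
+    demographics={"location": "PL"},
+    pain_points=["safe scent for a home with small children"],
+    decision_factors=["ingredients", "certifications", "reviews"],
+)
+```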
+
+### Stage 2 — Prompt Brainstormer
+
+For each persona, Claude generates 30–50 prompts in the voice of that persona — "how would I phrase this question to ChatGPT?" Total per category: ~200–300 raw prompts.
+
+Distribution target by type (enforced at this stage):
+- Buying intent (weight 2.0): 30%
+- Comparison (weight 1.5): 25%
+- Specific need (weight 1.5): 20%
+- Informational (weight 0.3): 15%
+- Brand-direct (weight 0.3): 10%
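+
+The distribution target translates directly into per-type counts, and the ±5% tolerance that Agent B checks in Stage 4 can be tested the same way. A minimal Python sketch, assuming "±5%" means five percentage points and that a prompt's type weight scales its contribution to the score; the function names are hypothetical.
+
+```python
+from collections import Counter
+
+TARGET_SHARES = {"buying": 0.30, "comparison": 0.25, "specific_need": 0.20,
+                 "informational": 0.15, "brand_direct": 0.10}
+TYPE_WEIGHTS = {"buying": 2.0, "comparison": 1.5, "specific_need": 1.5,
+                "informational": 0.3, "brand_direct": 0.3}
+
+
+def target_counts(pool_size=100):
+    """Number of prompts per type for a pool of the given size."""
+    return {t: round(pool_size * share) for t, share in TARGET_SHARES.items()}
+
+
+def within_tolerance(prompt_types, tolerance=0.05):
+    """True if every type's observed share is within `tolerance` of its target."""
+    counts = Counter(prompt_types)
+    n = len(prompt_types)
+    return all(abs(counts[t] / n - share) <= tolerance
+               for t, share in TARGET_SHARES.items())
+
+
+def effective_influence():
+    """Share of total score weight carried by each type (share x weight, normalized)."""
+    raw = {t: TARGET_SHARES[t] * TYPE_WEIGHTS[t] for t in TARGET_SHARES}
+    total = sum(raw.values())
+    return {t: v / total for t, v in raw.items()}
+```
+
+Under that assumption, buying-intent prompts make up 30% of the pool but carry about 44% of the total weight (0.60 of 1.35), while informational and brand-direct prompts together carry under 6%.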
+
+### Stage 3 — Reality Check
+
+Each prompt is cross-referenced against real-world data:
+
+| Source | Method | Threshold / use |
+|---|---|---|
+| **Google Trends API** | PL queries past 12 months | minimum search volume present |
+| **Google Search Console** (where available) | Real search queries to brand sites we have access to | inspirational source for vocabulary |
+| **Reddit search** | r/Polska_Marka, niche subreddits | actual user phrasing |
+| **Quora PL** | Questions asked in category | real curiosity patterns |
+| **AnswerThePublic** | Search autocomplete question data | discovery of long-tail patterns |
+| **People Also Ask (Google)** | For top category queries | semantic neighbors |
+
+Prompts with zero/marginal real-world signal are removed. ~300 → ~150.
+
+### Stage 4 — Multi-agent Validation
+
+Three AI critic agents review the list in parallel:
+
+**Agent A — "Real buyer critique"**
+Persona-grounded review. Each persona "reads" the prompts and flags ones that don't sound natural for that persona. Prompts marked unnatural by 2+ personas are removed.
+
+**Agent B — "Methodology critic"**
+Statistical and structural review. Checks:
+- Prompt type distribution stays within ±5% of target
+- No subcategory over/under-represented
+- Vocabulary diversity (we're not repeating the same phrasing)
+- Length distribution reasonable (no 50-word prompts, no 2-word prompts)
+
+**Agent C — "Vendor exploit hunter"**
+Anti-gaming review. Identifies prompts that are too easy to game with content-marketing fluff:
+- Generic informational queries that any vendor can write a blog post for
+- Prompts where AI answer is dominated by Wikipedia (vendor can edit Wikipedia)
+- Prompts where answer comes from one Reddit post (vendor can write that post)
+
+Each agent produces a list of flagged prompts. Anything flagged by 2+ agents is removed. ~150 → ~120.
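+
+The "flagged by 2+ agents" rule is a simple vote count, and the same shape covers Agent A's "unnatural for 2+ personas" rule. A minimal Python sketch with hypothetical names:
+
+```python
+from collections import Counter
+
+
+def remove_flagged(prompts, flags_by_reviewer, min_flags=2):
+    """Drop prompts flagged by at least `min_flags` distinct reviewers.
+
+    flags_by_reviewer maps a reviewer (critic agent or persona) to the
+    prompts it flagged; duplicates within one reviewer count once.
+    """
+    votes = Counter(p for flagged in flags_by_reviewer.values() for p in set(flagged))
+    return [p for p in prompts if votes[p] < min_flags]
+
+
+kept = remove_flagged(
+    ["p1", "p2", "p3"],
+    {"agent_a": ["p1", "p2"], "agent_b": ["p2"], "agent_c": ["p2", "p3"]},
+)
+# kept == ["p1", "p3"]: only p2 was flagged by two or more agents
+```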
+
+### Stage 5 — Pilot Test Run
+
+The ~120 candidate prompts get a sample test:
+- Pick 10 prompts (stratified across types)
+- Run on ChatGPT-search, Perplexity Sonar, Gemini Pro
+- 10 prompts × 3 models = 30 outputs
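
One way to draw the stratified 10-prompt sample is proportional allocation with largest-remainder rounding, followed by a random draw within each type. This is a sketch, not the documented procedure; both the allocation rule and the seeded `random.Random` are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(prompts, n=10, seed=None):
    """Sample n (type, text) prompts with per-type quotas proportional to pool size.

    Quotas use largest-remainder rounding in integer arithmetic, so they
    always sum to n (assuming n <= len(prompts)).
    """
    by_type = defaultdict(list)
    for ptype, text in prompts:
        by_type[ptype].append((ptype, text))
    total = len(prompts)
    quotas = {t: len(items) * n // total for t, items in by_type.items()}
    remainders = {t: len(items) * n % total for t, items in by_type.items()}
    leftover = n - sum(quotas.values())
    # Give leftover slots to the largest remainders (ties: first type seen).
    for t in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[t] += 1
    rng = random.Random(seed)
    sample = []
    for t, items in by_type.items():
        sample.extend(rng.sample(items, quotas[t]))
    return sample
```

Take a 30/25/20/15/10 pool. Comparison and informational tie on remainder for the tenth slot, and stable sorting gives it to the type seen first. The result is 3/3/2/1/1.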
+
+**Reject criteria:**
+- AI returns "I don't know" or "this depends on your preferences" (no actionable brand mentions)
+- Outputs across 3 models have zero overlap (prompt produces incoherent/random results)
+- AI returns a list of countries/categories instead of brands (prompt was misinterpreted)
+
+Prompts failing pilot are flagged for revision or removal. ~120 → ~110.
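
Once brand mentions have been extracted from each model's answer, the three reject criteria can be checked mechanically. This sketch assumes the mentions are already normalized strings and that `catalog` is the category's brand catalog; extraction itself is out of scope. Two readings are assumptions. "Zero overlap" means no brand appears in more than one model's answer. "Not brands" means none of the returned entities is in the catalog.

```python
def pilot_reject_reason(mentions_by_model, catalog):
    """Return a reject reason for one prompt, or None if it passes.

    mentions_by_model: model name -> set of entity strings extracted from
    that model's answer. catalog: set of known brand names. Both are
    assumed to be normalized the same way (e.g. lower-case).
    """
    if not any(mentions_by_model.values()):
        return "no actionable brand mentions"
    brands = {m: ents & catalog for m, ents in mentions_by_model.items()}
    if not any(brands.values()):
        # Entities were returned, but none are known brands (e.g. countries).
        return "misinterpreted: entities are not brands"
    models = list(brands)
    overlap = any(
        brands[a] & brands[b]
        for i, a in enumerate(models)
        for b in models[i + 1:]
    )
    if not overlap:
        return "zero overlap across models"
    return None
```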
+
+### Stage 6 — Human Approval
+
+The founder and a category expert review the final ~110 candidates and select the 100 production prompts.
+
+**The founder always reviews.** For categories outside the founder's domain knowledge, a paid expert reviewer (1–2 hours, $50–100) is engaged:
+
+| Category | Expert profile |
+|---|---|
+| Kosmetyki naturalne | Beauty product manager / freelance marketer |
+| Suplementy / nutricosmetyki | Nutritionist / DTC supplement marketer |
+| Diety pudełkowe | Fitness coach / dietitian |
+| Premium pet food | Pet specialty store owner / dog trainer |
+| Kawa specialty | Coffee blogger / barista trainer |
+| Czekolada rzemieślnicza | Food blogger / chocolate-focused content creator |
+| Kursy programowania | Bootcamp graduate / hiring manager |
+| Kliniki estetyczne | Dermatologist or aesthetic medicine consultant |
+| Fitness studios | Personal trainer / gym manager |
+| Kosmetyki męskie | Men's grooming influencer / DTC marketer |
+| Świece sojowe | Founder + JAKULO customer service data |
+
+The final 100 prompts are committed to the closed `prompts/{slug}/` directory (gitignored). A public example framework is committed to this repo as `prompts/example-{slug}.md`. It shows the structure and a few illustrative examples per type, but **not the exact production strings**.
+
+## Quarterly refresh — 20% rotation
+
+Every quarter, the curation pipeline runs in refresh mode:
+
+1. **Trend check** — Google Trends API: which prompts have lost relative search volume?
+2. **New patterns** — Reddit/Quora scrape: what new question patterns have emerged?
+3. **New entrants** — scan model outputs from the past quarter: which brands appeared in answers but aren't in our brand catalog?
+4. **Generate replacements** — Stages 1–5 for the rotation set
+5. **Human approval** — founder reviews the proposed 20 swaps in 5–10 minutes
+
+This counters Goodhart's Law. As the prompt pool becomes known to vendors (through reverse-engineering or leaks), rotating 20% of it each quarter means vendors cannot permanently optimize against our exact queries.
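
A sketch of how the quarterly retirement set might be chosen. It assumes each prompt carries a relative search-signal score for the past 90 days and a boolean "known to be gamed" flag; both fields are hypothetical. Gamed prompts are retired first, and the weakest-signal prompts fill the remaining slots.

```python
from dataclasses import dataclass


@dataclass
class PoolPrompt:
    pid: str
    signal_90d: float  # hypothetical relative search interest, past 90 days
    gamed: bool = False  # hypothetical flag: evidence vendors optimized for it


def pick_retirements(pool, rotation_share=0.20):
    """Return IDs of prompts to retire this quarter (20% of the pool by default)."""
    n_retire = round(len(pool) * rotation_share)
    # Gamed prompts sort first (not True == False), then lowest signal.
    ranked = sorted(pool, key=lambda p: (not p.gamed, p.signal_90d))
    return [p.pid for p in ranked[:n_retire]]
```

On a 100-prompt pool this returns 20 IDs. Stages 1–5 then generate the same number of replacements.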
+
+## Cost per category
+
+| Stage | API cost | Human cost |
+|---|---|---|
+| 1 — Persona Generator | ~$0.50 (Claude) | — |
+| 2 — Prompt Brainstormer | ~$1.50 (Claude) | — |
+| 3 — Reality Check | $0 (free APIs) | — |
+| 4 — Multi-agent Validation | ~$3 (Claude × 3 critics) | — |
+| 5 — Pilot Test Run | ~$5 (10 prompts × 3 models = 30 outputs) | — |
+| 6 — Human Approval | — | ~30 min founder + 1–2h expert ($50–100 for non-founder categories) |
+| **Total per category** | **~$10** | **~30 min + $50–100 for expert categories** |
+
+For 11 pilot categories: ~$110 API + ~5.5 hours founder time + ~$500–1,000 for expert reviewers (10 non-founder categories × $50–100).
+
+## Quarterly refresh cost
+
+Per category per quarter: ~$3 API + 5 minutes founder review.
+
+For 11 categories: ~$35 API + 1 hour founder time per quarter.
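
The cost estimates above are simple multiplication. This sketch recomputes them from the tables' approximate per-stage figures. It assumes 10 of the 11 pilot categories need a paid expert, since the expert table assigns świece sojowe to the founder. The dictionary keys are illustrative labels.

```python
# Approximate figures from the cost tables (USD and minutes).
STAGE_API_COST = {"personas": 0.50, "brainstorm": 1.50, "reality_check": 0.0,
                  "critics": 3.0, "pilot": 5.0}
FOUNDER_MIN_PER_CATEGORY = 30
EXPERT_FEE_RANGE = (50, 100)
REFRESH_API_PER_CATEGORY = 3.0
REFRESH_FOUNDER_MIN = 5


def initial_costs(n_categories=11, n_expert_categories=10):
    """Return (API USD, founder hours, (min, max) expert USD) for the initial build."""
    api = sum(STAGE_API_COST.values()) * n_categories
    founder_hours = FOUNDER_MIN_PER_CATEGORY * n_categories / 60
    expert = tuple(fee * n_expert_categories for fee in EXPERT_FEE_RANGE)
    return api, founder_hours, expert


def quarterly_refresh_costs(n_categories=11):
    """Return (API USD, founder hours) for one quarterly refresh."""
    return REFRESH_API_PER_CATEGORY * n_categories, REFRESH_FOUNDER_MIN * n_categories / 60
```

With the defaults, the initial build costs about $110 API, 5.5 founder hours and $500–1,000 for experts. Each quarter then costs about $33 API and 55 founder minutes, which the text rounds to ~$35 and 1 hour.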
+
+## Why this is published openly
+
+We publish the **process** because the integrity of the ranking depends on the integrity of the prompts, and external review of the process is the strongest defense against the "your prompts are garbage" attack.
+
+We do NOT publish the **exact strings** because of Goodhart's Law: once prompts are known, vendors optimize against them, and the prompts stop measuring organic AI search behavior.
+
+The boundary between "open process" and "closed strings" is itself documented openly.
diff --git a/prompts/example-swiece-sojowe-pl.md b/prompts/example-swiece-sojowe-pl.md
new file mode 100644
index 0000000..65dd28c
--- /dev/null
+++ b/prompts/example-swiece-sojowe-pl.md
@@ -0,0 +1,100 @@
+# Example prompt framework — Świece sojowe PL
+
+> Illustrative framework showing how a category prompt pool is structured. Exact production strings remain CLOSED in `prompts/swiece-sojowe-pl/` (gitignored).
+
+This document is **public** to demonstrate the curation process and prompt-type distribution. It is **not** the actual production prompt list.
+
+---
+
+## Distribution
+
+100 prompts total, distributed by type:
+
+| Type | Count | Weight | Share |
+|---|---|---|---|
+| Buying intent | 30 | 2.0 | 30% |
+| Comparison | 25 | 1.5 | 25% |
+| Specific need | 20 | 1.5 | 20% |
+| Informational | 15 | 0.3 | 15% |
+| Brand-direct | 10 | 0.3 | 10% |
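
The "Share" column counts prompts, not influence. Assume, for this sketch only, that a brand's category score is a weight-summed tally over prompts; the actual formula lives in `methodology.json`. Under that assumption, each type's effective share of total weight is count × weight divided by the pool's total weight.

```python
# (count, weight) per prompt type, from the distribution table above.
DISTRIBUTION = {
    "Buying intent": (30, 2.0),
    "Comparison": (25, 1.5),
    "Specific need": (20, 1.5),
    "Informational": (15, 0.3),
    "Brand-direct": (10, 0.3),
}


def effective_weight_share(distribution):
    """Fraction of total prompt weight contributed by each type."""
    total = sum(count * weight for count, weight in distribution.values())
    return {t: count * weight / total for t, (count, weight) in distribution.items()}


for ptype, share in effective_weight_share(DISTRIBUTION).items():
    print(f"{ptype:<14} {share:6.1%}")
```

The total weight is 135. Buying intent carries about 44% of it. The two 0.3-weight types together carry about 5.6%, despite making up 25% of the prompts.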
+
+## Personas referenced
+
+- "30+ kobieta kupująca prezent dla mamy"
+- "Self-care millennial 25–35 po pracy"
+- "Wnętrzarz minimalistyczne mieszkanie"
+- "Mężczyzna kupujący prezent walentynkowy"
+- "Mama małych dzieci szukająca bezpiecznego zapachu"
+- "Eko-świadomy konsument 30+"
+- "Hostess kupująca świece dla agroturystyki"
+
+## Buying intent (30 prompts × 2.0 weight) — illustrative examples
+
+These prompts signal active purchase intent. They carry the highest weight because they correlate most directly with revenue impact for ranked brands.
+
+- "Gdzie kupić premium ręcznie robioną świecę sojową na prezent dla mamy"
+- "Polska marka świec sojowych z certyfikatem ekologicznym do 200 zł"
+- "Świeca sojowa w eleganckim opakowaniu jako prezent firmowy"
+- "Gdzie zamówić zestaw prezentowy z polskich świec sojowych handmade"
+- *(...26 more, exact strings closed)*
+
+## Comparison (25 prompts × 1.5 weight) — illustrative examples
+
+Decision-stage queries: the user is comparing brands or choosing between them.
+
+- "JAKULO vs Naturaodpauli — która polska marka świec sojowych lepsza"
+- "Najlepsze polskie świece sojowe handmade 2026 ranking"
+- "Polskie świece sojowe premium — porównanie najpopularniejszych marek"
+- *(...22 more, exact strings closed)*
+
+## Specific need (20 prompts × 1.5 weight) — illustrative examples
+
+Specific use cases or attributes; the buyer knows what they want.
+
+- "Świeca sojowa o zapachu wanilii i bursztynu w średnim rozmiarze"
+- "Długo paląca naturalna świeca sojowa do sypialni 60 godzin"
+- "Świeca sojowa bezzapachowa dla osoby z alergią na zapachy"
+- *(...17 more, exact strings closed)*
+
+## Informational (15 prompts × 0.3 weight) — illustrative examples
+
+Research-stage queries. Lower weight because they are easily gamed with content-marketing fluff.
+
+- "Czym różni się świeca sojowa od parafinowej"
+- "Jak rozpoznać prawdziwie sojową świecę"
+- "Czy świece sojowe są zdrowe i bezpieczne"
+- *(...12 more, exact strings closed)*
+
+## Brand-direct (10 prompts × 0.3 weight) — illustrative examples
+
+Direct brand queries. Lower weight because a brand winning queries about itself is the baseline expectation, not added value.
+
+- "JAKULO opinie 2026 czy warto kupować"
+- "Co sądzą o polskiej marce świec Naturaodpauli"
+- *(...8 more, exact strings closed)*
+
+## Anti-patterns (excluded)
+
+The following types of prompts are explicitly excluded during Stage 4 (Vendor exploit hunter critic):
+
+| Pattern | Reason | Example |
+|---|---|---|
+| Single-word | No buyer intent, ambiguous | "świeczki", "świece" |
+| Hobbyist / DIY | Off-topic for retail | "DIY świece sojowe w domu" |
+| B2B retail | Not consumer-facing | "hurtownia świec sojowych Warszawa" |
+| Brand-agnostic generic | Easy content marketing target | "co to świeca sojowa" |
+| Price-only without category context | Too vague | "tania świeca" |
+| Off-topic technicality | Signals hobby craft, not a retail purchase | "knot bawełniany do świec wymiary" |
+| Polish typos at scale | Not real query patterns | "swieczka sojova" (single typo OK if frequent in real data) |
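
Some of these anti-patterns can be pre-screened with cheap rules before the LLM critic runs. This rough sketch uses hypothetical keyword lists built from the examples in the table. It catches only surface patterns (single words, DIY, wholesale and wick/supply technicalities). Judgment calls such as "brand-agnostic generic" are left to the critic.

```python
import re

# Hypothetical keyword heuristics derived from the table's example column.
KEYWORD_PATTERNS = {
    "hobbyist/DIY": re.compile(r"\bdiy\b|\bzrób sam\b|\bjak zrobić\b", re.IGNORECASE),
    "B2B retail": re.compile(r"\bhurtow\w*", re.IGNORECASE),
    "off-topic technicality": re.compile(r"\bknot\w*", re.IGNORECASE),
}


def anti_pattern_flags(prompt):
    """Return the names of anti-patterns a prompt appears to match."""
    flags = []
    if len(prompt.split()) <= 1:
        flags.append("single-word")
    for name, pattern in KEYWORD_PATTERNS.items():
        if pattern.search(prompt):
            flags.append(name)
    return flags
```

The leading `\b` matters for brands whose names contain a keyword. For example, "Oskiknot" does not trigger the `knot` (wick) rule.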
+
+## Quarterly rotation policy
+
+Each quarter, 20 prompts (20% of the pool) are rotated:
+- 20 retired (lowest real-world search signal in the past 90 days, OR known to be gamed)
+- 20 added as replacements (new patterns from Reddit/Quora/trends, new persona refinements, newly emerging product attributes)
+
+The rotation log is committed to `prompts/swiece-sojowe-pl/rotation_log.md` (closed) with a rationale for each swap.
+
+---
+
+**This framework is illustrative.** The actual 100 production prompts evolve with each quarterly cycle and are not published as exact strings — only the distribution, personas, anti-patterns, and example patterns are public.
diff --git a/taxonomy.md b/taxonomy.md
new file mode 100644
index 0000000..2ef59f2
--- /dev/null
+++ b/taxonomy.md
@@ -0,0 +1,102 @@
+# Citee Index Taxonomy
+
+> Category list, tier system, and scan cadence per tier. Live document — updated as new categories are added or existing ones are reclassified.
+
+**Last updated:** 2026-05-03 (v1.0.0)
+
+---
+
+## Tier system
+
+Categories are classified by **market depth** (number of visible brands) and **GMV** (annual e-commerce volume in the category, in PLN).
+
+| Tier | Criteria | Scan cadence | Brands tracked per cycle |
+|---|---|---|---|
+| **Tier 1 — Large** | >1000 brands visible, >100M PLN GMV | Monthly | 100 |
+| **Tier 2 — Medium** | 100–1000 brands, 10–100M PLN GMV | Quarterly | 50–100 |
+| **Tier 3 — Niche** | <100 brands, <10M PLN GMV | Semi-annual | 20–50 |
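
The table does not say what happens when brand count and GMV point to different tiers. This sketch assumes the more conservative (lower) tier wins. It reads the Tier 2 ranges as inclusive, so exactly 100 or 1,000 brands and exactly 10M or 100M PLN land in Tier 2.

```python
def tier_by_brands(n_brands):
    if n_brands > 1000:
        return 1
    if n_brands >= 100:
        return 2
    return 3


def tier_by_gmv(gmv_pln):
    if gmv_pln > 100_000_000:
        return 1
    if gmv_pln >= 10_000_000:
        return 2
    return 3


SCAN_CADENCE = {1: "monthly", 2: "quarterly", 3: "semi-annual"}


def classify(n_brands, gmv_pln):
    """Return (tier, cadence); on disagreement the lower tier (higher number) wins."""
    tier = max(tier_by_brands(n_brands), tier_by_gmv(gmv_pln))
    return tier, SCAN_CADENCE[tier]
```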
+
+Cross-cutting categories (e.g., "Polish DTC brands Top 100", "Polish handmade Top 50") are published **annually** as flagship reports.
+
+## Country coverage
+
+| Country | Status | First publication |
+|---|---|---|
+| **PL** (Poland) | Pilot — 11 categories | August 2026 |
+| **DE** (Germany) | Planned Q4 2026 | — |
+| **AT** (Austria) | Planned with DE | — |
+| **CH** (Switzerland) | Planned with DE | — |
+| **CZ** (Czech Republic) | Planned Q1 2027 | — |
+| **SK** (Slovakia) | Planned Q1 2027 | — |
+| **HU** (Hungary) | Planned Q2 2027 | — |
+| **RO** (Romania) | Planned Q2 2027 | — |
+| **DK / SE / NO / FI** (Nordic) | Planned Q3 2027 | — |
+| FR / ES / IT | Year 2 | — |
+| UK / US (English-speaking) | **Not in roadmap.** Heavy competition (Profound, Otterly, AthenaHQ). Citee focuses on markets where local language and brand knowledge create a defensible moat. | — |
+
+---
+
+## PL — pilot categories (Tier 2)
+
+All 11 launch categories are scanned quarterly. Selection criteria: every ranked brand is a potential Citee Pro customer (a DTC consumer brand or a service business with a marketing budget), and there is zero overlap with LMW Commerce's B2B SaaS competitors.
+
+| # | Category slug | Display name | Sample brands | Notes |
+|---|---|---|---|---|
+| 1 | `kosmetyki-naturalne-pl` | Kosmetyki naturalne | Resibo, Tołpa, Yope, Bielenda, Dr Irena Eris, Vianek, Lirene | Premium DTC, brand-conscious vendors |
+| 2 | `suplementy-nutricosmetyki-pl` | Suplementy / nutricosmetyki | Olimp, Trec, OstroVit, Allnutrition, Health Labs Care, NaturDay, Pharmovit | DTC growth segment, marketing-heavy |
+| 3 | `diety-pudelkowe-pl` | Diety pudełkowe / catering dietetyczny | Maczfit, Nice To Fit You, Fit&Easy, BistroBox, Light Box | Subscription DTC, high LTV |
+| 4 | `premium-pet-food-pl` | Premium pet food / akcesoria | Brit Care, Acana, Animonda, Royal Canin, Josera, Belcando, Pies Pisany | Loyal customers, premium pricing |
+| 5 | `kawa-specialty-pl` | Kawa specialty / gourmet | Coffee Plant, Bonjour Cafe, Etno Cafe, Hard Beans, Coffeedesk, Cafezal | Vocal community, Reddit-rich |
+| 6 | `czekolada-rzemieslnicza-pl` | Czekolada rzemieślnicza / premium | Manufaktura Czekolady, Mount Blanc, Wawel premium, Ujejski, Wedel exclusive | Premium DTC lifestyle |
+| 7 | `kursy-programowania-bootcampy-pl` | Kursy programowania / IT bootcampy | Kodilla, Coders Lab, Boring Owl, Future Collars, SDA, WSB Online, Akademia Górska | High-LTV edtech, growing 2026 |
+| 8 | `kliniki-estetyczne-dermo-pl` | Kliniki estetyczne / dermatologia premium | Klinika La Perla, Medilumi, Klinika Holistic, Estetica, Dermika | Service business, leadgen-driven |
+| 9 | `fitness-studios-premium-pl` | Fitness studios / personal training | Calypso, Pure Jatomi, Fabric Health Club, niche premium studios | Subscription model, vocal community |
+| 10 | `kosmetyki-meskie-pl` | Kosmetyki dla mężczyzn / męska pielęgnacja | Bartholomew, Chytry Lis, Onlomen, Ziaja Yego, Nivea Men premium, Lirene Men | Growing 2026 segment, new DTC entrants |
+| 11 | `swiece-sojowe-pl` | Świece sojowe | JAKULO, Naturaodpauli, Bookiet, Triny, Aromatowo, Yush, Oskiknot, LemonGlas, Paleta Smaków, Bennovate | Pilot test bed; LMW founder's pet category |
+
+## Excluded from pilot — explicit rationale
+
+| Category | Why excluded |
+|---|---|
+| Wino / alkohol PL | Art. 13¹ of the Polish "Ustawa o wychowaniu w trzeźwości" (Act on Upbringing in Sobriety) restricts alcohol advertising; regulatory risk too high |
+| Rękodzieło / handmade / Etsy crowd | Margins of 20–30%; micro-businesses won't pay 449 PLN/mo for visibility tools |
+| B2B SaaS (CRM, marketing automation, e-commerce platforms) | LMW Commerce competes in an adjacent space, so these vendors are unlikely to pay a competitor; many also have their own visibility tools |
+| Hosting / domeny | Vendors have their own marketing teams; low conversion to Pro SaaS |
+| Banki / ubezpieczenia / fintech B2B | Agencies may buy reports, but ranked brands won't buy Pro; banks already use enterprise marketing tools |
+
+## Q3-Q4 2026 expansion candidates (Tier 1, monthly scan)
+
+To be added after pilot validation:
+
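+- Kosmetyki ogólne (mainstream cosmetics, not just naturalne); a very large market
+- Odzież dziecięca DTC (children's clothing)
+- Dom & ogród / wyposażenie wnętrz (home & garden / interior furnishings)
+- Elektronika audio premium (premium audio electronics)
+- Akcesoria biurowe / papiernicze B2B-light (office and stationery supplies)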
+
+## Cross-cutting flagship reports (annual)
+
+- "Citee Index — Polski DTC e-commerce Top 100" (year-end)
+- "Citee Index — Polski handmade Top 50" (year-end)
+- "Citee Index — Polski D2C lifestyle ecosystem" (mid-year)
+
+These reports cut across categories to identify the strongest brand presences in AI search overall, regardless of vertical.
+
+---
+
+## Adding a new category — checklist
+
+Before adding a category to active scan, the following must be true:
+
+1. **Market depth** — at least 20 brands with internet presence visible in PL e-commerce
+2. **AI search demand** — Google Trends data confirms users search for category-related queries
+3. **Buyer profile** — ranked brands fit the Citee Pro customer persona, OR there is a clear agency/media buyer for reports
+4. **No regulatory risk** — category is not subject to advertising restrictions (alcohol, gambling, prescription pharma, etc.)
+5. **Prompt curation feasible** — buyer personas identifiable, decision factors articulable, expert reviewer available if the category is outside the founder's domain knowledge
+6. **Category integrated with brand catalog** — minimum 30 brands cataloged with normalized names (handling variations: "JAKULO" vs "Jakulo" vs "jakulo.pl")
+
+When all six are true, the category enters the pilot validation cycle (a minimum of 3 cycles before public publication).
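
Checklist item 6 requires normalized brand names, so that "JAKULO", "Jakulo" and "jakulo.pl" count as one brand. This minimal sketch assumes a catalog keyed by a canonical lower-case form. The normalization rules and the domain-suffix list are illustrative, not the production matcher. They apply Unicode case folding, strip a URL scheme and `www.`, strip a trailing domain suffix and collapse whitespace.

```python
import re
import unicodedata

# Hypothetical domain suffixes to strip when a brand is cited by URL.
# Longer suffixes come first so ".com.pl" is removed before ".pl".
DOMAIN_SUFFIXES = (".com.pl", ".pl", ".com", ".eu")


def normalize_brand(raw):
    """Map a brand mention to a canonical key, e.g. 'jakulo.pl' -> 'jakulo'."""
    name = unicodedata.normalize("NFC", raw).strip().casefold()
    name = re.sub(r"^(https?://)?(www\.)?", "", name)
    name = name.rstrip("/")
    for suffix in DOMAIN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return re.sub(r"\s+", " ", name)


def canonical_brand(raw, catalog):
    """Return the catalog display name for a mention, or None if unknown.

    catalog maps canonical keys to display names, e.g. {"jakulo": "JAKULO"}.
    """
    return catalog.get(normalize_brand(raw))
```

Unknown mentions return `None` instead of being guessed. That keeps new entrants visible for the quarterly catalog update.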
+
+## Versioning
+
+Changes to this taxonomy are tracked in [CHANGELOG.md](./CHANGELOG.md). Adding a new category, reclassifying tier, or removing a category constitutes a MINOR version bump. Adding a new country or fundamentally revising the tier system is a MAJOR version bump.
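
The versioning rule maps taxonomy change types to semantic-version bumps. This sketch assumes the `MAJOR.MINOR.PATCH` scheme implied by v1.0.0 and uses hypothetical change-type labels. Change types not named in this section, such as wording fixes, are assumed to be PATCH.

```python
# Hypothetical labels for the taxonomy change types named above.
MAJOR_CHANGES = {"add_country", "revise_tier_system"}
MINOR_CHANGES = {"add_category", "reclassify_tier", "remove_category"}


def bump(version, change):
    """Return the next version string for a taxonomy change."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change in MAJOR_CHANGES:
        return f"v{major + 1}.0.0"
    if change in MINOR_CHANGES:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```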