v1.0.0 — initial Citee Index Methodology release
Foundational public methodology for the first open public ranking of brand visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).

This release establishes the framework — no rankings have been computed or published yet.
First scan cycle: late May 2026 (private validation).
First public ranking publication target: August 2026, after 3 validation cycles.

Includes:
- methodology.json: machine-readable formulas, weights, policies
- README.md: human-readable overview + open/closed boundary
- CHANGELOG.md: versioning policy + v1.0.0 release notes
- taxonomy.md: tier system + 11 PL pilot categories
- LICENSE: MIT
- .gitignore: closed operational data (exact prompts, anti-gaming thresholds)
- prompts/README.md: 6-stage prompt curation process
- prompts/example-swiece-sojowe-pl.md: illustrative framework for first category

Strategic principles:
- Algorithm-first, no advisory board
- Open methodology + closed exact prompts (Goodhart's Law defense)
- No retroactive changes (FIDE 2024 lesson)
- No pay-to-play, hard rule (Moody's / Forbes 30 Under 30 lessons)
- Subjective opinion disclaimer (NetScout Systems v. Gartner 2020 First Amendment shield)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit
f76cf2858b
8 changed files with 884 additions and 0 deletions
54
.gitignore
vendored
Normal file
@ -0,0 +1,54 @@
# OS / editor cruft
.DS_Store
Thumbs.db
*.swp
*.swo
*~
.vscode/
.idea/

# OneDrive sync conflicts (just in case repo ends up under OneDrive accidentally)
*-Bob.*
*conflict*

# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
env/
*.egg-info/
.pytest_cache/

# Closed operational data — exact prompt strings remain CLOSED to prevent
# Goodhart's Law (when a measure becomes a target, it ceases to be a measure).
# Public examples and frameworks live in prompts/ at the repo root.
prompts/swiece-sojowe-pl/
prompts/kosmetyki-naturalne-pl/
prompts/suplementy-nutricosmetyki-pl/
prompts/diety-pudelkowe-pl/
prompts/premium-pet-food-pl/
prompts/kawa-specialty-pl/
prompts/czekolada-rzemieslnicza-pl/
prompts/kursy-programowania-bootcampy-pl/
prompts/kliniki-estetyczne-dermo-pl/
prompts/fitness-studios-premium-pl/
prompts/kosmetyki-meskie-pl/

# Closed anti-gaming thresholds (private values, public categories documented)
anti_gaming/private_thresholds.json
anti_gaming/honeypot_brand.json

# First-party telemetry from Free Checker (GDPR — raw user data closed)
telemetry/raw/

# Output of scan cycles (raw query logs are public via API but not in repo)
output/
scans/

# Secrets
.env
.env.*
*.key
secrets.json
61
CHANGELOG.md
Normal file
@ -0,0 +1,61 @@
# Changelog

All notable changes to Citee Index Methodology are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/); versioning follows [Semantic Versioning](https://semver.org/), adapted for methodology:

- **MAJOR** (`2.0.0`) — fundamental change to the scoring formula, weight rebalance, or redefinition of categories
- **MINOR** (`1.1.0`) — new prompt types, new cross-signals, new model added, anti-gaming rule additions
- **PATCH** (`1.0.1`) — documentation fixes, clarifications, additional examples, typos

**Important:** No retroactive changes. Methodology updates apply to FUTURE cycles only. Cycles published before a version bump are not recomputed.
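
The no-retroactive-changes rule can be read as a simple lookup: a cycle is scored under the latest methodology version released on or before its publication date, and it keeps that version afterwards. The sketch below is only an illustration of that reading. The `RELEASES` table and the function name `version_for_cycle` are hypothetical, and the `1.1.0` entry is an invented placeholder rather than a planned release.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical release table, sorted by release date.
# Only the 1.0.0 date comes from this changelog; 1.1.0 is a made-up placeholder.
RELEASES = [
    (date(2026, 5, 3), "1.0.0"),
    (date(2026, 10, 1), "1.1.0"),  # assumption for illustration only
]


def version_for_cycle(published: date) -> str:
    """Return the methodology version in force on a cycle's publication date."""
    release_dates = [released for released, _ in RELEASES]
    # Index of the first release strictly after the publication date.
    idx = bisect_right(release_dates, published)
    if idx == 0:
        raise ValueError("cycle predates the first methodology release")
    return RELEASES[idx - 1][1]


if __name__ == "__main__":
    # A cycle published before the placeholder 1.1.0 stays on 1.0.0.
    print(version_for_cycle(date(2026, 8, 15)))
    print(version_for_cycle(date(2026, 11, 1)))
```

Because the lookup depends only on the publication date, a later version bump never changes the answer for a cycle that is already published.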

---

## [1.0.0] — 2026-05-03

Initial public release. Foundational methodology. **No public ranking yet** — first publication scheduled August 2026 after 3-month validation period.

### Added

- **Scoring formula:** `CiteeScore = sum(mention_score_per_model * model_weight) * (1 + cross_signal_bonus)`, normalized to 0-100 per category
- **Model weighting** for PL market: ChatGPT 0.45, Perplexity 0.25, Gemini 0.20, Claude 0.10 (Claude is scheduled to join in Q4 2026 and is not queried in the pilot; see `methodology.json` for rationale)
- **Mention score per model:** position (0.4) + prominence (0.3) + sentiment (0.15) + citation depth (0.15)
- **5 prompt types** with weights:
  - Buying intent (2.0) — 30% of pool
  - Comparison (1.5) — 25%
  - Specific need (1.5) — 20%
  - Informational (0.3) — 15%
  - Brand-direct (0.3) — 10%
- **4 cross-signals** with maximum total bonus +20%:
  - Wikidata entry (≥90 days, ≥5 triples): +5%
  - Trustpilot/Opineo (>50 reviews, ≥4.0 average, no review bombing): +5%
  - Reddit organic mentions (>10 in niche subreddit, account age + karma weighted): +5%
  - Google AI Overviews presence (verified via SerpAPI): +5%
- **Anti-gaming protections:** rank-jump flag (>30 ranks in a single cycle), fresh Wikidata exclusion (<90 days), review bombing exclusion, sock puppet detection (Reddit), prompt injection scrape filters (CSS hidden text, off-screen content, font-size:0)
- **Honeypot brand** mechanism for detecting AI training data circular logic and unauthorized scraping
- **Statistical methodology:** 95% confidence intervals via bootstrap resampling, overlapping CIs reported as tied (no false precision), 100 prompts × 3 models × 2 repetitions = 600 queries per category per cycle in pilot
- **Tier system:**
  - Tier 1 — large markets (>1000 brands, >100M PLN GMV) — monthly scan
  - Tier 2 — medium markets (100-1000 brands, 10-100M PLN GMV) — quarterly scan
  - Tier 3 — niche markets (<100 brands, <10M PLN GMV) — semi-annual scan
- **11 pilot categories (PL, all Tier 2):** kosmetyki naturalne, suplementy / nutricosmetyki, diety pudełkowe, premium pet food, kawa specialty, czekolada rzemieślnicza, kursy programowania / IT bootcampy, kliniki estetyczne / dermatologia, fitness studios premium, kosmetyki dla mężczyzn, świece sojowe
- **Publication policy:** 3-month validation period before first public ranking. Hybrid format — Top 10 public HTML (SEO indexed), full ranking 100 brands as PDF behind email gate. `robots.txt` disallow for GPTBot, ClaudeBot, PerplexityBot, CCBot, Google-Extended on full-data endpoints.
- **Right to reply:** each brand profile page includes "Brand response" section, moderated for factual accuracy, 30-day response window per cycle
- **Monetization policy:** ranked brands NEVER pay Citee directly (hard rule). Revenue from Citee Pro SaaS (paid by shops optimizing visibility, not ranked brands), Industry Reports (paid by agencies/media), and Sponsored Custom Research (commissioned for category research, not brand-specific)
- **Prompt curation process** (6 stages): persona generator → prompt brainstormer → reality check (Google Trends, Reddit, Quora) → multi-agent validation (3 critics) → pilot test run → human approval
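
The scoring entries above can be combined as in the following Python sketch. It is a reading of the published formulas under stated assumptions, not the reference implementation: the function names are hypothetical, per-model mention scores are assumed to be already averaged over prompts, component values are assumed to lie on the scales in `methodology.json`, and prompt-type weights are left out because this entry does not say where they enter the formula.

```python
# Weights as listed in this changelog entry.
COMPONENT_WEIGHTS = {
    "position": 0.4,
    "prominence": 0.3,
    "sentiment": 0.15,
    "citation_depth": 0.15,
}
MODEL_WEIGHTS = {"chatgpt": 0.45, "perplexity": 0.25, "gemini": 0.20, "claude": 0.10}
BONUS_PER_SIGNAL = 0.05  # each cross-signal adds +5%
MAX_BONUS = 0.20         # total cross-signal bonus is capped at +20%


def mention_score(components):
    """Weighted sum of the four mention components for one model."""
    return sum(COMPONENT_WEIGHTS[name] * components[name] for name in COMPONENT_WEIGHTS)


def raw_citee_score(mention_by_model, signals_met):
    """sum(mention_score * model_weight) * (1 + cross_signal_bonus)."""
    weighted = sum(MODEL_WEIGHTS[model] * score for model, score in mention_by_model.items())
    bonus = min(BONUS_PER_SIGNAL * signals_met, MAX_BONUS)
    return weighted * (1 + bonus)


def normalize(raw_scores):
    """Scale raw scores so the top brand in the category gets 100."""
    top = max(raw_scores.values())
    if top <= 0:
        return {brand: 0.0 for brand in raw_scores}
    return {brand: 100 * raw / top for brand, raw in raw_scores.items()}


if __name__ == "__main__":
    # Invented example data for two hypothetical brands in one category.
    best_case = mention_score(
        {"position": 1.0, "prominence": 1.0, "sentiment": 0.2, "citation_depth": 1.0}
    )
    raw = {
        "brand_a": raw_citee_score({"chatgpt": best_case, "gemini": 0.5}, signals_met=2),
        "brand_b": raw_citee_score({"chatgpt": 0.3}, signals_met=0),
    }
    print(normalize(raw))
```

In the three-model pilot only `chatgpt`, `perplexity`, and `gemini` would appear in `mention_by_model`, so the weights in use sum to 0.90; the top-brand normalization absorbs that difference within a category.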

### Notes

This is **v1.0.0 — methodology release only**. No ranking has been computed or published. This document is foundational: it establishes the framework.

First scan cycle planned: late May 2026 (private validation).
First public ranking publication target: August 2026 (after 3 validation cycles).

---

## Pre-history

The project began as the "AIO Visibility" module within the LMW Pulse SaaS in March 2026. It pivoted to a standalone product, `citee.ai`, in May 2026 after market analysis found no competitor worldwide publishing public AI visibility rankings (27+ SaaS dashboards tracked, but zero public rankings).

The strategic shift from an advisory-board-driven model (Gartner / Forbes 30 Under 30 pattern) to an algorithm-first model (Glassdoor / Trustpilot / FIDE / PageRank pattern) was decided on 2026-05-03, on the principle that the tool must defend itself on its own merits rather than by appeal to authority.
40
LICENSE
Normal file
@ -0,0 +1,40 @@
MIT License

Copyright (c) 2026 LMW Commerce / Jacek Kubas

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

Note on Citee Index data:

While this methodology is MIT-licensed and freely usable, the Citee Index
itself (the published rankings, raw query logs, and brand-level scores) is
provided under a separate data license described at
https://citee.ai/data-license. The methodology being open does not imply
that derived datasets from Citee scans are public domain.

Disclaimer regarding scoring:

Citee Index scores represent expressions of opinion based on observed AI
model outputs at specific points in time. They are not factual claims about
the relative quality, popularity, or merit of any brand. The methodology is
a framework for converting observed AI outputs into a comparable index;
reasonable people could construct alternative methodologies that produce
different rankings.
85
README.md
Normal file
@ -0,0 +1,85 @@
# Citee Index Methodology

> Open methodology for the first public ranking of brand visibility in AI search results.

[**citee.ai**](https://citee.ai) · [**Methodology page**](https://citee.ai/methodology) · [Forgejo](https://git.lmwcommerce.com/citee/citee-methodology) · [GitHub mirror](https://github.com/lmwcommerce/citee-methodology)

---

## What this is

Citee Index measures how brands appear in AI-generated answers across major LLM-powered search systems (ChatGPT with web search, Perplexity, Gemini, Claude). The ranking will be published quarterly per category and country; no ranking has been published yet.

This repository contains the **complete public methodology** — formulas, model weights, prompt-type distribution, cross-signal definitions, and the prompt curation process. Every change is committed publicly with rationale.

**This is NOT:**
- A SaaS dashboard (that's [Citee Pro](https://citee.ai/pro), separate product)
- A list of paid placements (zero pay-to-play, hard rule in [`methodology.json`](./methodology.json))
- A static document — methodology evolves through versioned releases (see [`CHANGELOG.md`](./CHANGELOG.md))

## Why open

Three reasons:

1. **Reproducibility.** Anyone can audit our scoring against the public raw query log.
2. **Cryptographic timestamping.** Git history is tamper-evident: each commit hash covers the full prior history, so we cannot quietly edit the methodology after the fact to hide a bug.
3. **Subjective opinion shield.** Open formula + public versioning establishes that scores are "expressions of opinion based on observed AI model outputs," not factual claims (legal precedent: *NetScout Systems v. Gartner*, Connecticut Supreme Court, 2020).

## What's in this repo

| File | Purpose |
|---|---|
| [`methodology.json`](./methodology.json) | Machine-readable methodology — formulas, weights, thresholds, policies |
| [`CHANGELOG.md`](./CHANGELOG.md) | Version history with rationale for each change |
| [`taxonomy.md`](./taxonomy.md) | Category list, tier system, scan cadence per tier |
| [`prompts/README.md`](./prompts/README.md) | Prompt curation process (6 stages, multi-agent validation) |
| [`prompts/example-*.md`](./prompts/) | Example prompt frameworks per category (illustrative — exact strings remain closed to guard against Goodhart's Law) |
| [`tools/prompt_curation/`](./tools/prompt_curation/) | Code for the multi-agent prompt curation pipeline |
| [`LICENSE`](./LICENSE) | MIT |

## What's NOT here (and why)

Some operational details remain closed:

- **Exact prompt strings** — disclosing the exact 100 prompts per category would let vendors optimize their pages specifically against our queries (Goodhart's Law). We publish the **distribution by type** (30% buying intent, 25% comparison, 20% specific need, 15% informational, 10% brand-direct) and **example patterns**, not exact strings. 20% of the prompt pool rotates quarterly.
- **Anti-gaming thresholds** — specific burst-detection cutoffs, sock puppet karma thresholds, and review-bombing pattern signatures are closed. We publish the rule categories and their headline thresholds (rank-jump flag at >30 ranks, fresh Wikidata entries excluded for <90 days, etc.) but not the detailed detection parameters.
- **Honeypot brand details** — disclosure would defeat the purpose. The honeypot is documented as existing in [`methodology.json`](./methodology.json) for transparency.
- **First-party telemetry from Free Checker** — aggregated weights from this telemetry feed into model weighting, but raw user data remains closed (GDPR).

These categories of closed information are explicitly listed in [`methodology.json`](./methodology.json) so the boundary between open and closed is itself transparent.
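
As a rough illustration of the quarterly rotation mentioned in the list above, the Python sketch below retires 20% of a pool and refills it from a reserve. The selection rule is an assumption: this repository does not say how retiring or incoming prompts are chosen, so uniform random sampling is used purely as a stand-in. The names `rotate_pool`, `pool`, and `reserve` are hypothetical, and the sketch does not try to preserve the per-type distribution.

```python
import random


def rotate_pool(pool, reserve, fraction=0.20, seed=None):
    """Retire a fraction of the pool and refill it from the reserve.

    Uniform random selection is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    n_out = round(len(pool) * fraction)
    if n_out > len(reserve):
        raise ValueError("reserve is too small for the requested rotation")
    # Pick the positions to retire, then keep everything else in order.
    retired = set(rng.sample(range(len(pool)), n_out))
    kept = [prompt for i, prompt in enumerate(pool) if i not in retired]
    incoming = rng.sample(reserve, n_out)
    return kept + incoming


if __name__ == "__main__":
    # Placeholder identifiers stand in for the closed prompt strings.
    pool = [f"prompt-{i}" for i in range(100)]
    reserve = [f"reserve-{i}" for i in range(30)]
    new_pool = rotate_pool(pool, reserve, seed=2026)
    print(len(new_pool), len(set(new_pool) & set(pool)))
```

A production version would presumably rotate within each prompt type so the published distribution stays intact; that refinement is omitted here.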

## Versioning policy

- **No retroactive changes.** Methodology updates apply to **future cycles only**. If we change the model weighting formula in v1.1, scores for cycles published before v1.1 are not retroactively recomputed (lesson from FIDE 2024 backlash, "stealing rating points").
- **Quarterly major reviews + ad-hoc minor patches.** Major reviews happen at the start of each quarter. Patches (typos, clarifications, additional examples) can land at any time and are versioned as v1.0.1, v1.0.2, etc.
- **Every change has a public commit with rationale.** No silent edits.

## Citation

If you cite Citee Index methodology in academic work, journalism, or business reports:

```
Citee Index Methodology v1.0.0 (2026-05-03).
LMW Commerce / Citee. https://github.com/lmwcommerce/citee-methodology
```

## Contributing

Issues welcome — open one if you spot:
- Methodological flaws or statistical issues
- Errors in formulas or definitions
- Missing edge cases in anti-gaming
- Documentation typos or unclear sections

Pull requests are considered for documentation, code in `tools/`, and example frameworks. **Methodology changes themselves are decided internally** based on quarterly review + community feedback. Every accepted methodology change is credited in `CHANGELOG.md`.

## License

MIT. See [`LICENSE`](./LICENSE).

You're free to use this methodology, fork it, build on it, replicate it, criticize it. We only ask: if you publish a competing ranking, **don't claim it's reproduced from Citee data without running the formulas yourself.** Methodology is open; our raw query log is the source of truth.

---

**Maintained by:** [LMW Commerce](https://lmwcommerce.com) · Jacek Kubas
**Contact:** hello@citee.ai
270
methodology.json
Normal file
@ -0,0 +1,270 @@
{
  "version": "1.0.0",
  "released": "2026-05-03",
  "name": "Citee Index Methodology",
  "description": "Public methodology for Citee Index — the first open public ranking of brand visibility in AI search results (ChatGPT, Perplexity, Gemini, Claude).",
  "license": "MIT",
  "repository": "https://git.lmwcommerce.com/citee/citee-methodology",
  "mirror": "https://github.com/lmwcommerce/citee-methodology",
  "homepage": "https://citee.ai/methodology",

  "philosophy": {
    "approach": "algorithm-first",
    "principles": [
      "Open methodology, public versioning (every change committed publicly)",
      "Reproducibility — anyone can replicate scores from raw query log",
      "No pay-to-play — ranked brands never pay Citee directly. Hard rule in ToS.",
      "Subjective opinion disclaimer — scores are expressions of opinion based on observed AI model outputs (First Amendment shield, NetScout Systems v. Gartner 2020)",
      "No retroactive changes — methodology updates apply to FUTURE cycles only (FIDE 2024 backlash lesson)",
      "Confidence intervals — overlapping CIs reported as 'tied', no false precision",
      "Annual transparency report — manipulation patterns detected, anti-gaming actions taken"
    ]
  },

  "scoring": {
    "formula": "CiteeScore(brand, category, country, month) = sum(mention_score_per_model * model_weight) * (1 + cross_signal_bonus)",
    "normalization": "Raw score 0-120 normalized to 0-100 per category (top brand = 100, others proportional)",
    "ranking": "Sort by CiteeScore descending. Brands with overlapping confidence intervals reported as tied."
  },

  "models": {
    "weighting_basis": "Each model weighted by its share of AI search traffic per region. Weights revised quarterly using 3 public data sources (OpenRouter rankings, Similarweb free tier, Statcounter/IAB Polska/Mobirank reports) plus first-party Free Checker telemetry.",
    "weights": {
      "PL": {
        "chatgpt": {
          "weight": 0.45,
          "model_version": "gpt-4o-search-2026-04",
          "rationale": "Largest user share PL based on OpenRouter + Similarweb data"
        },
        "perplexity": {
          "weight": 0.25,
          "model_version": "sonar-pro-2026-03",
          "rationale": "Growing power user segment, search-native architecture"
        },
        "gemini": {
          "weight": 0.20,
          "model_version": "gemini-2.0-pro",
          "rationale": "Google embed + AI Overviews coverage"
        },
        "claude": {
          "weight": 0.10,
          "model_version": "claude-sonnet-2026-q1",
          "rationale": "Niche but growing; addition planned for Q4 2026, not queried in the pilot",
          "status": "added_q4_2026"
        }
      }
    },
    "pilot_models": ["chatgpt", "perplexity", "gemini"],
    "claude_addition_planned": "2026-Q4"
  },

  "mention_score_per_model": {
    "formula": "mention_score = (position * 0.4) + (prominence * 0.3) + (sentiment * 0.15) + (citation_depth * 0.15)",
    "range": "0.0 - 1.0",
    "components": {
      "position": {
        "weight": 0.4,
        "scale": {
          "rank_1": 1.0,
          "rank_2": 0.7,
          "rank_3": 0.5,
          "rank_4_to_10": 0.3,
          "not_mentioned": 0.0
        }
      },
      "prominence": {
        "weight": 0.3,
        "scale": {
          "passing_mention": 0.3,
          "listed_with_description": 0.6,
          "actively_recommended": 1.0
        }
      },
      "sentiment": {
        "weight": 0.15,
        "scale": {
          "positive": 0.2,
          "neutral": 0.0,
          "negative_or_caveated": -0.3
        }
      },
      "citation_depth": {
        "weight": 0.15,
        "scale": {
          "direct_link_to_brand_site": 1.0,
          "mention_only_no_link": 0.5
        }
      }
    }
  },

  "prompt_types": {
    "rationale": "Different prompt types reflect different stages of buyer funnel. Buying intent prompts weighted higher because they correlate with revenue impact.",
    "weights": {
      "buying": {
        "weight": 2.0,
        "examples_pattern": "Where to buy [category] premium / Best place to buy [category]",
        "share_of_pool": "30%"
      },
      "comparison": {
        "weight": 1.5,
        "examples_pattern": "Best [category] / Top [category] handmade / [Brand A] vs [Brand B]",
        "share_of_pool": "25%"
      },
      "specific_need": {
        "weight": 1.5,
        "examples_pattern": "[Category] with [specific attribute] / [Category] for [specific use case]",
        "share_of_pool": "20%"
      },
      "informational": {
        "weight": 0.3,
        "examples_pattern": "What is [category] / How does [category] work",
        "share_of_pool": "15%"
      },
      "brand_direct": {
        "weight": 0.3,
        "examples_pattern": "[Brand X] reviews / Opinions about [Brand X]",
        "share_of_pool": "10%"
      }
    },
    "pool_size_per_category": 100,
    "pool_rotation": "20% of prompts rotate quarterly. Distribution by type published. Exact strings remain CLOSED to prevent Goodhart's Law (when a measure becomes a target, it ceases to be a measure)."
  },

  "cross_signals": {
    "rationale": "Cross-signals provide reality check — does the brand exist outside AI training data? Brand with high AI score but zero cross-signals may indicate content spam farm rather than real entity.",
    "max_total_bonus": 0.20,
    "signals": {
      "wikidata_entry": {
        "bonus": 0.05,
        "criteria": "Brand has Wikidata entry, minimum 5 triples (instance_of, country, founder OR founded_date, official_website, ISNI), entry age >= 90 days",
        "anti_gaming": "Entries < 90 days old excluded to prevent rapid-deployment manipulation"
      },
      "trustpilot_or_opineo": {
        "bonus": 0.05,
        "criteria": "Reviews count > 50, average rating > 4.0, no review bombing detected (review burst > 50 in 30 days = excluded)"
      },
      "reddit_organic_mentions": {
        "bonus": 0.05,
        "criteria": "Organic mentions in niche subreddit > 10, account_age + karma weighted, sock puppet detection applied (new accounts < 30 days excluded)"
      },
      "google_ai_overviews_presence": {
        "bonus": 0.05,
        "criteria": "Brand cited in Google AI Overviews response for at least one tracked prompt in category, verified via SerpAPI"
      }
    }
  },

  "anti_gaming": {
    "public_thresholds": {
      "rank_jump_flag": "Brand jumping > 30 ranks in single cycle triggers anomaly review and one-cycle score freeze",
      "fresh_wikidata_excluded": "< 90 days",
      "review_bombing_excluded": "> 50 reviews in 30 days from new accounts",
      "sock_puppet_excluded": "Reddit accounts < 30 days old or karma < threshold"
    },
    "private_thresholds": {
      "rationale": "Specific burst detection thresholds, sock puppet karma cutoffs, and pattern matching rules remain CLOSED to prevent gaming. Available to legal/regulatory authorities upon request.",
      "categories": [
        "burst_detection_thresholds",
        "sock_puppet_karma_cutoffs",
        "review_bombing_pattern_signatures",
        "prompt_injection_detection_signatures"
      ]
    },
    "honeypot_brand": {
      "active": true,
      "rationale": "Fictional brand inserted at predetermined ranking position to detect AI training data circular logic and unauthorized scraping. If model cites honeypot brand, evidence of training on Citee data without attribution.",
      "details": "CLOSED — disclosure would defeat purpose"
    },
    "prompt_injection_defense": {
      "scrape_filters": [
        "Strip CSS hidden text (display:none, visibility:hidden, color:white-on-white)",
        "Strip off-screen positioned content (left:-9999px, etc.)",
        "Strip font-size:0 and opacity:0 elements",
        "Detect and exclude content in noscript that contradicts visible content"
      ],
      "consequence": "Brands using prompt injection excluded from current cycle + publicly named in annual transparency report"
    }
  },

  "statistical_methodology": {
    "queries_per_cycle": {
      "prompts_per_category": 100,
      "models": "3 in pilot (ChatGPT, Perplexity, Gemini), 4 from Q4 2026 (+ Claude)",
      "repetitions_per_prompt": 2,
      "total_per_category_per_cycle": "100 * 3 * 2 = 600 (pilot), 100 * 4 * 2 = 800 (post Q4 2026)"
    },
    "confidence_intervals": "95% CI computed via bootstrap resampling. Brands with overlapping CIs reported as tied — no false precision.",
    "minimum_brands_per_category": 20,
    "tied_score_handling": "If CI(A) overlaps CI(B), both reported at same rank with '=' indicator"
  },

  "scan_cadence": {
    "tier_1_large_markets": {
      "frequency": "monthly",
      "criteria": ">1000 brands visible, >100M PLN GMV"
    },
    "tier_2_medium_markets": {
      "frequency": "quarterly",
      "criteria": "100-1000 brands, 10-100M PLN GMV"
    },
    "tier_3_niche_markets": {
      "frequency": "semi-annually",
      "criteria": "<100 brands, <10M PLN GMV"
    },
    "current_pilot_tier": "all categories in pilot are Tier 2 (quarterly)"
  },

  "publication_policy": {
    "validation_period_before_first_publication": "3 months / 3 cycles minimum",
    "first_public_ranking": "August 2026 (target)",
    "format": "Hybrid — Top 10 public HTML (SEO indexed), full ranking 100 brands as PDF behind email gate",
    "ai_crawler_policy": {
      "robots_txt_disallow": ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"],
      "endpoints_protected": ["/api/ranking-full", "/index/*/full.pdf"],
      "rationale": "Prevents AI training data circular logic. Hybrid approach (top 10 public, long tail protected) balances SEO with measurement integrity."
    },
    "right_to_reply": "Each brand profile page includes 'Brand response' section. Brands can submit response (moderated for factual accuracy) within 30 days of cycle publication."
  },

  "monetization_policy": {
    "ranked_brands_pay_zero": true,
    "rationale": "Issuer-pays model fundamentally compromises ranking credibility (Moody's $864M settlement, Forbes 30 Under 30 fraud roundup). Citee Index revenue comes from indirect channels only.",
    "approved_revenue_sources": [
      "Citee Pro SaaS (199-449 PLN/mo) — paid by shops optimizing their visibility, NOT by ranked brands",
      "Industry Reports (999-2999 PLN/quarter) — paid by agencies, media, market research firms",
      "Sponsored Custom Research (9990-29990 PLN) — commissioned by media/agency for category research, NOT brand-specific"
    ],
    "prohibited": [
      "Brand profile upgrades (paid premium listing)",
      "Verified badges (annual fee for ranking participation)",
      "Awards sponsored by ranked brands",
      "Any direct payment from ranked entity to Citee"
    ]
  },

  "categories_pilot_2026": {
    "country": "PL",
    "tier": "Tier 2 (quarterly scan)",
    "list": [
      "kosmetyki-naturalne",
      "suplementy-nutricosmetyki",
      "diety-pudelkowe",
      "premium-pet-food",
      "kawa-specialty",
      "czekolada-rzemieslnicza",
      "kursy-programowania-bootcampy",
      "kliniki-estetyczne-dermo",
      "fitness-studios-premium",
      "kosmetyki-meskie",
      "swiece-sojowe"
    ],
    "expansion_plan": {
      "Q3_2026": "Add Tier 1 PL categories (kosmetyki ogólne, odzież dziecięca, dom & ogród, elektronika audio, biuro)",
      "Q4_2026": "DACH expansion — pilot 5 categories DE",
      "2027_Q1": "CEE expansion (CZ, SK, HU, RO)"
    }
  },

  "changelog_reference": "See CHANGELOG.md for version history. Methodology evolves through public commits with rationale. NO retroactive changes — modifications apply to FUTURE cycles only."
}
172
prompts/README.md
Normal file
@ -0,0 +1,172 @@
# Prompt Curation Process

> How Citee Index builds and validates the prompt pool per category. A 6-stage process guards against the "garbage in, garbage out" failure mode.

---

## Why this matters

If the prompt pool is junk ("dyfuzory do włosów ranking", "wąski do samochodu"), the ranking is junk. Prompt quality is the single most important upstream input to ranking integrity.

This process exists to ensure every prompt in the active pool meets two tests:

1. **Real buyer test** — would an actual buyer of this category type this query into ChatGPT/Perplexity?
2. **Reality check** — does this query appear in actual search/discussion data (Google Trends, Reddit, Quora)?

Prompts failing either test are excluded.

## The 6 stages

```
Stage 1: Persona Generator (AI)
    ↓ 5–10 buyer personas per category
Stage 2: Prompt Brainstormer (AI per persona)
    ↓ 200–300 raw prompts
Stage 3: Reality Check (Google Trends / Reddit / Quora / AnswerThePublic)
    ↓ ~150 prompts with verified search demand
Stage 4: Multi-agent Validation (3 critic agents in parallel)
    ↓ ~120 prompts after critique
Stage 5: Pilot Test Run (10-prompt sample × 3 models)
    ↓ ~110 prompts that produce stable, sensible AI outputs
Stage 6: Human Approval (founder + category expert)
    ↓ FINAL POOL: 100 prompts
```
|
||||||
|
|
||||||
|
### Stage 1 — Persona Generator
|
||||||
|
|
||||||
|
Claude generates 5–10 buyer personas per category. Each persona has:
|
||||||
|
- Demographics (age, location, income bracket)
|
||||||
|
- Pain points (what they're trying to solve)
|
||||||
|
- Decision factors (price, ingredients, brand, reviews, certifications)
|
||||||
|
- Vocabulary (how they actually talk — formal vs colloquial, technical vs lay)
|
||||||
|
|
||||||
|
Example for Świece sojowe PL:
|
||||||
|
- "30+ kobieta kupująca prezent dla mamy"
|
||||||
|
- "Self-care millennial 25–35 po pracy"
|
||||||
|
- "Wnętrzarz minimalistyczne mieszkanie"
|
||||||
|
- "Mężczyzna kupujący prezent walentynkowy"
|
||||||
|
- "Mama małych dzieci szukająca bezpiecznego zapachu"
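The persona attributes listed above map naturally onto a small record type. The sketch below is only an illustration: the class name, field names, and the `validate_persona_count` helper are assumptions, not part of the published methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """One Stage 1 buyer persona (hypothetical field names)."""
    label: str                          # short description of the buyer
    demographics: dict[str, str]        # age, location, income bracket
    pain_points: tuple[str, ...]        # what they are trying to solve
    decision_factors: tuple[str, ...]   # price, ingredients, brand, reviews, ...
    vocabulary: str                     # register: formal/colloquial, technical/lay


def validate_persona_count(personas: list[Persona]) -> bool:
    """Stage 1 targets 5 to 10 personas per category."""
    return 5 <= len(personas) <= 10
```

Keeping vocabulary as an explicit field is a design convenience: Stage 2 can then pass it straight into the brainstorming prompt for that persona.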
|
||||||
|
|
||||||
|
### Stage 2 — Prompt Brainstormer
|
||||||
|
|
||||||
|
For each persona, Claude generates 30–50 prompts in the voice of that persona — "how would I phrase this question to ChatGPT?" Total per category: ~200–300 raw prompts.
|
||||||
|
|
||||||
|
Distribution target by type (enforced at this stage):
|
||||||
|
- Buying intent (weight 2.0): 30%
|
||||||
|
- Comparison (weight 1.5): 25%
|
||||||
|
- Specific need (weight 1.5): 20%
|
||||||
|
- Informational (weight 0.3): 15%
|
||||||
|
- Brand-direct (weight 0.3): 10%
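To make the distribution concrete, the sketch below converts the target shares into prompt counts for a given pool size and shows how much of the total weight each type would carry. It assumes, purely for illustration, that a type's influence is its count multiplied by its weight; the actual scoring formula lives in `methodology.json` and may differ. All identifiers are hypothetical.

```python
# Target share and weight per prompt type, copied from the list above.
PROMPT_TYPES = {
    "buying_intent": {"share": 0.30, "weight": 2.0},
    "comparison":    {"share": 0.25, "weight": 1.5},
    "specific_need": {"share": 0.20, "weight": 1.5},
    "informational": {"share": 0.15, "weight": 0.3},
    "brand_direct":  {"share": 0.10, "weight": 0.3},
}


def target_counts(pool_size: int) -> dict[str, int]:
    """Number of prompts each type should get in a pool of the given size."""
    return {name: round(spec["share"] * pool_size)
            for name, spec in PROMPT_TYPES.items()}


def weight_shares(pool_size: int) -> dict[str, float]:
    """Fraction of total (count x weight) mass carried by each type."""
    counts = target_counts(pool_size)
    mass = {name: counts[name] * PROMPT_TYPES[name]["weight"] for name in counts}
    total = sum(mass.values())
    return {name: value / total for name, value in mass.items()}
```

For a 100-prompt pool this gives 30/25/20/15/10 prompts. Under the count-times-weight assumption, buying-intent prompts carry 60 of 135 weight units, roughly 44% of the total, even though they are only 30% of the pool.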
|
||||||
|
|
||||||
|
### Stage 3 — Reality Check
|
||||||
|
|
||||||
|
Each prompt is cross-referenced against real-world data:
|
||||||
|
|
||||||
|
| Source | Method | Threshold |
|---|---|---|
| **Google Trends API** | PL queries past 12 months | minimum search volume present |
| **Google Search Console** (where available) | Real search queries to brand sites we have access to | inspirational source for vocabulary |
| **Reddit search** | r/Polska_Marka, niche subreddits | actual user phrasing |
| **Quora PL** | Questions asked in category | real curiosity patterns |
| **AnswerThePublic** | Public scraping of "people also ask" | discovery of long-tail patterns |
| **People Also Ask (Google)** | For top category queries | semantic neighbors |
|
||||||
|
|
||||||
|
Prompts with zero/marginal real-world signal are removed. ~300 → ~150.
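A minimal sketch of this filter follows. It assumes the signals from the sources above have already been collected into two numbers per prompt; the field names and the default thresholds are hypothetical, since the actual cut-offs are not published.

```python
from dataclasses import dataclass


@dataclass
class RealitySignal:
    prompt: str
    trends_volume: int  # hypothetical: relative search volume, past 12 months
    forum_hits: int     # hypothetical: matching Reddit/Quora questions found


def passes_reality_check(sig: RealitySignal,
                         min_volume: int = 1,
                         min_hits: int = 1) -> bool:
    """Keep a prompt if at least one source shows real-world demand."""
    return sig.trends_volume >= min_volume or sig.forum_hits >= min_hits


def reality_filter(signals: list[RealitySignal]) -> list[str]:
    """Return the prompts that survive Stage 3, in their original order."""
    return [s.prompt for s in signals if passes_reality_check(s)]
```

Raising `min_volume` or `min_hits` is one way to treat "marginal" signal as a failure rather than only zero signal.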
|
||||||
|
|
||||||
|
### Stage 4 — Multi-agent Validation
|
||||||
|
|
||||||
|
Three AI critic agents review the list in parallel:
|
||||||
|
|
||||||
|
**Agent A — "Real buyer critique"**
|
||||||
|
Persona-grounded review. Each persona "reads" the prompts and flags ones that don't sound natural for that persona. Prompts marked unnatural by 2+ personas are removed.
|
||||||
|
|
||||||
|
**Agent B — "Methodology critic"**
|
||||||
|
Statistical and structural review. Checks:
|
||||||
|
- Prompt type distribution stays within ±5% of target
|
||||||
|
- No subcategory over/under-represented
|
||||||
|
- Vocabulary diversity (we're not repeating the same phrasing)
|
||||||
|
- Length distribution reasonable (no 50-word prompts, no 2-word prompts)
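Two of these checks are easy to express in code. The sketch below reads "±5%" as five percentage points around each target share and treats 3 to 49 words as the acceptable length range; both readings are assumptions, as are the function names.

```python
from collections import Counter

# Target shares per prompt type, as defined in Stage 2.
TARGET_SHARES = {
    "buying_intent": 0.30,
    "comparison": 0.25,
    "specific_need": 0.20,
    "informational": 0.15,
    "brand_direct": 0.10,
}


def distribution_violations(prompt_types: list[str],
                            tolerance: float = 0.05) -> dict[str, float]:
    """Return {type: actual_share} for types outside target +/- tolerance."""
    total = len(prompt_types)
    if total == 0:
        return {}
    counts = Counter(prompt_types)
    violations = {}
    for name, target in TARGET_SHARES.items():
        actual = counts[name] / total
        if abs(actual - target) > tolerance:
            violations[name] = actual
    return violations


def length_outliers(prompts: list[str],
                    min_words: int = 3,
                    max_words: int = 49) -> list[str]:
    """Return prompts that are too short or too long, counted in words."""
    return [p for p in prompts
            if not min_words <= len(p.split()) <= max_words]
```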
|
||||||
|
|
||||||
|
**Agent C — "Vendor exploit hunter"**
|
||||||
|
Anti-gaming review. Identifies prompts that are too easy to game by content marketing fluff:
|
||||||
|
- Generic informational queries that any vendor can write a blog post for
|
||||||
|
- Prompts where AI answer is dominated by Wikipedia (vendor can edit Wikipedia)
|
||||||
|
- Prompts where answer comes from one Reddit post (vendor can write that post)
|
||||||
|
|
||||||
|
Each agent produces a list of flagged prompts. Anything flagged by 2+ agents is removed. ~150 → ~120.
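The 2-of-3 rule is a simple vote count. A minimal sketch, assuming each critic returns a set of flagged prompt strings (the function and key names are illustrative):

```python
from collections import Counter


def consensus_removals(flags_by_agent: dict[str, set[str]],
                       min_agents: int = 2) -> set[str]:
    """Return prompts flagged by at least `min_agents` different critics."""
    votes = Counter(prompt
                    for flagged in flags_by_agent.values()
                    for prompt in flagged)
    return {prompt for prompt, n in votes.items() if n >= min_agents}
```

Because each agent contributes a set, a critic that flags the same prompt twice still counts as one vote.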
|
||||||
|
|
||||||
|
### Stage 5 — Pilot Test Run
|
||||||
|
|
||||||
|
The ~120 candidate prompts get a sample test:
|
||||||
|
- Pick 10 prompts (stratified across types)
|
||||||
|
- Run on ChatGPT-search, Perplexity Sonar, Gemini Pro
|
||||||
|
- Each prompt × 3 models = 30 outputs
|
||||||
|
|
||||||
|
**Reject criteria:**
|
||||||
|
- AI returns "I don't know" or "this depends on your preferences" (no actionable brand mentions)
|
||||||
|
- Outputs across 3 models have zero overlap (prompt produces incoherent/random results)
|
||||||
|
- AI returns a list of countries/categories instead of brands (prompt was misinterpreted)
|
||||||
|
|
||||||
|
Prompts failing pilot are flagged for revision or removal. ~120 → ~110.
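The first two reject criteria can be sketched as set operations, assuming brand names have already been extracted from each model's answer by some upstream step. "Zero overlap" is read here as "no brand is named by any two models", which is one possible interpretation; the function name and return codes are hypothetical.

```python
from itertools import combinations
from typing import Optional


def pilot_reject_reason(brands_by_model: dict[str, set[str]]) -> Optional[str]:
    """Return a reject code for one prompt, or None if it passes the pilot."""
    brand_sets = list(brands_by_model.values())
    # Criterion 1: no model produced any actionable brand mention.
    if all(not brands for brands in brand_sets):
        return "no_brand_mentions"
    # Criterion 2: no pair of models agrees on even a single brand.
    if not any(a & b for a, b in combinations(brand_sets, 2)):
        return "zero_overlap"
    return None
```

The third criterion (countries or categories returned instead of brands) depends on matching extracted names against the brand catalog, so it is left out of this sketch.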
|
||||||
|
|
||||||
|
### Stage 6 — Human Approval
|
||||||
|
|
||||||
|
The founder + category expert review the final ~110 candidates and select the production 100.
|
||||||
|
|
||||||
|
**The founder always reviews.** For categories outside the founder's domain knowledge, a paid expert reviewer (1–2 hours, $50–100) is engaged:
|
||||||
|
|
||||||
|
| Category | Expert profile |
|---|---|
| Kosmetyki naturalne | Beauty product manager / freelance marketer |
| Suplementy / nutricosmetyki | Nutritionist / DTC supplement marketer |
| Diety pudełkowe | Fitness coach / dietitian |
| Premium pet food | Pet specialty store owner / dog trainer |
| Kawa specialty | Coffee blogger / barista trainer |
| Czekolada rzemieślnicza | Food blogger / chocolate-focused content creator |
| Kursy programowania | Bootcamp graduate / hiring manager |
| Kliniki estetyczne | Dermatologist or aesthetic medicine consultant |
| Fitness studios | Personal trainer / gym manager |
| Kosmetyki męskie | Men's grooming influencer / DTC marketer |
| Świece sojowe | Founder + JAKULO customer service data |
|
||||||
|
|
||||||
|
The final 100 prompts are committed to the closed `prompts/{slug}/` directory (gitignored). A public example framework is committed to `prompts/example-{slug}.md` (this repo) showing the structure and 5–10 illustrative examples per type — but **not the exact production strings**.
|
||||||
|
|
||||||
|
## Quarterly refresh — 20% rotation
|
||||||
|
|
||||||
|
Every quarter, the curation pipeline runs in refresh mode:
|
||||||
|
|
||||||
|
1. **Trend check** — Google Trends API: which prompts have lost relative search volume?
|
||||||
|
2. **New patterns** — Reddit/Quora scrape: what new question patterns have emerged?
|
||||||
|
3. **New entrants** — scan model outputs from past quarter: what brands appeared in answers but aren't in our brand catalog?
|
||||||
|
4. **Generate replacements** — Stages 1–5 for the rotation set
|
||||||
|
5. **Human approval** — founder reviews the proposed 20 swaps in 5–10 minutes
|
||||||
|
|
||||||
|
This guards against Goodhart's Law: as the prompt pool becomes known to vendors (through reverse-engineering or leaks), rotating 20% of the pool each quarter ensures vendors can't permanently optimize against our exact queries.
|
||||||
|
|
||||||
|
## Cost per category
|
||||||
|
|
||||||
|
| Stage | API cost | Human cost |
|---|---|---|
| 1 — Persona Generator | ~$0.50 (Claude) | — |
| 2 — Prompt Brainstormer | ~$1.50 (Claude) | — |
| 3 — Reality Check | $0 (free APIs) | — |
| 4 — Multi-agent Validation | ~$3 (Claude × 3 critics) | — |
| 5 — Pilot Test Run | ~$5 (10 prompts × 3 models = 30 outputs) | — |
| 6 — Human Approval | — | ~30 min founder + 1–2h expert ($50–100 for non-founder categories) |
| **Total per category** | **~$10** | **~30 min + $50–100 for expert categories** |
|
||||||
|
|
||||||
|
For 11 pilot categories: ~$110 API + ~5.5 hours founder time (11 × 30 min) + ~$500–1,000 for expert reviewers (10 non-founder categories at $50–100 each).
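The pilot totals can be re-derived from the per-stage figures in the table. The sketch below is plain arithmetic; it assumes 10 of the 11 categories need a paid expert, since Świece sojowe is reviewed by the founder, and all names are illustrative.

```python
# Per-category API cost in USD, taken from the cost table above.
API_COST_USD = {
    "persona_generator": 0.50,
    "prompt_brainstormer": 1.50,
    "reality_check": 0.0,
    "multi_agent_validation": 3.0,
    "pilot_test_run": 5.0,
}


def pilot_budget(categories: int = 11,
                 expert_categories: int = 10,
                 founder_minutes: int = 30,
                 expert_fee_usd: tuple[int, int] = (50, 100)) -> dict:
    """Aggregate the per-category costs over the whole pilot."""
    api_per_category = sum(API_COST_USD.values())
    low, high = expert_fee_usd
    return {
        "api_per_category_usd": api_per_category,
        "api_total_usd": api_per_category * categories,
        "founder_hours": founder_minutes * categories / 60,
        "expert_usd_range": (low * expert_categories, high * expert_categories),
    }
```

With the defaults this yields $10 of API cost per category, $110 for the pilot, 5.5 founder hours, and an expert budget between $500 and $1,000.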
|
||||||
|
|
||||||
|
## Quarterly refresh cost
|
||||||
|
|
||||||
|
Per category per quarter: ~$3 API + 5 minutes founder review.
|
||||||
|
|
||||||
|
For 11 categories: ~$35 API + 1 hour founder time per quarter.
|
||||||
|
|
||||||
|
## Why this is published openly
|
||||||
|
|
||||||
|
We publish the **process** because the integrity of the ranking depends on the integrity of the prompts, and external review of the process is the strongest defense against "your prompts are garbage" attacks.
|
||||||
|
|
||||||
|
We do NOT publish the **exact strings** because of Goodhart's Law: prompts that are known get optimized against, and then they stop measuring organic AI search behavior.
|
||||||
|
|
||||||
|
The boundary between "open process" and "closed strings" is itself documented openly.
|
||||||
100
prompts/example-swiece-sojowe-pl.md
Normal file
|
|
@ -0,0 +1,100 @@
|
||||||
|
# Example prompt framework — Świece sojowe PL
|
||||||
|
|
||||||
|
> Illustrative framework showing how a category prompt pool is structured. Exact production strings remain CLOSED in `prompts/swiece-sojowe-pl/` (gitignored).
|
||||||
|
|
||||||
|
This document is **public** to demonstrate the curation process and prompt-type distribution. It is **not** the actual production prompt list.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Distribution
|
||||||
|
|
||||||
|
100 prompts total, distributed by type:
|
||||||
|
|
||||||
|
| Type | Count | Weight | Share |
|---|---|---|---|
| Buying intent | 30 | 2.0 | 30% |
| Comparison | 25 | 1.5 | 25% |
| Specific need | 20 | 1.5 | 20% |
| Informational | 15 | 0.3 | 15% |
| Brand-direct | 10 | 0.3 | 10% |
|
||||||
|
|
||||||
|
## Personas referenced
|
||||||
|
|
||||||
|
- "30+ kobieta kupująca prezent dla mamy"
|
||||||
|
- "Self-care millennial 25–35 po pracy"
|
||||||
|
- "Wnętrzarz minimalistyczne mieszkanie"
|
||||||
|
- "Mężczyzna kupujący prezent walentynkowy"
|
||||||
|
- "Mama małych dzieci szukająca bezpiecznego zapachu"
|
||||||
|
- "Eko-świadomy konsument 30+"
|
||||||
|
- "Hostess kupująca świece dla agroturystyki"
|
||||||
|
|
||||||
|
## Buying intent (30 prompts × 2.0 weight) — illustrative examples
|
||||||
|
|
||||||
|
These prompts signal active purchase intent. Highest weight because they correlate directly with revenue impact for ranked brands.
|
||||||
|
|
||||||
|
- "Gdzie kupić premium ręcznie robioną świecę sojową na prezent dla mamy"
|
||||||
|
- "Polska marka świec sojowych z certyfikatem ekologicznym do 200 zł"
|
||||||
|
- "Świeca sojowa w eleganckim opakowaniu jako prezent firmowy"
|
||||||
|
- "Gdzie zamówić zestaw prezentowy z polskich świec sojowych handmade"
|
||||||
|
- *(...26 more, exact strings closed)*
|
||||||
|
|
||||||
|
## Comparison (25 prompts × 1.5 weight) — illustrative examples
|
||||||
|
|
||||||
|
Decision-stage queries. User is comparing brands or making a choice.
|
||||||
|
|
||||||
|
- "JAKULO vs Naturaodpauli — która polska marka świec sojowych lepsza"
|
||||||
|
- "Najlepsze polskie świece sojowe handmade 2026 ranking"
|
||||||
|
- "Polskie świece sojowe premium — porównanie najpopularniejszych marek"
|
||||||
|
- *(...22 more, exact strings closed)*
|
||||||
|
|
||||||
|
## Specific need (20 prompts × 1.5 weight) — illustrative examples
|
||||||
|
|
||||||
|
Specific use cases or attributes — buyer knows what they want.
|
||||||
|
|
||||||
|
- "Świeca sojowa o zapachu wanilii i bursztynu w średnim rozmiarze"
|
||||||
|
- "Długo paląca naturalna świeca sojowa do sypialni 60 godzin"
|
||||||
|
- "Świeca sojowa bezzapachowa dla osoby z alergią na zapachy"
|
||||||
|
- *(...17 more, exact strings closed)*
|
||||||
|
|
||||||
|
## Informational (15 prompts × 0.3 weight) — illustrative examples
|
||||||
|
|
||||||
|
Research-stage queries. Lower weight because easily gamed by content marketing fluff.
|
||||||
|
|
||||||
|
- "Czym różni się świeca sojowa od parafinowej"
|
||||||
|
- "Jak rozpoznać prawdziwie sojową świecę"
|
||||||
|
- "Czy świece sojowe są zdrowe i bezpieczne"
|
||||||
|
- *(...12 more, exact strings closed)*
|
||||||
|
|
||||||
|
## Brand-direct (10 prompts × 0.3 weight) — illustrative examples
|
||||||
|
|
||||||
|
Direct brand queries. These carry a lower weight because a brand appearing in answers to queries about itself is the baseline expectation, not evidence of added visibility.
|
||||||
|
|
||||||
|
- "JAKULO opinie 2026 czy warto kupować"
|
||||||
|
- "Co sądzą o polskiej marce świec Naturaodpauli"
|
||||||
|
- *(...8 more, exact strings closed)*
|
||||||
|
|
||||||
|
## Anti-patterns (excluded)
|
||||||
|
|
||||||
|
The following types of prompts are explicitly excluded during Stage 4 (Vendor exploit hunter critic):
|
||||||
|
|
||||||
|
| Pattern | Reason | Example |
|---|---|---|
| Single-word | No buyer intent, ambiguous | "świeczki", "świece" |
| Hobbyist / DIY | Off-topic for retail | "DIY świece sojowe w domu" |
| B2B / wholesale | Not consumer-facing | "hurtownia świec sojowych Warszawa" |
| Brand-agnostic generic | Easy content marketing target | "co to świeca sojowa" |
| Price-only without category context | Too vague | "tania świeca" |
| Off-topic technicality | Signals hobby crafting, not retail buying | "knot bawełniany do świec wymiary" |
| Polish typos at scale | Not real query patterns | "swieczka sojova" (a single typo is OK if frequent in real data) |
|
||||||
|
|
||||||
|
## Quarterly rotation policy
|
||||||
|
|
||||||
|
Each quarter, 20 prompts (20% of pool) are rotated:
|
||||||
|
- 10 retired (lowest real-world search signal in past 90 days, OR known to be gamed)
|
||||||
|
- 10 added (new patterns from Reddit/Quora/trends, new persona refinements, new product attributes emerging)
|
||||||
|
|
||||||
|
Rotation log is committed to `prompts/swiece-sojowe-pl/rotation_log.md` (closed) with rationale per swap.
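The retire-and-add step can be sketched as a ranking problem. The version below assumes each active prompt has a numeric real-world signal for the past 90 days and that a set of known-gamed prompts is available; the function name, parameters, and the tie-breaking order (gamed prompts first) are assumptions.

```python
def plan_rotation(signal_by_prompt: dict[str, float],
                  gamed: set[str],
                  candidates: list[str],
                  n_swaps: int = 10) -> tuple[list[str], list[str]]:
    """Return (retired, added) lists of equal length for one quarter."""
    # Never retire more prompts than there are replacements available,
    # so the pool size stays constant.
    n = min(n_swaps, len(candidates), len(signal_by_prompt))
    # Known-gamed prompts sort first, then the weakest real-world signal.
    ranked = sorted(signal_by_prompt,
                    key=lambda p: (p not in gamed, signal_by_prompt[p]))
    return ranked[:n], candidates[:n]
```

Each returned pair of lists could then be written to the rotation log together with a rationale per swap.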
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**This framework is illustrative.** The actual 100 production prompts evolve with each quarterly cycle and are not published as exact strings — only the distribution, personas, anti-patterns, and example patterns are public.
|
||||||
102
taxonomy.md
Normal file
|
|
@ -0,0 +1,102 @@
|
||||||
|
# Citee Index Taxonomy
|
||||||
|
|
||||||
|
> Category list, tier system, and scan cadence per tier. Live document — updated as new categories are added or existing ones are reclassified.
|
||||||
|
|
||||||
|
**Last updated:** 2026-05-03 (v1.0.0)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tier system
|
||||||
|
|
||||||
|
Categories are classified by **market depth** (number of visible brands) and **GMV** (PLN annual e-commerce volume in category).
|
||||||
|
|
||||||
|
| Tier | Criteria | Scan cadence | Brands tracked per cycle |
|---|---|---|---|
| **Tier 1 — Large** | >1000 brands visible, >100M PLN GMV | Monthly | 100 |
| **Tier 2 — Medium** | 100–1000 brands, 10–100M PLN GMV | Quarterly | 50–100 |
| **Tier 3 — Niche** | <100 brands, <10M PLN GMV | Semi-annual | 20–50 |
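The tier table can be read as a small classification rule. The sketch below assumes a category must meet both the brand-count and the GMV criterion to qualify for a tier and otherwise falls to the next lower one; the taxonomy does not say how mixed cases are resolved, so that rule and the function name are assumptions.

```python
SCAN_CADENCE = {1: "monthly", 2: "quarterly", 3: "semi-annual"}


def classify_tier(brand_count: int, gmv_pln_millions: float) -> int:
    """Map market depth and annual GMV (in millions of PLN) to a tier."""
    if brand_count > 1000 and gmv_pln_millions > 100:
        return 1
    if brand_count >= 100 and gmv_pln_millions >= 10:
        return 2
    return 3
```

Under this reading, a category with 1,500 visible brands but only 50M PLN GMV lands in Tier 2 and is scanned quarterly.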
|
||||||
|
|
||||||
|
Cross-cutting categories (e.g., "Polish DTC brands Top 100", "Polish handmade Top 100") are published **annually** as flagship reports.
|
||||||
|
|
||||||
|
## Country coverage
|
||||||
|
|
||||||
|
| Country | Status | First publication |
|---|---|---|
| **PL** (Poland) | Pilot — 11 categories | August 2026 |
| **DE** (Germany) | Planned Q4 2026 | — |
| **AT** (Austria) | Planned with DE | — |
| **CH** (Switzerland) | Planned with DE | — |
| **CZ** (Czech Republic) | Planned Q1 2027 | — |
| **SK** (Slovakia) | Planned Q1 2027 | — |
| **HU** (Hungary) | Planned Q2 2027 | — |
| **RO** (Romania) | Planned Q2 2027 | — |
| **DK / SE / NO / FI** (Nordic) | Planned Q3 2027 | — |
| FR / ES / IT | Year 2 | — |
| UK / US (English-speaking) | **Not in roadmap.** Heavy competition (Profound, Otterly, AthenaHQ). Citee focuses on markets where local language and brand knowledge create defensible moat. | — |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PL — pilot categories (Tier 2)
|
||||||
|
|
||||||
|
All 11 launch categories are scanned quarterly. Selection criteria: every ranked brand is a potential Citee Pro customer (DTC consumer brands or service businesses with marketing budgets), and no category overlaps with the B2B SaaS competitors of LMW Commerce.
|
||||||
|
|
||||||
|
| # | Category slug | Display name | Sample brands | Notes |
|---|---|---|---|---|
| 1 | `kosmetyki-naturalne-pl` | Kosmetyki naturalne | Resibo, Tołpa, Yope, Bielenda, Dr Irena Eris, Vianek, Lirene | Premium DTC, brand-conscious vendors |
| 2 | `suplementy-nutricosmetyki-pl` | Suplementy / nutricosmetyki | Olimp, Trec, OstroVit, Allnutrition, Health Labs Care, NaturDay, Pharmovit | DTC growth segment, marketing-heavy |
| 3 | `diety-pudelkowe-pl` | Diety pudełkowe / catering dietetyczny | Maczfit, Nice To Fit You, Fit&Easy, BistroBox, Light Box | Subscription DTC, high LTV |
| 4 | `premium-pet-food-pl` | Premium pet food / akcesoria | Brit Care, Acana, Animonda, Royal Canin, Josera, Belcando, Pies Pisany | Loyal customers, premium pricing |
| 5 | `kawa-specialty-pl` | Kawa specialty / gourmet | Coffee Plant, Bonjour Cafe, Etno Cafe, Hard Beans, Coffeedesk, Cafezal | Vocal community, Reddit-rich |
| 6 | `czekolada-rzemieslnicza-pl` | Czekolada rzemieślnicza / premium | Manufaktura Czekolady, Mount Blanc, Wawel premium, Ujejski, Wedel exclusive | Premium DTC lifestyle |
| 7 | `kursy-programowania-bootcampy-pl` | Kursy programowania / IT bootcampy | Kodilla, Coders Lab, Boring Owl, Future Collars, SDA, WSB Online, Akademia Górska | High-LTV edutech, growing 2026 |
| 8 | `kliniki-estetyczne-dermo-pl` | Kliniki estetyczne / dermatologia premium | Klinika La Perla, Medilumi, Klinika Holistic, Estetica, Dermika | Service business, leadgen-driven |
| 9 | `fitness-studios-premium-pl` | Fitness studios / personal training | Calypso, Pure Jatomi, Fabric Health Club, niche premium studios | Subscription model, vocal community |
| 10 | `kosmetyki-meskie-pl` | Kosmetyki dla mężczyzn / męska pielęgnacja | Bartholomew, Chytry Lis, Onlomen, Ziaja Yego, Nivea Men premium, Lirene Men | Growing 2026 segment, new DTC entrants |
| 11 | `swiece-sojowe-pl` | Świece sojowe | JAKULO, Naturaodpauli, Bookiet, Triny, Aromatowo, Yush, Oskiknot, LemonGlas, Paleta Smaków, Bennovate | Pilot test bed + LMW founder pet category |
|
||||||
|
|
||||||
|
## Excluded from pilot — explicit rationale
|
||||||
|
|
||||||
|
| Category | Why excluded |
|---|---|
| Wino / alkohol PL | Polish "Ustawa o wychowaniu w trzeźwości" art. 13¹ restricts alcohol advertising, so the regulatory risk is too high |
| Rękodzieło / handmade / Etsy crowd | Margins of 20–30%; micro-businesses won't pay 449 PLN/mo for visibility tools |
| B2B SaaS (CRM, marketing automation, e-commerce platforms) | LMW Commerce competes in an adjacent space, so these vendors are unlikely to pay for a competitor's product (Citee); they also have their own visibility tools |
| Hosting / domeny | Vendors with their own marketing teams, low conversion to Pro SaaS |
| Banki / ubezpieczenia / fintech B2B | Agencies would buy the reports, but the ranked brands won't buy Pro because banks already have enterprise marketing tools |
|
||||||
|
|
||||||
|
## Q3-Q4 2026 expansion candidates (Tier 1, monthly scan)
|
||||||
|
|
||||||
|
To be added after pilot validation:
|
||||||
|
|
||||||
|
- Kosmetyki ogólne (mainstream, not just naturalne), a very large market
|
||||||
|
- Odzież dziecięca DTC
|
||||||
|
- Dom & ogród / wyposażenie wnętrz
|
||||||
|
- Elektronika audio premium
|
||||||
|
- Akcesoria biurowe / papiernicze B2B-light
|
||||||
|
|
||||||
|
## Cross-cutting flagship reports (annual)
|
||||||
|
|
||||||
|
- "Citee Index — Polski DTC e-commerce Top 100" (year-end)
|
||||||
|
- "Citee Index — Polski handmade Top 50" (year-end)
|
||||||
|
- "Citee Index — Polski D2C lifestyle ecosystem" (mid-year)
|
||||||
|
|
||||||
|
These reports cut across categories to identify the strongest brand presences in AI search overall, regardless of vertical.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adding a new category — checklist
|
||||||
|
|
||||||
|
Before adding a category to active scan, the following must be true:
|
||||||
|
|
||||||
|
1. **Market depth** — at least 20 brands with internet presence visible in PL e-commerce
|
||||||
|
2. **AI search demand** — Google Trends data confirms users search for category-related queries
|
||||||
|
3. **Buyer profile** — ranked brands fit potential Citee Pro customer persona OR clear agency/media buyer for reports
|
||||||
|
4. **No regulatory risk** — category is not subject to advertising restrictions (alcohol, gambling, prescription pharma, etc.)
|
||||||
|
5. **Prompt curation feasible** — buyer personas identifiable, decision factors articulable, expert reviewer available if outside founder's domain knowledge
|
||||||
|
6. **Category integrated with brand catalog** — minimum 30 brands cataloged with normalized names (handling variations: "JAKULO" vs "Jakulo" vs "jakulo.pl")
|
||||||
|
|
||||||
|
When all 6 are true, the category enters the pilot validation cycle (3 cycles minimum before public publication).
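Checklist item 6 depends on brand-name normalization. The sketch below shows one simple way to collapse the variants mentioned above into a single key and to test the 30-brand minimum. The normalization rules, the short list of domain suffixes, and the function names are assumptions; a production catalog would likely need more cases.

```python
import re


def normalize_brand(raw: str) -> str:
    """Collapse spelling and domain variants of a brand into one key."""
    name = raw.strip().lower().rstrip("/")
    # Drop an optional URL scheme and "www." prefix.
    name = re.sub(r"^(https?://)?(www\.)?", "", name)
    # Drop a trailing domain suffix (hypothetical, deliberately short list).
    name = re.sub(r"\.(pl|com|eu)$", "", name)
    return name


def catalog_ready(raw_names: list[str], minimum: int = 30) -> bool:
    """True if the catalog holds at least `minimum` distinct brands."""
    return len({normalize_brand(n) for n in raw_names}) >= minimum
```

Counting distinct normalized keys rather than raw strings prevents the same brand from being counted three times toward the minimum.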
|
||||||
|
|
||||||
|
## Versioning
|
||||||
|
|
||||||
|
Changes to this taxonomy are tracked in [CHANGELOG.md](./CHANGELOG.md). Adding a new category, reclassifying tier, or removing a category constitutes a MINOR version bump. Adding a new country or fundamentally revising the tier system is a MAJOR version bump.
|
||||||