
# Prompt Curation Pipeline

> Multi-stage pipeline for curating production prompts per category. Turns the 6-stage methodology described in `prompts/README.md` into runnable code.

---

## Pipeline overview

```
1_persona_generator.py     →  data/{category}/personas.json
2_prompt_brainstormer.py   →  data/{category}/raw_prompts.json
3_reality_checker.py       →  data/{category}/validated_prompts.json
4_validation_agents.py     →  data/{category}/critic_review.json
5_pilot_test_runner.py     →  data/{category}/pilot_test_results.json
6_human_review_export.py   →  data/{category}/for_human_review.csv
7_finalize.py              →  prompts/{category}/v{N}.json (CLOSED)
```

Each stage is idempotent and can be re-run on its own, reusing the cached outputs of earlier stages.
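One way to get this re-run behavior is for the runner to skip any stage whose output file already exists. The sketch below is hypothetical: the `STAGES` table mirrors the overview above, but the function name, the `force` flag, and the choice to treat "output file exists" as "stage done" are assumptions, not the actual `pipeline.py` logic. Stage 7 is left out because it depends on the human-reviewed CSV and writes to `prompts/`.

```python
from pathlib import Path

# Hypothetical stage table mirroring the pipeline overview (Stages 1-6).
# Stage 7 is omitted: it needs the human-reviewed CSV and writes to prompts/.
STAGES = [
    ("1_persona_generator.py", "personas.json"),
    ("2_prompt_brainstormer.py", "raw_prompts.json"),
    ("3_reality_checker.py", "validated_prompts.json"),
    ("4_validation_agents.py", "critic_review.json"),
    ("5_pilot_test_runner.py", "pilot_test_results.json"),
    ("6_human_review_export.py", "for_human_review.csv"),
]


def stages_to_run(category: str, data_dir: Path = Path("data"), force: bool = False) -> list[str]:
    """Return the stage scripts that still need to run for a category.

    Assumption: a stage counts as done when its output file exists. Once one
    stage has to run, every later stage runs too, since its inputs may change.
    """
    out_dir = data_dir / category
    pending: list[str] = []
    for script, output in STAGES:
        if force or pending or not (out_dir / output).exists():
            pending.append(script)
    return pending
```

A driver could then call each returned script with `subprocess.run([sys.executable, script, "--category", category], check=True)`, so a failed stage stops the run and the next invocation resumes from the first missing output.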

## Tech stack

- **Python 3.12+**
- **Anthropic SDK** (`anthropic>=0.50.0`) — Claude for persona generation, brainstorming, critic agents
- **OpenAI SDK** (`openai>=1.50.0`) — GPT-4o-search for pilot test runs
- **Google Generative AI** (`google-generativeai>=0.8.0`) — Gemini for pilot test runs
- **httpx** for the Perplexity API
- **pandas** for the CSV export to the human reviewer
- **pytrends** for Google Trends (free, unofficial API)
- **praw** for Reddit search (requires a Reddit OAuth app)
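Because the stack mixes several SDKs, a preflight check can catch a missing package before any paid API call is made. The snippet below is an illustrative sketch, not part of the repository; the module names are assumed to be the usual import names of the packages above (for example `google.generativeai` for `google-generativeai`), and version pins are not checked.

```python
import importlib.util

# Assumed import names for the packages in the tech stack.
REQUIRED_MODULES = [
    "anthropic",
    "openai",
    "google.generativeai",
    "httpx",
    "pandas",
    "pytrends",
    "praw",
]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the modules that cannot be found in the current environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent of a dotted name; this is raised
            # when that parent package is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` at the start of `pipeline.py` and exiting on a non-empty result is one option.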

## Usage

```bash
# 1. Set up environment variables (see .env.example)
cp .env.example .env
# Edit .env with API keys

# 2. Run pipeline for a category
python pipeline.py --category swiece-sojowe-pl

# Or run individual stages
python 1_persona_generator.py --category swiece-sojowe-pl
python 2_prompt_brainstormer.py --category swiece-sojowe-pl
# ... etc.

# 3. Export for human review (Stage 6), then review and approve manually
python 6_human_review_export.py --category swiece-sojowe-pl
# Open data/{category}/for_human_review.csv in a spreadsheet
# Mark each prompt as approved, rejected, or edited
# Save the result in the same folder as for_human_review_decided.csv

# 4. Finalize
python 7_finalize.py --category swiece-sojowe-pl
# Outputs: prompts/{category}/v1.json (gitignored, closed)
```
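Stage 7 has to turn the reviewer's decisions into the next closed prompt set. The sketch below shows one possible approach using only the standard library (`csv` instead of pandas); the column names (`prompt`, `decision`, `edited_prompt`), the decision values, and the `{"prompts": [...]}` JSON layout are hypothetical, since the actual CSV schema is not documented here. It keeps approved rows, substitutes the reviewer's text for edited rows, drops everything else, and picks the next `v{N}` number from the files already in `prompts/{category}/`.

```python
import csv
import json
import re
from pathlib import Path

# Hypothetical decision values that keep a prompt in the final set.
KEEP = {"approved", "edited"}


def next_version_path(category_dir: Path) -> Path:
    """Return prompts/{category}/v{N+1}.json, or v1.json if no version exists yet."""
    versions = [
        int(m.group(1))
        for p in category_dir.glob("v*.json")
        if (m := re.fullmatch(r"v(\d+)\.json", p.name))
    ]
    return category_dir / f"v{max(versions, default=0) + 1}.json"


def finalize(decided_csv: Path, category_dir: Path) -> Path:
    """Write approved/edited prompts from the review CSV to the next version file."""
    with decided_csv.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    prompts = []
    for row in rows:
        decision = row["decision"].strip().lower()
        if decision not in KEEP:
            continue  # rejected or undecided rows are dropped
        # Edited rows use the reviewer's rewrite when one was provided.
        if decision == "edited" and row.get("edited_prompt"):
            text = row["edited_prompt"]
        else:
            text = row["prompt"]
        prompts.append(text.strip())

    category_dir.mkdir(parents=True, exist_ok=True)
    out_path = next_version_path(category_dir)
    out_path.write_text(
        json.dumps({"prompts": prompts}, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path
```

Never overwriting an existing `v{N}.json` keeps earlier closed versions intact, which the rotation mode relies on when it reads `v{N}` and writes `v{N+1}`.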

## Cost per category (estimated)

| Stage | API used | Cost |
|---|---|---|
| 1 — Persona Generator | Claude Sonnet | ~$0.50 |
| 2 — Prompt Brainstormer | Claude Sonnet | ~$1.50 |
| 3 — Reality Checker | Free APIs (Trends, Reddit, Quora) | $0 |
| 4 — Validation Agents (3 critics) | Claude Sonnet × 3 | ~$3 |
| 5 — Pilot Test Runner (10 prompts × 3 models) | GPT-4o + Perplexity + Gemini | ~$5 |
| 6 — Human Review Export | (no API) | $0 |
| 7 — Finalize | (no API) | $0 |
| **TOTAL** | | **~$10** |

For 11 pilot categories: ~$110.
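The per-category total is the sum of the stage estimates, and the pilot budget scales linearly with the number of categories. A small helper like the hypothetical one below, seeded with the rounded estimates from the table, makes it easy to redo the arithmetic when a stage estimate or the category count changes.

```python
# Rounded per-stage estimates (USD) from the cost table above.
STAGE_COST_USD = {
    "1_persona_generator": 0.50,
    "2_prompt_brainstormer": 1.50,
    "3_reality_checker": 0.00,
    "4_validation_agents": 3.00,
    "5_pilot_test_runner": 5.00,
    "6_human_review_export": 0.00,
    "7_finalize": 0.00,
}


def estimated_cost(categories: int = 1) -> float:
    """Estimated API spend for running the full pipeline on `categories` categories."""
    return sum(STAGE_COST_USD.values()) * categories
```

`estimated_cost()` returns `10.0` and `estimated_cost(11)` returns `110.0`, matching the figures above.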

## Configuration

See `config.py` for tunable parameters per stage:
- Number of personas (default: 7)
- Prompts per persona (default: 30)
- Prompt-type distribution targets (relative weights 40/25/20/15/10 for buying / comparison / specific / info / brand-direct)
- Pilot sample size (default: 10)
- Critic-agent threshold (a prompt flagged by N or more critic agents is removed)
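A `config.py` along these lines could hold the defaults listed above. This is a hypothetical sketch: the class and field names, the critic threshold of 2 (out of the 3 critics), and applying the type split to the raw prompt pool are assumptions. With the defaults, 7 personas × 30 prompts gives 210 raw prompts. The five type weights sum to 110 rather than 100, so the sketch treats them as relative weights and normalizes them, using largest-remainder rounding so the per-type counts add up exactly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CurationConfig:
    """Hypothetical per-stage defaults; names and the threshold value are assumptions."""

    n_personas: int = 7
    prompts_per_persona: int = 30
    # Relative weights per prompt type. They sum to 110, so they are
    # normalized rather than read as percentages.
    type_weights: dict[str, int] = field(
        default_factory=lambda: {
            "buying": 40,
            "comparison": 25,
            "specific": 20,
            "info": 15,
            "brand-direct": 10,
        }
    )
    pilot_sample_size: int = 10
    # Assumed value: remove a prompt flagged by at least this many critics.
    critic_flag_threshold: int = 2

    @property
    def raw_prompt_target(self) -> int:
        """Expected raw prompt count: personas x prompts per persona."""
        return self.n_personas * self.prompts_per_persona

    def type_targets(self) -> dict[str, int]:
        """Split raw_prompt_target across types with largest-remainder rounding."""
        total = self.raw_prompt_target
        weight_sum = sum(self.type_weights.values())
        # Integer floor of each type's exact share.
        counts = {t: total * w // weight_sum for t, w in self.type_weights.items()}
        leftover = total - sum(counts.values())
        # Give the remaining slots to the types with the largest fractional parts.
        by_remainder = sorted(
            self.type_weights,
            key=lambda t: total * self.type_weights[t] % weight_sum,
            reverse=True,
        )
        for t in by_remainder[:leftover]:
            counts[t] += 1
        return counts

    def should_remove(self, flag_count: int) -> bool:
        """True if enough critic agents flagged the prompt to drop it."""
        return flag_count >= self.critic_flag_threshold
```

With the defaults, `CurationConfig().type_targets()` yields 76 buying, 48 comparison, 38 specific, 29 info, and 19 brand-direct prompts (210 in total).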

## Quarterly rotation mode

```bash
python pipeline.py --category swiece-sojowe-pl --mode rotation
```

In rotation mode:
- Reads existing `prompts/{category}/v{N}.json`
- Identifies the 20 prompts with the lowest real-world signal over the past 90 days (via a Stage 3 scan)
- Generates 20 replacements by running Stages 1–5 on the refresh set
- Outputs `prompts/{category}/v{N+1}.json` (CLOSED)
- Logs swap decisions to `prompts/{category}/rotation_log.md` (CLOSED)
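The selection and swap steps of rotation mode amount to ranking the current prompts by their real-world signal and replacing the bottom 20. The sketch below assumes a hypothetical per-prompt signal score (for example, an aggregate of Stage 3 Trends and Reddit hits over the last 90 days) and a list of already-validated replacement IDs; treating unscored prompts as zero signal and breaking ties by ID are also assumptions.

```python
def rotate(
    current: list[str],
    signals: dict[str, float],
    replacements: list[str],
    n_swap: int = 20,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Replace the n_swap lowest-signal prompt IDs with replacement IDs.

    Prompts missing from `signals` are treated as having zero signal; ties
    are broken by ID so the result is deterministic. Returns the next prompt
    list and (removed, added) pairs for the rotation log.
    """
    if len(replacements) < n_swap:
        raise ValueError(f"need {n_swap} replacements, got {len(replacements)}")
    ranked = sorted(current, key=lambda pid: (signals.get(pid, 0.0), pid))
    dropped = ranked[:n_swap]
    dropped_set = set(dropped)
    # Keep surviving prompts in their original order, then append the new ones.
    kept = [pid for pid in current if pid not in dropped_set]
    added = replacements[:n_swap]
    return kept + added, list(zip(dropped, added))
```

The returned pairs are a natural input for the entries appended to `rotation_log.md`.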

---

**Status:** v0.1, initial scaffold. Implementation is in progress as part of the Citee Index pilot phase (May–August 2026).