# Prompt Curation Pipeline

> Multi-stage pipeline for curating production prompts per category. Translates the 6-stage methodology from `prompts/README.md` into runnable code.

---

## Pipeline overview

```
1_persona_generator.py    → data/{category}/personas.json
2_prompt_brainstormer.py  → data/{category}/raw_prompts.json
3_reality_checker.py      → data/{category}/validated_prompts.json
4_validation_agents.py    → data/{category}/critic_review.json
5_pilot_test_runner.py    → data/{category}/pilot_test_results.json
6_human_review_export.py  → data/{category}/for_human_review.csv
7_finalize.py             → prompts/{category}/v{N}.json (CLOSED)
```

Each stage is idempotent and can be re-run with cached intermediate outputs.

## Tech stack

- **Python 3.12+**
- **Anthropic SDK** (`anthropic>=0.50.0`): Claude for persona generation, brainstorming, critic agents
- **OpenAI SDK** (`openai>=1.50.0`): GPT-4o-search for pilot test runs
- **Google Generative AI** (`google-generativeai>=0.8.0`): Gemini for pilot test runs
- **httpx** for the Perplexity API
- **pandas** for CSV export to the human reviewer
- **pytrends** for the Google Trends API (free, unofficial)
- **praw** for Reddit search (requires a Reddit OAuth app)

## Usage

```bash
# 1. Set up environment variables (see .env.example)
cp .env.example .env
# Edit .env with API keys

# 2. Run pipeline for a category
python pipeline.py --category swiece-sojowe-pl

# Or run individual stages
python 1_persona_generator.py --category swiece-sojowe-pl
python 2_prompt_brainstormer.py --category swiece-sojowe-pl
# ... etc.

# 3. After Stage 6, review CSV manually + approve in human_review tool
python 6_human_review_export.py --category swiece-sojowe-pl
# Open data/{category}/for_human_review.csv in spreadsheet
# Mark approved/rejected/edited
# Save back as for_human_review_decided.csv

# 4. Finalize
python 7_finalize.py --category swiece-sojowe-pl
# Outputs: prompts/{category}/v1.json (gitignored, closed)
```

## Cost per category (estimated)

| Stage | API used | Cost |
|---|---|---|
| 1: Persona Generator | Claude Sonnet | ~$0.50 |
| 2: Prompt Brainstormer | Claude Sonnet | ~$1.50 |
| 3: Reality Checker | Free APIs (Trends, Reddit, Quora) | $0 |
| 4: Validation Agents (3 critics) | Claude Sonnet × 3 | ~$3 |
| 5: Pilot Test Runner (10 prompts × 3 models) | GPT-4o + Perplexity + Gemini | ~$5 |
| 6: Human Review Export | (no API) | $0 |
| 7: Finalize | (no API) | $0 |
| **TOTAL** | | **~$10** |

For 11 pilot categories: ~$110.

## Configuration

See `config.py` for tunable parameters per stage:

- Number of personas (default: 7)
- Prompts per persona (default: 30)
- Type distribution targets (40/25/20/15/10 weights → buying/comparison/specific/info/brand-direct)
- Pilot sample size (default: 10)
- Critic agent thresholds (flagged-by-N agents → remove)

## Quarterly rotation mode

```bash
python pipeline.py --category swiece-sojowe-pl --mode rotation
```

In rotation mode, the pipeline:

- Reads the existing `prompts/{category}/v{N}.json`
- Identifies the 20 prompts with the lowest real-world signal in the past 90 days (via Stage 3 scan)
- Generates 20 replacements (Stages 1–5 for the refresh set)
- Outputs `prompts/{category}/v{N+1}.json` (CLOSED)
- Logs swap decisions to `prompts/{category}/rotation_log.md` (CLOSED)

---

**Status:** v0.1, initial scaffold. Implementation in progress as part of the Citee Index pilot phase (May–August 2026).
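
The pipeline overview states that each stage is idempotent and can be re-run with cached intermediate outputs. One way this could work is sketched below. The helper name `run_cached_stage`, its parameters, and the `force` flag are assumptions made for illustration; they are not taken from the actual stage scripts.

```python
import json
from pathlib import Path
from typing import Any, Callable


def run_cached_stage(
    data_dir: Path,
    category: str,
    output_name: str,
    compute: Callable[[], Any],
    force: bool = False,
) -> Any:
    """Return a stage's output, reusing its cached JSON file when present.

    Hypothetical helper: the real stages may cache differently.
    """
    out_path = data_dir / category / output_name
    if out_path.exists() and not force:
        # Cache hit: skip the (possibly paid) API calls entirely.
        return json.loads(out_path.read_text(encoding="utf-8"))

    result = compute()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run
    # never leaves a half-written cache file behind.
    tmp_path = out_path.with_suffix(out_path.suffix + ".tmp")
    tmp_path.write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    tmp_path.replace(out_path)
    return result
```

Under these assumptions, a second call for the same category and output file returns the stored result without invoking `compute` again, and `force=True` recomputes and overwrites the cache.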
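
The Configuration section mentions critic agent thresholds of the form "flagged-by-N agents → remove". A minimal sketch of that rule follows. The function name, the shape of the `critic_flags` mapping, and the default threshold of 2 are all assumptions; the actual values live in `config.py`.

```python
from collections import Counter


def filter_by_critic_flags(
    prompt_ids: list[str],
    critic_flags: dict[str, list[str]],
    threshold: int = 2,  # assumed default, not taken from config.py
) -> tuple[list[str], list[str]]:
    """Split prompts into (kept, removed) by how many critics flagged each.

    `critic_flags` maps a hypothetical critic name to the prompt IDs it flagged.
    """
    flag_counts: Counter[str] = Counter()
    for flagged in critic_flags.values():
        # Deduplicate so one critic counts at most once per prompt.
        flag_counts.update(set(flagged))

    kept = [pid for pid in prompt_ids if flag_counts[pid] < threshold]
    removed = [pid for pid in prompt_ids if flag_counts[pid] >= threshold]
    return kept, removed
```

With three critics and `threshold=2`, a prompt flagged by a single critic is kept, while one flagged by two or three is removed. Raising the threshold to 3 removes only prompts that all critics agree on.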