citee-methodology/tools/prompt_curation/README.md
Jacek Kubas 03a397343e Phase 1: brand catalog (soy candles, PL) + prompt curation pipeline
DATA — Public reference datasets for methodology:
- data/README.md: schema + format definitions for brand catalogs
- data/swiece-sojowe-pl/brand_catalog.json: 35 tracked brands (33 manufacturers + 2 importers) + 5 excluded marketplaces/resellers
- data/swiece-sojowe-pl/brand_catalog.md: human-readable companion
- data/swiece-sojowe-pl/market_metadata.json: GMV estimate, personas, seasonality, expected dynamics

TOOLS — 6-stage prompt curation pipeline (Python 3.12+):
- tools/prompt_curation/README.md: process documentation + cost estimates
- tools/prompt_curation/config.py: tunable parameters per stage
- tools/prompt_curation/.env.example: required API keys template
- tools/prompt_curation/requirements.txt: dependencies
- tools/prompt_curation/1_persona_generator.py: Claude generates 7 buyer personas
- tools/prompt_curation/2_prompt_brainstormer.py: per persona × 30 prompts in voice
- tools/prompt_curation/3_reality_checker.py: Google Trends + Reddit cross-check
- tools/prompt_curation/4_validation_agents.py: 3 critic agents async (real_buyer/methodology/exploit_hunter)
- tools/prompt_curation/5_pilot_test_runner.py: sample × 3 LLM models pre-flight
- tools/prompt_curation/6_human_review_export.py: CSV export for founder approval
- tools/prompt_curation/7_finalize.py: post-approval → closed prompts/{cat}/v{N}.json
- tools/prompt_curation/pipeline.py: orchestrator (stages 1–6, then human review, then 7)

GITIGNORE — Fixed .env.* exclusion to allow .env.example.

This commit completes Phase 1. Stage outputs (data/{cat}/personas.json,
raw_prompts.json, validated_prompts.json, critic_review.json, pilot_test_results.json,
for_human_review.csv) are runtime artifacts: public when committed, derived from
public methodology + public brand catalog. Final approved prompt strings in
prompts/{cat}/v{N}.json remain CLOSED (gitignored, anti-Goodhart's Law).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:40:12 +02:00


# Prompt Curation Pipeline
> Multi-stage pipeline for curating production prompts per category. Translates the 6-stage methodology process from `prompts/README.md` into runnable code.
---
## Pipeline overview
```
1_persona_generator.py → data/{category}/personas.json
2_prompt_brainstormer.py → data/{category}/raw_prompts.json
3_reality_checker.py → data/{category}/validated_prompts.json
4_validation_agents.py → data/{category}/critic_review.json
5_pilot_test_runner.py → data/{category}/pilot_test_results.json
6_human_review_export.py → data/{category}/for_human_review.csv
7_finalize.py → prompts/{category}/v{N}.json (CLOSED)
```
Each stage is idempotent and can be re-run with cached intermediate outputs.
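One way to get the idempotent, cache-aware behaviour described above is to have every stage check for its output file before doing any work. The following is a minimal sketch, not the pipeline's actual implementation: `run_stage`, `STAGE_OUTPUTS`, the `force` flag, and the stand-in `fake_personas` function are hypothetical names introduced for illustration. Only the file names come from the overview.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Stage script -> output file, as listed in the pipeline overview.
# Stages 6 and 7 are left out because they write a CSV and a closed file.
STAGE_OUTPUTS = {
    "1_persona_generator": "personas.json",
    "2_prompt_brainstormer": "raw_prompts.json",
    "3_reality_checker": "validated_prompts.json",
    "4_validation_agents": "critic_review.json",
    "5_pilot_test_runner": "pilot_test_results.json",
}


def run_stage(stage: str, data_dir: Path, compute: Callable[[], object],
              force: bool = False) -> object:
    """Return the cached output if present, otherwise compute and cache it."""
    out_path = data_dir / STAGE_OUTPUTS[stage]
    if out_path.exists() and not force:
        return json.loads(out_path.read_text(encoding="utf-8"))
    result = compute()
    data_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run
    # never leaves a half-written cache behind.
    tmp_path = out_path.with_suffix(out_path.suffix + ".tmp")
    tmp_path.write_text(json.dumps(result, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    tmp_path.replace(out_path)
    return result


calls = []


def fake_personas():
    """Stand-in for the real (paid) API call; counts how often it runs."""
    calls.append(1)
    return [{"name": "gift buyer"}]


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data" / "swiece-sojowe-pl"
    first = run_stage("1_persona_generator", data_dir, fake_personas)
    second = run_stage("1_persona_generator", data_dir, fake_personas)  # cache hit
```

With this shape, re-running a stage costs nothing unless `force=True` is passed or the cached file is deleted.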
## Tech stack
- **Python 3.12+**
- **Anthropic SDK** (`anthropic>=0.50.0`) — Claude for persona generation, brainstorming, critic agents
- **OpenAI SDK** (`openai>=1.50.0`) — GPT-4o-search for pilot test runs
- **Google Generative AI** (`google-generativeai>=0.8.0`) — Gemini for pilot test runs
- **httpx** for Perplexity API
- **pandas** for CSV export to human reviewer
- **pytrends** for Google Trends API (free, unofficial)
- **praw** for Reddit search (requires Reddit OAuth app)
## Usage
```bash
# 1. Set up environment variables (see .env.example)
cp .env.example .env
# Edit .env with API keys
# 2. Run pipeline for a category
python pipeline.py --category swiece-sojowe-pl
# Or run individual stages
python 1_persona_generator.py --category swiece-sojowe-pl
python 2_prompt_brainstormer.py --category swiece-sojowe-pl
# ... etc.
# 3. Run Stage 6, then review the exported CSV manually + approve in human_review tool
python 6_human_review_export.py --category swiece-sojowe-pl
# Open data/{category}/for_human_review.csv in a spreadsheet
# Mark each prompt approved/rejected/edited
# Save the result as for_human_review_decided.csv
# 4. Finalize
python 7_finalize.py --category swiece-sojowe-pl
# Outputs: prompts/{category}/v1.json (gitignored, closed)
```
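The review step above hands a decided CSV to Stage 7. The sketch below shows how such a file could be turned into a final prompt list. The column names (`prompt_id`, `prompt`, `decision`, `edited_prompt`) are assumptions, since the README does not specify the CSV schema, and the standard-library `csv` module is used here only to keep the example self-contained (the pipeline itself lists pandas for the export).

```python
import csv
import io

# Decision labels taken from the usage comments; column names are hypothetical.
VALID_DECISIONS = {"approved", "rejected", "edited"}


def approved_prompts(rows):
    """Keep approved prompts as-is, use the edited text for edited ones."""
    final = []
    for row in rows:
        decision = row["decision"].strip().lower()
        if decision not in VALID_DECISIONS:
            raise ValueError(f"{row['prompt_id']}: unknown decision {decision!r}")
        if decision == "approved":
            final.append(row["prompt"])
        elif decision == "edited":
            if not row["edited_prompt"].strip():
                raise ValueError(f"{row['prompt_id']}: edited but no new text")
            final.append(row["edited_prompt"])
        # Rejected rows are dropped.
    return final


# Inline sample standing in for data/{category}/for_human_review_decided.csv.
sample = io.StringIO(
    "prompt_id,prompt,decision,edited_prompt\n"
    "p1,best soy candle for a gift,approved,\n"
    "p2,cheap candles,rejected,\n"
    "p3,soy candel without paraffin,edited,soy candle without paraffin\n"
)
kept = approved_prompts(csv.DictReader(sample))
```

Failing loudly on an unknown decision value is a deliberate choice in this sketch: a typo in the spreadsheet should stop finalization rather than silently drop a prompt.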
## Cost per category (estimated)
| Stage | API used | Cost |
|---|---|---|
| 1 — Persona Generator | Claude Sonnet | ~$0.50 |
| 2 — Prompt Brainstormer | Claude Sonnet | ~$1.50 |
| 3 — Reality Checker | Free APIs (Trends, Reddit, Quora) | $0 |
| 4 — Validation Agents (3 critics) | Claude Sonnet × 3 | ~$3 |
| 5 — Pilot Test Runner (10 prompts × 3 models) | GPT-4o + Perplexity + Gemini | ~$5 |
| 6 — Human Review Export | (no API) | $0 |
| 7 — Finalize | (no API) | $0 |
| **TOTAL** | | **~$10** |
For 11 pilot categories: ~$110.
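The totals follow directly from the table. A short check of the arithmetic, using only the estimates listed above (the per-call figure for Stage 5 is an implied average, not a quoted price):

```python
# Per-stage estimates in USD, copied from the cost table.
STAGE_COST_USD = {
    "1_persona_generator": 0.50,
    "2_prompt_brainstormer": 1.50,
    "3_reality_checker": 0.00,
    "4_validation_agents": 3.00,
    "5_pilot_test_runner": 5.00,
    "6_human_review_export": 0.00,
    "7_finalize": 0.00,
}
PILOT_CATEGORIES = 11

per_category = sum(STAGE_COST_USD.values())
total = per_category * PILOT_CATEGORIES

# Stage 5 runs 10 prompts against 3 models, i.e. 30 calls for ~$5.
pilot_calls = 10 * 3
implied_cost_per_pilot_call = STAGE_COST_USD["5_pilot_test_runner"] / pilot_calls

print(f"per category: ~${per_category:.2f}")
print(f"{PILOT_CATEGORIES} categories: ~${total:.2f}")
print(f"implied average per pilot call: ~${implied_cost_per_pilot_call:.2f}")
```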
## Configuration
See `config.py` for tunable parameters per stage:
- Number of personas (default: 7)
- Prompts per persona (default: 30)
- Type distribution targets (40/25/20/15/10 weights → buying/comparison/specific/info/brand-direct)
- Pilot sample size (default: 10)
- Critic agent thresholds (flagged-by-N agents → remove)
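The parameters above could be grouped roughly as follows. This is an illustrative sketch, not the contents of `config.py`: the class, field, and function names are hypothetical, and so is the threshold default of 2. The listed weights sum to 110, so the sketch assumes they are relative weights and normalises them before allocating prompts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CurationConfig:
    n_personas: int = 7
    prompts_per_persona: int = 30
    pilot_sample_size: int = 10
    # Weights as listed above; treated as relative (assumption).
    type_weights: dict[str, int] = field(default_factory=lambda: {
        "buying": 40, "comparison": 25, "specific": 20,
        "info": 15, "brand-direct": 10,
    })
    # Hypothetical default: remove a prompt flagged by at least 2 of 3 critics.
    critic_removal_threshold: int = 2


def allocate_types(weights: dict[str, int], total: int) -> dict[str, int]:
    """Largest-remainder allocation, so the counts sum exactly to total."""
    weight_sum = sum(weights.values())
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def prompts_to_remove(flags: dict[str, set[str]], threshold: int) -> list[str]:
    """flags maps prompt id -> names of the critic agents that flagged it."""
    return sorted(pid for pid, critics in flags.items() if len(critics) >= threshold)


cfg = CurationConfig()
per_persona = allocate_types(cfg.type_weights, cfg.prompts_per_persona)
removed = prompts_to_remove(
    {
        "p1": {"real_buyer", "exploit_hunter"},
        "p2": {"methodology"},
        "p3": set(),
    },
    cfg.critic_removal_threshold,
)
```

Under these assumptions, 30 prompts per persona split into 11 buying, 7 comparison, 5 specific, 4 info, and 3 brand-direct prompts.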
## Quarterly rotation mode
```bash
python pipeline.py --category swiece-sojowe-pl --mode rotation
```
In rotation mode:
- Reads existing `prompts/{category}/v{N}.json`
- Identifies 20 prompts with lowest real-world signal in past 90 days (via Stage 3 scan)
- Generates 20 replacements (Stages 1–5 for refresh set)
- Outputs `prompts/{category}/v{N+1}.json` (CLOSED)
- Logs swap decisions to `prompts/{category}/rotation_log.md` (CLOSED)
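The swap logic in the steps above can be sketched as follows. How Stage 3 scores "real-world signal" is not specified here, so the `signal` mapping (prompt to numeric score, lower meaning weaker) is an assumption, as are the function names `rotate` and `next_version_path`.

```python
import re

ROTATION_SIZE = 20  # from the rotation-mode description


def next_version_path(path: str) -> str:
    """Turn .../v{N}.json into .../v{N+1}.json."""
    match = re.fullmatch(r"(.*/v)(\d+)(\.json)", path)
    if match is None:
        raise ValueError(f"not a versioned prompt file: {path}")
    prefix, version, suffix = match.groups()
    return f"{prefix}{int(version) + 1}{suffix}"


def rotate(prompts, signal, replacements, n=ROTATION_SIZE):
    """Drop the n lowest-signal prompts and append the replacements.

    Returns (new_prompt_list, dropped_prompts). Prompts with no recorded
    signal are scored 0.0, so they are rotated out first.
    """
    if len(replacements) != n:
        raise ValueError(f"expected {n} replacements, got {len(replacements)}")
    ranked = sorted(prompts, key=lambda p: signal.get(p, 0.0))
    dropped = ranked[:n]
    dropped_set = set(dropped)
    kept = [p for p in prompts if p not in dropped_set]
    return kept + list(replacements), dropped


# Placeholder data; real prompt strings stay closed.
prompts = [f"prompt-{i:02d}" for i in range(50)]
signal = {p: float(i) for i, p in enumerate(prompts)}
replacements = [f"new-{i:02d}" for i in range(ROTATION_SIZE)]

new_set, dropped = rotate(prompts, signal, replacements)
new_path = next_version_path("prompts/swiece-sojowe-pl/v1.json")
```

The `dropped` list is what a rotation log entry would record alongside the replacements, and the set size stays constant across versions.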
---
**Status:** v0.1 (initial scaffold). Implementation in progress as part of Citee Index pilot phase (May–August 2026).