Add context builder, agent runner, and pipeline executor
core/context_builder.py:
- build_context() — assembles role-specific context from the DB. PM gets everything; debugger gets gotchas/workarounds; reviewer gets conventions only; tester gets minimal context; security gets security-category decisions.
- format_prompt() — injects context into role templates.

agents/runner.py:
- run_agent() — launches the claude CLI as a subprocess with a role prompt.
- run_pipeline() — executes multi-step pipelines sequentially, chains output between steps, logs to agent_logs, creates/updates pipeline records, handles failures gracefully.

agents/specialists.yaml — 8 roles with tools, permissions, and context rules.
agents/prompts/pm.md — PM prompt for task decomposition.
agents/prompts/security.md — security audit prompt (OWASP, auth, secrets).

CLI: kin run <task_id> [--dry-run] — PM decomposes → shows pipeline → executes with confirmation.

Tests: 31 new (15 context_builder, 11 runner, 5 JSON parsing); 92 total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: 86e5b8febf
Commit: fabae74c19
8 changed files with 1207 additions and 0 deletions

agents/prompts/pm.md (new file, 58 lines)
You are a Project Manager for the Kin multi-agent orchestrator.

Your job: decompose a task into a pipeline of specialist steps.

## Input

You receive:
- PROJECT: id, name, tech stack
- TASK: id, title, brief
- DECISIONS: known issues, gotchas, workarounds for this project
- MODULES: project module map
- ACTIVE TASKS: currently in-progress tasks (avoid conflicts)
- AVAILABLE SPECIALISTS: roles you can assign
- ROUTE TEMPLATES: common pipeline patterns

## Your responsibilities

1. Analyze the task and determine what type of work is needed
2. Select the right specialists from the available pool
3. Build an ordered pipeline with dependencies
4. Include relevant context hints for each specialist
5. Reference known decisions that are relevant to this task

## Rules

- Keep pipelines SHORT. 2-4 steps for most tasks.
- Always end with a tester or reviewer step for quality.
- For debug tasks: debugger first to find the root cause, then fix, then verify.
- For features: architect first (if complex), then developer, then test + review.
- Don't assign specialists who aren't needed.
- If a task is blocked or unclear, say so — don't guess.

## Output format

Return ONLY valid JSON (no markdown, no explanation):

```json
{
  "analysis": "Brief analysis of what needs to be done",
  "pipeline": [
    {
      "role": "debugger",
      "model": "sonnet",
      "brief": "What this specialist should do",
      "module": "search",
      "relevant_decisions": [1, 5, 12]
    },
    {
      "role": "tester",
      "model": "sonnet",
      "depends_on": "debugger",
      "brief": "Write regression test for the fix"
    }
  ],
  "estimated_steps": 2,
  "route_type": "debug"
}
```
agents/prompts/security.md (new file, 73 lines)
You are a Security Engineer performing a security audit.

## Scope

Analyze the codebase for security vulnerabilities. Focus on:

1. **Authentication & Authorization**
   - Missing auth on endpoints
   - Broken access control
   - Session management issues
   - JWT/token handling

2. **OWASP Top 10**
   - Injection (SQL, NoSQL, command, XSS)
   - Broken authentication
   - Sensitive data exposure
   - Security misconfiguration
   - SSRF, CSRF

3. **Secrets & Credentials**
   - Hardcoded secrets, API keys, passwords
   - Secrets in git history
   - Unencrypted sensitive data
   - .env files exposed

4. **Input Validation**
   - Missing sanitization
   - File upload vulnerabilities
   - Path traversal
   - Unsafe deserialization

5. **Dependencies**
   - Known CVEs in packages
   - Outdated dependencies
   - Supply chain risks

## Rules

- Read code carefully, don't skim
- Check EVERY endpoint for auth
- Check EVERY user input for sanitization
- Severity levels: CRITICAL, HIGH, MEDIUM, LOW, INFO
- For each finding: describe the vulnerability, show the code, suggest a fix
- Don't fix code yourself — only report

## Output format

Return ONLY valid JSON:

```json
{
  "summary": "Brief overall assessment",
  "findings": [
    {
      "severity": "HIGH",
      "category": "missing_auth",
      "title": "Admin endpoint without authentication",
      "file": "src/routes/admin.js",
      "line": 42,
      "description": "The /api/admin/users endpoint has no auth middleware",
      "recommendation": "Add requireAuth middleware before the handler",
      "owasp": "A01:2021 Broken Access Control"
    }
  ],
  "stats": {
    "files_reviewed": 15,
    "critical": 0,
    "high": 2,
    "medium": 3,
    "low": 1
  }
}
```
agents/runner.py (new file, 311 lines)
"""
Kin agent runner — launches Claude Code as subprocess with role-specific context.
Each agent = separate process with isolated context.
"""

import json
import re
import sqlite3
import subprocess
import time
from pathlib import Path
from typing import Any

from core import models
from core.context_builder import build_context, format_prompt


def run_agent(
    conn: sqlite3.Connection,
    role: str,
    task_id: str,
    project_id: str,
    model: str = "sonnet",
    previous_output: str | None = None,
    brief_override: str | None = None,
    dry_run: bool = False,
) -> dict:
    """Run a single Claude Code agent as a subprocess.

    1. Build context from DB
    2. Format prompt with role template
    3. Run: claude -p "{prompt}" --output-format json
    4. Log result to agent_logs
    5. Return {success, output, tokens_used, duration_seconds, cost_usd}
    """
    # Build context
    ctx = build_context(conn, task_id, role, project_id)
    if previous_output:
        ctx["previous_output"] = previous_output
    if brief_override and ctx.get("task"):
        ctx["task"]["brief"] = brief_override

    prompt = format_prompt(ctx, role)

    if dry_run:
        return {
            "success": True,
            "output": None,
            "prompt": prompt,
            "role": role,
            "model": model,
            "dry_run": True,
        }

    # Determine working directory
    project = models.get_project(conn, project_id)
    working_dir = None
    if project and role in ("debugger", "frontend_dev", "backend_dev", "tester", "security"):
        project_path = Path(project["path"]).expanduser()
        if project_path.is_dir():
            working_dir = str(project_path)

    # Run claude subprocess
    start = time.monotonic()
    result = _run_claude(prompt, model=model, working_dir=working_dir)
    duration = int(time.monotonic() - start)

    # Parse output
    output_text = result.get("output", "")
    success = result["returncode"] == 0
    parsed_output = _try_parse_json(output_text)

    # Log to DB
    models.log_agent_run(
        conn,
        project_id=project_id,
        task_id=task_id,
        agent_role=role,
        action="execute",
        input_summary=f"task={task_id}, model={model}",
        output_summary=output_text[:500] if output_text else None,
        tokens_used=result.get("tokens_used"),
        model=model,
        cost_usd=result.get("cost_usd"),
        success=success,
        error_message=result.get("error") if not success else None,
        duration_seconds=duration,
    )

    return {
        "success": success,
        # `is not None` so that valid-but-falsy JSON ({} or []) is not discarded
        "output": parsed_output if parsed_output is not None else output_text,
        "raw_output": output_text,
        "role": role,
        "model": model,
        "duration_seconds": duration,
        "tokens_used": result.get("tokens_used"),
        "cost_usd": result.get("cost_usd"),
    }


def _run_claude(
    prompt: str,
    model: str = "sonnet",
    working_dir: str | None = None,
) -> dict:
    """Execute the claude CLI as a subprocess. Returns a dict with output, returncode, etc."""
    cmd = [
        "claude",
        "-p", prompt,
        "--output-format", "json",
        "--model", model,
    ]

    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=600,  # 10 min max
            cwd=working_dir,
        )
    except FileNotFoundError:
        return {
            "output": "",
            "error": "claude CLI not found in PATH",
            "returncode": 127,
        }
    except subprocess.TimeoutExpired:
        return {
            "output": "",
            "error": "Agent timed out after 600s",
            "returncode": 124,
        }

    output = proc.stdout or ""
    result: dict[str, Any] = {
        "output": output,
        "error": proc.stderr if proc.returncode != 0 else None,
        "returncode": proc.returncode,
    }

    # Extract structured data from the JSON envelope of `claude --output-format json`
    parsed = _try_parse_json(output)
    if isinstance(parsed, dict):
        result["tokens_used"] = parsed.get("usage", {}).get("total_tokens")
        result["cost_usd"] = parsed.get("cost_usd")
        # The actual content is usually under "result" or "content"
        if "result" in parsed:
            result["output"] = parsed["result"]
        elif "content" in parsed:
            result["output"] = parsed["content"]

    return result


def _try_parse_json(text: str) -> Any:
    """Try to parse JSON from text. Returns the parsed object or None."""
    text = text.strip()
    if not text:
        return None

    # 1. Direct parse
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. JSON block inside markdown code fences
    m = re.search(r"```(?:json)?\s*\n(.*?)\n```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass

    # 3. First balanced { ... } or [ ... ] span
    for start_char, end_char in [("{", "}"), ("[", "]")]:
        start = text.find(start_char)
        if start >= 0:
            depth = 0
            for i in range(start, len(text)):
                if text[i] == start_char:
                    depth += 1
                elif text[i] == end_char:
                    depth -= 1
                    if depth == 0:
                        try:
                            return json.loads(text[start:i + 1])
                        except json.JSONDecodeError:
                            break
    return None


# ---------------------------------------------------------------------------
# Pipeline executor
# ---------------------------------------------------------------------------

def run_pipeline(
    conn: sqlite3.Connection,
    task_id: str,
    steps: list[dict],
    dry_run: bool = False,
) -> dict:
    """Execute a multi-step pipeline of agents.

    steps = [
        {"role": "debugger", "model": "opus", "brief": "..."},
        {"role": "tester", "depends_on": "debugger", "brief": "..."},
    ]

    Returns {success, steps_completed, total_cost_usd, total_tokens,
    total_duration_seconds, results}.
    """
    task = models.get_task(conn, task_id)
    if not task:
        return {"success": False, "error": f"Task '{task_id}' not found"}

    project_id = task["project_id"]

    # Determine route type from the task brief
    route_type = "custom"
    if task.get("brief") and isinstance(task["brief"], dict):
        route_type = task["brief"].get("route_type", "custom") or "custom"

    # Create pipeline record in DB
    pipeline = None
    if not dry_run:
        pipeline = models.create_pipeline(
            conn, task_id, project_id, route_type, steps,
        )
        models.update_task(conn, task_id, status="in_progress")

    results = []
    total_cost = 0.0
    total_tokens = 0
    total_duration = 0
    previous_output = None

    for i, step in enumerate(steps):
        role = step["role"]
        model = step.get("model", "sonnet")
        brief = step.get("brief")

        result = run_agent(
            conn, role, task_id, project_id,
            model=model,
            previous_output=previous_output,
            brief_override=brief,
            dry_run=dry_run,
        )
        results.append(result)

        if dry_run:
            continue

        # Accumulate stats
        total_cost += result.get("cost_usd") or 0
        total_tokens += result.get("tokens_used") or 0
        total_duration += result.get("duration_seconds") or 0

        if not result["success"]:
            # Pipeline failed — stop and mark as failed
            if pipeline:
                models.update_pipeline(
                    conn, pipeline["id"],
                    status="failed",
                    total_cost_usd=total_cost,
                    total_tokens=total_tokens,
                    total_duration_seconds=total_duration,
                )
            models.update_task(conn, task_id, status="blocked")
            return {
                "success": False,
                "error": f"Step {i + 1}/{len(steps)} ({role}) failed",
                "steps_completed": i,
                "results": results,
                "total_cost_usd": total_cost,
                "total_tokens": total_tokens,
                "total_duration_seconds": total_duration,
                "pipeline_id": pipeline["id"] if pipeline else None,
            }

        # Chain output to next step
        previous_output = result.get("raw_output") or result.get("output")
        if isinstance(previous_output, (dict, list)):
            previous_output = json.dumps(previous_output, ensure_ascii=False)

    # Pipeline completed (pipeline is None on dry runs)
    if pipeline:
        models.update_pipeline(
            conn, pipeline["id"],
            status="completed",
            total_cost_usd=total_cost,
            total_tokens=total_tokens,
            total_duration_seconds=total_duration,
        )
        models.update_task(conn, task_id, status="review")

    return {
        "success": True,
        "steps_completed": len(steps),
        "results": results,
        "total_cost_usd": total_cost,
        "total_tokens": total_tokens,
        "total_duration_seconds": total_duration,
        "pipeline_id": pipeline["id"] if pipeline else None,
        "dry_run": dry_run,
    }
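The fallback chain in `_try_parse_json` (direct parse, then fenced block, then balanced-brace scan) is the piece models most often exercise, since they tend to wrap JSON in markdown fences despite instructions. A standalone sketch of the first two stages, independent of the Kin codebase:

```python
import json
import re


def extract_json(text: str):
    """Miniature of the fallback chain: direct parse, then fenced-block parse."""
    text = text.strip()
    # Stage 1: the whole reply is JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Stage 2: JSON wrapped in a ```json ... ``` (or bare ```) fence
    m = re.search(r"```(?:json)?\s*\n(.*?)\n```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    return None


reply = 'Here is the plan:\n```json\n{"pipeline": [{"role": "tester"}]}\n```'
print(extract_json(reply))  # fenced JSON is still recovered
```

The third stage (scanning for the first balanced `{...}`/`[...]` span) only matters when the model emits JSON embedded in free prose with no fence at all.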
agents/specialists.yaml (new file, 104 lines)
# Kin specialist pool — roles available for pipeline construction.
# PM selects from this pool based on task type.

specialists:
  pm:
    name: "Project Manager"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Decomposes tasks, selects specialists, builds pipelines"
    permissions: read_only
    context_rules:
      decisions: all
      modules: all

  architect:
    name: "Software Architect"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Designs solutions, reviews structure, writes specs"
    permissions: read_only
    context_rules:
      decisions: all
      modules: all

  debugger:
    name: "Debugger"
    model: sonnet
    tools: [Read, Grep, Glob, Bash]
    description: "Finds root causes, reads logs, traces execution"
    permissions: read_bash
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround]

  frontend_dev:
    name: "Frontend Developer"
    model: sonnet
    tools: [Read, Write, Edit, Bash, Glob, Grep]
    description: "Implements UI: Vue, CSS, components, composables"
    permissions: full
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround, convention]

  backend_dev:
    name: "Backend Developer"
    model: sonnet
    tools: [Read, Write, Edit, Bash, Glob, Grep]
    description: "Implements API, services, database, business logic"
    permissions: full
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround, convention]

  tester:
    name: "Tester"
    model: sonnet
    tools: [Read, Write, Bash, Glob, Grep]
    description: "Writes and runs tests, verifies fixes"
    permissions: full
    working_dir: project
    context_rules:
      decisions: []

  reviewer:
    name: "Code Reviewer"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Reviews code for quality, conventions, bugs"
    permissions: read_only
    context_rules:
      decisions: [convention]

  security:
    name: "Security Engineer"
    model: sonnet
    tools: [Read, Grep, Glob, Bash]
    description: "OWASP audit, auth checks, secrets scan, vulnerability analysis"
    permissions: read_bash
    working_dir: project
    context_rules:
      decisions_category: security

# Route templates — PM uses these to build pipelines
routes:
  debug:
    steps: [debugger, tester, frontend_dev, tester]
    description: "Find bug → verify → fix → verify fix"

  feature:
    steps: [architect, frontend_dev, tester, reviewer]
    description: "Design → implement → test → review"

  refactor:
    steps: [architect, frontend_dev, tester, reviewer]
    description: "Plan refactor → implement → test → review"

  hotfix:
    steps: [debugger, frontend_dev, tester]
    description: "Find → fix → verify (fast track)"

  security_audit:
    steps: [security, architect]
    description: "Audit → remediation plan"
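The `context_rules` entries above amount to a role-to-decision-type filter. A minimal sketch of that lookup table in Python (role names taken from the YAML; the filtering function itself is an illustrative assumption, not the Kin implementation):

```python
# Decision-type visibility per role, mirroring context_rules in specialists.yaml.
# None means "all decision types"; an empty set means "no decisions at all".
CONTEXT_RULES = {
    "pm": None,
    "architect": None,
    "debugger": {"gotcha", "workaround"},
    "frontend_dev": {"gotcha", "workaround", "convention"},
    "backend_dev": {"gotcha", "workaround", "convention"},
    "tester": set(),
    "reviewer": {"convention"},
}


def visible_decisions(role: str, decisions: list[dict]) -> list[dict]:
    """Filter a decision list down to what the given role is allowed to see."""
    allowed = CONTEXT_RULES.get(role)
    if allowed is None:
        return list(decisions)
    return [d for d in decisions if d["type"] in allowed]


decisions = [
    {"type": "gotcha", "title": "Safari bug"},
    {"type": "convention", "title": "Use WAL mode"},
]
print(visible_decisions("reviewer", decisions))  # keeps only the convention entry
```

Note that `security` is the odd one out: it filters by decision *category* (`decisions_category: security`) rather than by type, so it does not fit this table.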
cli/main.py (modified, +82 lines, hunk @ -408,6 +408,88 @@ def cost(ctx, period))
    click.echo(f"\nTotal: ${total:.4f}")


# ===========================================================================
# run
# ===========================================================================

@cli.command("run")
@click.argument("task_id")
@click.option("--dry-run", is_flag=True, help="Show pipeline plan without executing")
@click.pass_context
def run_task(ctx, task_id, dry_run):
    """Run a task through the agent pipeline.

    PM decomposes the task into specialist steps, then the pipeline executes.
    With --dry-run, shows the plan without running agents.
    """
    from agents.runner import run_agent, run_pipeline

    conn = ctx.obj["conn"]
    task = models.get_task(conn, task_id)
    if not task:
        click.echo(f"Task '{task_id}' not found.", err=True)
        raise SystemExit(1)

    project_id = task["project_id"]
    click.echo(f"Task: {task['id']} — {task['title']}")

    # Step 1: PM decomposes
    click.echo("Running PM to decompose task...")
    pm_result = run_agent(
        conn, "pm", task_id, project_id,
        model="sonnet", dry_run=dry_run,
    )

    if dry_run:
        click.echo("\n--- PM Prompt (dry-run) ---")
        click.echo(pm_result.get("prompt", "")[:2000])
        click.echo("\n(Dry-run: PM would produce a pipeline JSON)")
        return

    if not pm_result["success"]:
        click.echo(f"PM failed: {pm_result.get('output', 'unknown error')}", err=True)
        raise SystemExit(1)

    # Parse PM output for the pipeline
    output = pm_result.get("output")
    if isinstance(output, str):
        try:
            output = json.loads(output)
        except json.JSONDecodeError:
            click.echo(f"PM returned non-JSON output:\n{output[:500]}", err=True)
            raise SystemExit(1)

    if not isinstance(output, dict) or "pipeline" not in output:
        click.echo(
            f"PM output missing 'pipeline' key:\n{json.dumps(output, indent=2)[:500]}",
            err=True,
        )
        raise SystemExit(1)

    pipeline_steps = output["pipeline"]
    analysis = output.get("analysis", "")

    click.echo(f"\nAnalysis: {analysis}")
    click.echo(f"Pipeline ({len(pipeline_steps)} steps):")
    for i, step in enumerate(pipeline_steps, 1):
        click.echo(f"  {i}. {step['role']} ({step.get('model', 'sonnet')}): {step.get('brief', '')}")

    if not click.confirm("\nExecute pipeline?"):
        click.echo("Aborted.")
        return

    # Step 2: Execute pipeline
    click.echo("\nExecuting pipeline...")
    result = run_pipeline(conn, task_id, pipeline_steps)

    if result["success"]:
        click.echo(f"\nPipeline completed: {result['steps_completed']} steps")
    else:
        # The error string already names the failing step and role
        click.echo(f"\nPipeline failed: {result.get('error', 'unknown')}", err=True)

    if result.get("total_cost_usd"):
        click.echo(f"Cost: ${result['total_cost_usd']:.4f}")
    if result.get("total_duration_seconds"):
        click.echo(f"Duration: {result['total_duration_seconds']}s")


# ===========================================================================
# bootstrap
# ===========================================================================
core/context_builder.py (new file, 212 lines)
"""
Kin context builder — assembles role-specific context from DB for agent prompts.
Each role gets only the information it needs, keeping prompts focused.
"""

import json
import sqlite3
from pathlib import Path

from core import models

PROMPTS_DIR = Path(__file__).parent.parent / "agents" / "prompts"
SPECIALISTS_PATH = Path(__file__).parent.parent / "agents" / "specialists.yaml"


def _load_specialists() -> dict:
    """Load specialists.yaml (lazy yaml import, so pyyaml is only needed when this runs)."""
    path = SPECIALISTS_PATH
    if not path.exists():
        return {}
    import yaml
    return yaml.safe_load(path.read_text())


def build_context(
    conn: sqlite3.Connection,
    task_id: str,
    role: str,
    project_id: str,
) -> dict:
    """Build role-specific context from DB.

    Returns a dict with keys: task, project, and role-specific data.
    """
    task = models.get_task(conn, task_id)
    project = models.get_project(conn, project_id)

    ctx = {
        "task": _slim_task(task) if task else None,
        "project": _slim_project(project) if project else None,
        "role": role,
    }

    if role == "pm":
        ctx["modules"] = models.get_modules(conn, project_id)
        ctx["decisions"] = models.get_decisions(conn, project_id)
        ctx["active_tasks"] = models.list_tasks(conn, project_id=project_id, status="in_progress")
        try:
            specs = _load_specialists()
            ctx["available_specialists"] = list(specs.get("specialists", {}).keys())
            ctx["routes"] = specs.get("routes", {})
        except Exception:
            ctx["available_specialists"] = []
            ctx["routes"] = {}

    elif role == "architect":
        ctx["modules"] = models.get_modules(conn, project_id)
        ctx["decisions"] = models.get_decisions(conn, project_id)

    elif role == "debugger":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["gotcha", "workaround"],
        )
        ctx["module_hint"] = _extract_module_hint(task)

    elif role in ("frontend_dev", "backend_dev"):
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["gotcha", "workaround", "convention"],
        )

    elif role == "reviewer":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["convention"],
        )

    elif role == "tester":
        # Minimal context — just the task spec
        pass

    elif role == "security":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, category="security",
        )

    else:
        # Unknown role — give decisions as a fallback
        ctx["decisions"] = models.get_decisions(conn, project_id, limit=20)

    return ctx


def _slim_task(task: dict) -> dict:
    """Extract only relevant fields from a task for the prompt."""
    return {
        "id": task["id"],
        "title": task["title"],
        "status": task["status"],
        "priority": task["priority"],
        "assigned_role": task.get("assigned_role"),
        "brief": task.get("brief"),
        "spec": task.get("spec"),
    }


def _slim_project(project: dict) -> dict:
    """Extract only relevant fields from a project."""
    return {
        "id": project["id"],
        "name": project["name"],
        "path": project["path"],
        "tech_stack": project.get("tech_stack"),
    }


def _extract_module_hint(task: dict | None) -> str | None:
    """Try to extract a module name from the task brief."""
    if not task:
        return None
    brief = task.get("brief")
    if isinstance(brief, dict):
        return brief.get("module")
    return None


def format_prompt(context: dict, role: str, prompt_template: str | None = None) -> str:
    """Format a prompt by injecting context into a role template.

    If prompt_template is None, loads from agents/prompts/{role}.md.
    """
    if prompt_template is None:
        prompt_path = PROMPTS_DIR / f"{role}.md"
        if prompt_path.exists():
            prompt_template = prompt_path.read_text()
        else:
            prompt_template = f"You are a {role}. Complete the task described below."

    sections = [prompt_template, ""]

    # Project info
    proj = context.get("project")
    if proj:
        sections.append(f"## Project: {proj['id']} — {proj['name']}")
        if proj.get("tech_stack"):
            sections.append(f"Tech stack: {', '.join(proj['tech_stack'])}")
        sections.append(f"Path: {proj['path']}")
        sections.append("")

    # Task info
    task = context.get("task")
    if task:
        sections.append(f"## Task: {task['id']} — {task['title']}")
        sections.append(f"Status: {task['status']}, Priority: {task['priority']}")
        if task.get("brief"):
            sections.append(f"Brief: {json.dumps(task['brief'], ensure_ascii=False)}")
        if task.get("spec"):
            sections.append(f"Spec: {json.dumps(task['spec'], ensure_ascii=False)}")
        sections.append("")

    # Decisions
    decisions = context.get("decisions")
    if decisions:
        sections.append(f"## Known decisions ({len(decisions)}):")
        for d in decisions[:30]:  # Cap at 30 to avoid token bloat
            tags = f" [{', '.join(d['tags'])}]" if d.get("tags") else ""
            sections.append(f"- #{d['id']} [{d['type']}] {d['title']}{tags}")
        sections.append("")

    # Modules
    modules = context.get("modules")
    if modules:
        sections.append(f"## Modules ({len(modules)}):")
        for m in modules:
            sections.append(f"- {m['name']} ({m['type']}) — {m['path']}")
        sections.append("")

    # Active tasks (PM)
    active = context.get("active_tasks")
    if active:
        sections.append(f"## Active tasks ({len(active)}):")
        for t in active:
            sections.append(f"- {t['id']}: {t['title']} [{t['status']}]")
        sections.append("")

    # Available specialists (PM)
    specialists = context.get("available_specialists")
    if specialists:
        sections.append(f"## Available specialists: {', '.join(specialists)}")
        sections.append("")

    # Routes (PM)
    routes = context.get("routes")
    if routes:
        sections.append("## Route templates:")
        for name, route in routes.items():
            steps = " → ".join(route.get("steps", []))
            sections.append(f"- {name}: {steps}")
        sections.append("")

    # Module hint (debugger)
    hint = context.get("module_hint")
    if hint:
        sections.append(f"## Target module: {hint}")
        sections.append("")

    # Previous step output (pipeline chaining)
    prev = context.get("previous_output")
    if prev:
        sections.append("## Previous step output:")
        sections.append(prev if isinstance(prev, str) else json.dumps(prev, ensure_ascii=False))
        sections.append("")

    return "\n".join(sections)
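`format_prompt` uses one pattern throughout: start from the role template, conditionally append "## ..." sections as flat lists of lines, and join once at the end. The pattern in miniature (function name hypothetical, reduced to two sections):

```python
def assemble_prompt(template: str, context: dict) -> str:
    # Collect lines in a flat list; a single join avoids repeated string concatenation.
    sections = [template, ""]

    proj = context.get("project")
    if proj:
        sections.append(f"## Project: {proj['id']} ({proj['name']})")
        sections.append("")

    prev = context.get("previous_output")
    if prev:
        sections.append("## Previous step output:")
        sections.append(prev)
        sections.append("")

    return "\n".join(sections)


prompt = assemble_prompt("You are a tester.", {"project": {"id": "vdol", "name": "Demo"}})
print(prompt.splitlines()[0])  # the role template always comes first
```

Because sections are only appended when the corresponding context key is present, the same function serves every role: a tester's prompt simply has fewer sections than a PM's.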
tests/test_context_builder.py (new file, 133 lines)
"""Tests for core/context_builder.py — context assembly per role."""

import pytest

from core.db import init_db
from core import models
from core.context_builder import build_context, format_prompt


@pytest.fixture
def conn():
    c = init_db(":memory:")
    # Seed project, modules, decisions, tasks
    models.create_project(c, "vdol", "ВДОЛЬ и ПОПЕРЕК", "~/projects/vdolipoperek",
                          tech_stack=["vue3", "typescript", "nodejs"])
    models.add_module(c, "vdol", "search", "frontend", "src/search/")
    models.add_module(c, "vdol", "api", "backend", "src/api/")
    models.add_decision(c, "vdol", "gotcha", "Safari bug",
                        "position:fixed breaks", category="ui", tags=["ios"])
    models.add_decision(c, "vdol", "workaround", "API rate limit",
                        "10 req/s max", category="api")
    models.add_decision(c, "vdol", "convention", "Use WAL mode",
                        "Always use WAL for SQLite", category="architecture")
    models.add_decision(c, "vdol", "decision", "Auth required",
                        "All endpoints need auth", category="security")
    models.create_task(c, "VDOL-001", "vdol", "Fix search filters",
                       brief={"module": "search", "route_type": "debug"})
    models.create_task(c, "VDOL-002", "vdol", "Add payments",
                       status="in_progress")
    yield c
    c.close()


class TestBuildContext:
    def test_pm_gets_everything(self, conn):
        ctx = build_context(conn, "VDOL-001", "pm", "vdol")
        assert ctx["task"]["id"] == "VDOL-001"
        assert ctx["project"]["id"] == "vdol"
        assert len(ctx["modules"]) == 2
        assert len(ctx["decisions"]) == 4  # all decisions
        assert len(ctx["active_tasks"]) == 1  # VDOL-002 in_progress
        assert "pm" in ctx["available_specialists"]

    def test_architect_gets_all_decisions_and_modules(self, conn):
        ctx = build_context(conn, "VDOL-001", "architect", "vdol")
        assert len(ctx["modules"]) == 2
        assert len(ctx["decisions"]) == 4

    def test_debugger_gets_only_gotcha_workaround(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types <= {"gotcha", "workaround"}
        assert "convention" not in types
        assert "decision" not in types
        assert ctx["module_hint"] == "search"

    def test_frontend_dev_gets_gotcha_workaround_convention(self, conn):
        ctx = build_context(conn, "VDOL-001", "frontend_dev", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert "gotcha" in types
        assert "workaround" in types
        assert "convention" in types
        assert "decision" not in types  # plain decisions excluded

    def test_backend_dev_same_as_frontend(self, conn):
        ctx = build_context(conn, "VDOL-001", "backend_dev", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types == {"gotcha", "workaround", "convention"}

    def test_reviewer_gets_only_conventions(self, conn):
        ctx = build_context(conn, "VDOL-001", "reviewer", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types == {"convention"}

    def test_tester_gets_minimal_context(self, conn):
        ctx = build_context(conn, "VDOL-001", "tester", "vdol")
        assert ctx["task"] is not None
        assert ctx["project"] is not None
        assert "decisions" not in ctx
        assert "modules" not in ctx

    def test_security_gets_security_decisions(self, conn):
        ctx = build_context(conn, "VDOL-001", "security", "vdol")
        categories = {d.get("category") for d in ctx["decisions"]}
        assert categories == {"security"}

    def test_unknown_role_gets_fallback(self, conn):
        ctx = build_context(conn, "VDOL-001", "unknown_role", "vdol")
        assert "decisions" in ctx
        assert len(ctx["decisions"]) > 0


class TestFormatPrompt:
    def test_format_with_template(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        prompt = format_prompt(ctx, "debugger", "You are a debugger. Find bugs.")
        assert "You are a debugger" in prompt
        assert "VDOL-001" in prompt
        assert "Fix search filters" in prompt
        assert "vdol" in prompt
        assert "vue3" in prompt

    def test_format_includes_decisions(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        prompt = format_prompt(ctx, "debugger", "Debug this.")
|
||||
assert "Safari bug" in prompt
|
||||
assert "API rate limit" in prompt
|
||||
# Convention should NOT be here (debugger doesn't get it)
|
||||
assert "WAL mode" not in prompt
|
||||
|
||||
def test_format_pm_includes_specialists(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "pm", "vdol")
|
||||
prompt = format_prompt(ctx, "pm", "You are PM.")
|
||||
assert "Available specialists" in prompt
|
||||
assert "debugger" in prompt
|
||||
assert "Active tasks" in prompt
|
||||
assert "VDOL-002" in prompt
|
||||
|
||||
def test_format_with_previous_output(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "tester", "vdol")
|
||||
ctx["previous_output"] = "Found race condition in useSearch.ts"
|
||||
prompt = format_prompt(ctx, "tester", "Write tests.")
|
||||
assert "Previous step output" in prompt
|
||||
assert "race condition" in prompt
|
||||
|
||||
def test_format_loads_prompt_file(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "pm", "vdol")
|
||||
prompt = format_prompt(ctx, "pm") # Should load from agents/prompts/pm.md
|
||||
assert "decompose" in prompt.lower() or "pipeline" in prompt.lower()
|
||||
|
||||
def test_format_missing_prompt_file(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "analyst", "vdol")
|
||||
prompt = format_prompt(ctx, "analyst") # No analyst.md exists
|
||||
assert "analyst" in prompt.lower()
|
||||
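The role rules exercised above (PM and architect see everything, debugger only gotchas and workarounds, developers add conventions, reviewer conventions only, unknown roles fall back to everything) could be encoded as a small lookup table. This is a hypothetical sketch for illustration, not the actual `build_context` internals; `ROLE_DECISION_TYPES` and `filter_decisions` are invented names:

```python
# Decision types visible to each role; None means no filtering
# (the role receives every decision). Hypothetical table.
ROLE_DECISION_TYPES = {
    "pm": None,
    "architect": None,
    "debugger": {"gotcha", "workaround"},
    "frontend_dev": {"gotcha", "workaround", "convention"},
    "backend_dev": {"gotcha", "workaround", "convention"},
    "reviewer": {"convention"},
}


def filter_decisions(decisions, role):
    """Return the subset of decision dicts visible to `role`.

    Unknown roles fall back to seeing everything, matching
    test_unknown_role_gets_fallback above. (Sketch only; the
    security role would additionally filter by category.)
    """
    allowed = ROLE_DECISION_TYPES.get(role)
    if allowed is None:
        return list(decisions)
    return [d for d in decisions if d["type"] in allowed]
```

A table like this keeps the role policy declarative, so adding a ninth specialist is a one-line change rather than another branch.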
234
tests/test_runner.py
Normal file
@@ -0,0 +1,234 @@
"""Tests for agents/runner.py — agent execution with mocked claude CLI."""

import json
import pytest
from unittest.mock import patch, MagicMock
from core.db import init_db
from core import models
from agents.runner import run_agent, run_pipeline, _try_parse_json


@pytest.fixture
def conn():
    c = init_db(":memory:")
    models.create_project(c, "vdol", "ВДОЛЬ", "~/projects/vdolipoperek",
                          tech_stack=["vue3"])
    models.create_task(c, "VDOL-001", "vdol", "Fix bug",
                       brief={"route_type": "debug"})
    yield c
    c.close()


def _mock_claude_success(output_data):
    """Create a mock subprocess result with successful claude output."""
    mock = MagicMock()
    mock.stdout = json.dumps(output_data) if isinstance(output_data, dict) else output_data
    mock.stderr = ""
    mock.returncode = 0
    return mock


def _mock_claude_failure(error_msg):
    mock = MagicMock()
    mock.stdout = ""
    mock.stderr = error_msg
    mock.returncode = 1
    return mock


# ---------------------------------------------------------------------------
# run_agent
# ---------------------------------------------------------------------------

class TestRunAgent:
    @patch("agents.runner.subprocess.run")
    def test_successful_agent_run(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({
            "result": "Found race condition in useSearch.ts",
            "usage": {"total_tokens": 5000},
            "cost_usd": 0.015,
        })

        result = run_agent(conn, "debugger", "VDOL-001", "vdol")

        assert result["success"] is True
        assert result["role"] == "debugger"
        assert result["model"] == "sonnet"
        assert result["duration_seconds"] >= 0

        # Verify claude was called with the right args
        call_args = mock_run.call_args
        cmd = call_args[0][0]
        assert "claude" in cmd[0]
        assert "-p" in cmd
        assert "--output-format" in cmd
        assert "json" in cmd

    @patch("agents.runner.subprocess.run")
    def test_failed_agent_run(self, mock_run, conn):
        mock_run.return_value = _mock_claude_failure("API error")

        result = run_agent(conn, "debugger", "VDOL-001", "vdol")

        assert result["success"] is False

        # Should be logged in agent_logs
        logs = conn.execute("SELECT * FROM agent_logs WHERE task_id='VDOL-001'").fetchall()
        assert len(logs) == 1
        assert logs[0]["success"] == 0

    def test_dry_run_returns_prompt(self, conn):
        result = run_agent(conn, "debugger", "VDOL-001", "vdol", dry_run=True)

        assert result["dry_run"] is True
        assert result["prompt"] is not None
        assert "VDOL-001" in result["prompt"]
        assert result["output"] is None

    @patch("agents.runner.subprocess.run")
    def test_agent_logs_to_db(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "ok"})

        run_agent(conn, "tester", "VDOL-001", "vdol")

        logs = conn.execute("SELECT * FROM agent_logs WHERE agent_role='tester'").fetchall()
        assert len(logs) == 1
        assert logs[0]["project_id"] == "vdol"

    @patch("agents.runner.subprocess.run")
    def test_previous_output_passed(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "tests pass"})

        run_agent(conn, "tester", "VDOL-001", "vdol",
                  previous_output="Found bug in line 42")

        call_args = mock_run.call_args
        prompt = call_args[0][0][2]  # -p argument
        assert "line 42" in prompt


# ---------------------------------------------------------------------------
# run_pipeline
# ---------------------------------------------------------------------------

class TestRunPipeline:
    @patch("agents.runner.subprocess.run")
    def test_successful_pipeline(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "done"})

        steps = [
            {"role": "debugger", "brief": "find bug"},
            {"role": "tester", "depends_on": "debugger", "brief": "verify"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps)

        assert result["success"] is True
        assert result["steps_completed"] == 2
        assert len(result["results"]) == 2

        # Pipeline created in DB
        pipe = conn.execute("SELECT * FROM pipelines WHERE task_id='VDOL-001'").fetchone()
        assert pipe is not None
        assert pipe["status"] == "completed"

        # Task updated to review
        task = models.get_task(conn, "VDOL-001")
        assert task["status"] == "review"

    @patch("agents.runner.subprocess.run")
    def test_pipeline_fails_on_step(self, mock_run, conn):
        # First step succeeds, second fails
        mock_run.side_effect = [
            _mock_claude_success({"result": "found bug"}),
            _mock_claude_failure("compilation error"),
        ]

        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "frontend_dev", "brief": "fix"},
            {"role": "tester", "brief": "test"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps)

        assert result["success"] is False
        assert result["steps_completed"] == 1  # Only debugger completed
        assert "frontend_dev" in result["error"]

        # Pipeline marked as failed
        pipe = conn.execute("SELECT * FROM pipelines WHERE task_id='VDOL-001'").fetchone()
        assert pipe["status"] == "failed"

        # Task marked as blocked
        task = models.get_task(conn, "VDOL-001")
        assert task["status"] == "blocked"

    def test_pipeline_dry_run(self, conn):
        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "tester", "brief": "verify"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps, dry_run=True)

        assert result["dry_run"] is True
        assert result["success"] is True
        assert result["steps_completed"] == 2

        # No pipeline created in DB
        pipes = conn.execute("SELECT * FROM pipelines").fetchall()
        assert len(pipes) == 0

    @patch("agents.runner.subprocess.run")
    def test_pipeline_chains_output(self, mock_run, conn):
        """Output from step N is passed as previous_output to step N+1."""
        call_count = [0]

        def side_effect(*args, **kwargs):
            call_count[0] += 1
            if call_count[0] == 1:
                return _mock_claude_success({"result": "bug is in line 42"})
            return _mock_claude_success({"result": "test written"})

        mock_run.side_effect = side_effect

        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "tester", "brief": "write test"},
        ]
        run_pipeline(conn, "VDOL-001", steps)

        # Second call should include first step's output in prompt
        second_call = mock_run.call_args_list[1]
        prompt = second_call[0][0][2]  # -p argument
        assert "line 42" in prompt or "bug" in prompt

    def test_pipeline_task_not_found(self, conn):
        result = run_pipeline(conn, "NONEXISTENT", [{"role": "debugger"}])
        assert result["success"] is False
        assert "not found" in result["error"]


# ---------------------------------------------------------------------------
# JSON parsing
# ---------------------------------------------------------------------------

class TestTryParseJson:
    def test_direct_json(self):
        assert _try_parse_json('{"a": 1}') == {"a": 1}

    def test_json_in_code_fence(self):
        text = 'Some text\n```json\n{"a": 1}\n```\nMore text'
        assert _try_parse_json(text) == {"a": 1}

    def test_json_embedded_in_text(self):
        text = 'Here is the result: {"status": "ok", "count": 42} and more'
        result = _try_parse_json(text)
        assert result == {"status": "ok", "count": 42}

    def test_empty_string(self):
        assert _try_parse_json("") is None

    def test_no_json(self):
        assert _try_parse_json("just plain text") is None

    def test_json_array(self):
        assert _try_parse_json('[1, 2, 3]') == [1, 2, 3]
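test_pipeline_chains_output and test_pipeline_fails_on_step together pin down the chaining contract: each step's output feeds the next, and the pipeline stops at the first failure. A minimal sketch of that loop, hypothetical and deliberately decoupled from the DB and subprocess details of the real run_pipeline (`run_one` is an invented callback):

```python
def run_steps(steps, run_one):
    """Run pipeline steps in order, feeding each step's output forward.

    `run_one(step, previous_output)` must return a dict with "success"
    and "output". Stops at the first failure. Sketch of the chaining
    behaviour only; the real run_pipeline also logs, updates pipeline
    records, and transitions task status.
    """
    results = []
    previous = None
    for step in steps:
        result = run_one(step, previous)
        results.append(result)
        if not result.get("success"):
            return {"success": False,
                    "steps_completed": len(results) - 1,
                    "error": f"step '{step['role']}' failed",
                    "results": results}
        # Chain: this step's output becomes the next step's context.
        previous = result.get("output")
    return {"success": True, "steps_completed": len(results),
            "results": results}
```

Keeping the chaining loop free of I/O is what makes it testable with a plain callback instead of a mocked subprocess.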