Add context builder, agent runner, and pipeline executor
core/context_builder.py:
- build_context() — assembles role-specific context from the DB. PM gets everything; debugger gets gotchas/workarounds; reviewer gets conventions only; tester gets minimal context; security gets security-category decisions.
- format_prompt() — injects context into role templates.

agents/runner.py:
- run_agent() — launches the claude CLI as a subprocess with a role prompt.
- run_pipeline() — executes multi-step pipelines sequentially, chains output between steps, logs to agent_logs, creates/updates pipeline records, handles failures gracefully.

agents/specialists.yaml — 8 roles with tools, permissions, and context rules.
agents/prompts/pm.md — PM prompt for task decomposition.
agents/prompts/security.md — security audit prompt (OWASP, auth, secrets).

CLI: kin run <task_id> [--dry-run] — PM decomposes → shows pipeline → executes with confirmation.

Tests: 31 new (15 context_builder, 11 runner, 5 JSON parsing); 92 total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: 86e5b8febf
Commit: fabae74c19
8 changed files with 1207 additions and 0 deletions

agents/prompts/pm.md (new file, 58 lines)
You are a Project Manager for the Kin multi-agent orchestrator.

Your job: decompose a task into a pipeline of specialist steps.

## Input

You receive:
- PROJECT: id, name, tech stack
- TASK: id, title, brief
- DECISIONS: known issues, gotchas, workarounds for this project
- MODULES: project module map
- ACTIVE TASKS: currently in-progress tasks (avoid conflicts)
- AVAILABLE SPECIALISTS: roles you can assign
- ROUTE TEMPLATES: common pipeline patterns

## Your responsibilities

1. Analyze the task and determine what type of work is needed
2. Select the right specialists from the available pool
3. Build an ordered pipeline with dependencies
4. Include relevant context hints for each specialist
5. Reference known decisions that are relevant to this task

## Rules

- Keep pipelines SHORT. 2-4 steps for most tasks.
- Always end with a tester or reviewer step for quality.
- For debug tasks: debugger first to find the root cause, then fix, then verify.
- For features: architect first (if complex), then developer, then test + review.
- Don't assign specialists who aren't needed.
- If a task is blocked or unclear, say so — don't guess.

## Output format

Return ONLY valid JSON (no markdown, no explanation):

```json
{
  "analysis": "Brief analysis of what needs to be done",
  "pipeline": [
    {
      "role": "debugger",
      "model": "sonnet",
      "brief": "What this specialist should do",
      "module": "search",
      "relevant_decisions": [1, 5, 12]
    },
    {
      "role": "tester",
      "model": "sonnet",
      "depends_on": "debugger",
      "brief": "Write regression test for the fix"
    }
  ],
  "estimated_steps": 2,
  "route_type": "debug"
}
```
agents/prompts/security.md (new file, 73 lines)
You are a Security Engineer performing a security audit.

## Scope

Analyze the codebase for security vulnerabilities. Focus on:

1. **Authentication & Authorization**
   - Missing auth on endpoints
   - Broken access control
   - Session management issues
   - JWT/token handling

2. **OWASP Top 10**
   - Injection (SQL, NoSQL, command, XSS)
   - Broken authentication
   - Sensitive data exposure
   - Security misconfiguration
   - SSRF, CSRF

3. **Secrets & Credentials**
   - Hardcoded secrets, API keys, passwords
   - Secrets in git history
   - Unencrypted sensitive data
   - .env files exposed

4. **Input Validation**
   - Missing sanitization
   - File upload vulnerabilities
   - Path traversal
   - Unsafe deserialization

5. **Dependencies**
   - Known CVEs in packages
   - Outdated dependencies
   - Supply chain risks

## Rules

- Read code carefully, don't skim
- Check EVERY endpoint for auth
- Check EVERY user input for sanitization
- Severity levels: CRITICAL, HIGH, MEDIUM, LOW, INFO
- For each finding: describe the vulnerability, show the code, suggest a fix
- Don't fix code yourself — only report

## Output format

Return ONLY valid JSON:

```json
{
  "summary": "Brief overall assessment",
  "findings": [
    {
      "severity": "HIGH",
      "category": "missing_auth",
      "title": "Admin endpoint without authentication",
      "file": "src/routes/admin.js",
      "line": 42,
      "description": "The /api/admin/users endpoint has no auth middleware",
      "recommendation": "Add requireAuth middleware before the handler",
      "owasp": "A01:2021 Broken Access Control"
    }
  ],
  "stats": {
    "files_reviewed": 15,
    "critical": 0,
    "high": 2,
    "medium": 3,
    "low": 1
  }
}
```
agents/runner.py (new file, 311 lines)
"""
Kin agent runner — launches Claude Code as subprocess with role-specific context.
Each agent = separate process with isolated context.
"""

import json
import re
import sqlite3
import subprocess
import time
from pathlib import Path
from typing import Any

from core import models
from core.context_builder import build_context, format_prompt


def run_agent(
    conn: sqlite3.Connection,
    role: str,
    task_id: str,
    project_id: str,
    model: str = "sonnet",
    previous_output: str | None = None,
    brief_override: str | None = None,
    dry_run: bool = False,
) -> dict:
    """Run a single Claude Code agent as a subprocess.

    1. Build context from DB
    2. Format prompt with role template
    3. Run: claude -p "{prompt}" --output-format json
    4. Log result to agent_logs
    5. Return {success, output, tokens_used, duration_seconds, cost_usd}
    """
    # Build context
    ctx = build_context(conn, task_id, role, project_id)
    if previous_output:
        ctx["previous_output"] = previous_output
    if brief_override and ctx.get("task"):
        ctx["task"]["brief"] = brief_override

    prompt = format_prompt(ctx, role)

    if dry_run:
        return {
            "success": True,
            "output": None,
            "prompt": prompt,
            "role": role,
            "model": model,
            "dry_run": True,
        }

    # Determine working directory
    project = models.get_project(conn, project_id)
    working_dir = None
    if project and role in ("debugger", "frontend_dev", "backend_dev", "tester", "security"):
        project_path = Path(project["path"]).expanduser()
        if project_path.is_dir():
            working_dir = str(project_path)

    # Run claude subprocess
    start = time.monotonic()
    result = _run_claude(prompt, model=model, working_dir=working_dir)
    duration = int(time.monotonic() - start)

    # Parse output
    output_text = result.get("output", "")
    success = result["returncode"] == 0
    parsed_output = _try_parse_json(output_text)

    # Log to DB
    models.log_agent_run(
        conn,
        project_id=project_id,
        task_id=task_id,
        agent_role=role,
        action="execute",
        input_summary=f"task={task_id}, model={model}",
        output_summary=output_text[:500] if output_text else None,
        tokens_used=result.get("tokens_used"),
        model=model,
        cost_usd=result.get("cost_usd"),
        success=success,
        error_message=result.get("error") if not success else None,
        duration_seconds=duration,
    )

    return {
        "success": success,
        # `is not None` so that valid-but-falsy JSON ({} or []) is not discarded
        "output": parsed_output if parsed_output is not None else output_text,
        "raw_output": output_text,
        "role": role,
        "model": model,
        "duration_seconds": duration,
        "tokens_used": result.get("tokens_used"),
        "cost_usd": result.get("cost_usd"),
    }


def _run_claude(
    prompt: str,
    model: str = "sonnet",
    working_dir: str | None = None,
) -> dict:
    """Execute the claude CLI as a subprocess. Returns a dict with output, returncode, etc."""
    cmd = [
        "claude",
        "-p", prompt,
        "--output-format", "json",
        "--model", model,
    ]

    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=600,  # 10 min max
            cwd=working_dir,
        )
    except FileNotFoundError:
        return {
            "output": "",
            "error": "claude CLI not found in PATH",
            "returncode": 127,
        }
    except subprocess.TimeoutExpired:
        return {
            "output": "",
            "error": "Agent timed out after 600s",
            "returncode": 124,
        }

    output = proc.stdout or ""
    result: dict[str, Any] = {
        "output": output,
        "error": proc.stderr if proc.returncode != 0 else None,
        "returncode": proc.returncode,
    }

    # Extract structured data from the JSON envelope of `claude --output-format json`
    parsed = _try_parse_json(output)
    if isinstance(parsed, dict):
        result["tokens_used"] = parsed.get("usage", {}).get("total_tokens")
        result["cost_usd"] = parsed.get("cost_usd")
        # The actual content is usually under "result" or "content"
        if "result" in parsed:
            result["output"] = parsed["result"]
        elif "content" in parsed:
            result["output"] = parsed["content"]

    return result


def _try_parse_json(text: str) -> Any:
    """Try to parse JSON from text. Returns the parsed object or None."""
    text = text.strip()
    if not text:
        return None

    # 1. Direct parse
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. JSON block inside markdown code fences
    m = re.search(r"```(?:json)?\s*\n(.*?)\n```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass

    # 3. First balanced { ... } or [ ... ] span
    for start_char, end_char in [("{", "}"), ("[", "]")]:
        start = text.find(start_char)
        if start >= 0:
            depth = 0
            for i in range(start, len(text)):
                if text[i] == start_char:
                    depth += 1
                elif text[i] == end_char:
                    depth -= 1
                    if depth == 0:
                        try:
                            return json.loads(text[start:i + 1])
                        except json.JSONDecodeError:
                            break
    return None


# ---------------------------------------------------------------------------
# Pipeline executor
# ---------------------------------------------------------------------------

def run_pipeline(
    conn: sqlite3.Connection,
    task_id: str,
    steps: list[dict],
    dry_run: bool = False,
) -> dict:
    """Execute a multi-step pipeline of agents.

    steps = [
        {"role": "debugger", "model": "opus", "brief": "..."},
        {"role": "tester", "depends_on": "debugger", "brief": "..."},
    ]

    Returns {success, steps_completed, total_cost_usd, total_tokens,
    total_duration_seconds, results}.
    """
    task = models.get_task(conn, task_id)
    if not task:
        return {"success": False, "error": f"Task '{task_id}' not found"}

    project_id = task["project_id"]

    # Determine route type from the task brief
    route_type = "custom"
    if task.get("brief") and isinstance(task["brief"], dict):
        route_type = task["brief"].get("route_type", "custom") or "custom"

    # Create pipeline record in DB
    pipeline = None
    if not dry_run:
        pipeline = models.create_pipeline(
            conn, task_id, project_id, route_type, steps,
        )
        models.update_task(conn, task_id, status="in_progress")

    results = []
    total_cost = 0.0
    total_tokens = 0
    total_duration = 0
    previous_output = None

    for i, step in enumerate(steps):
        role = step["role"]
        model = step.get("model", "sonnet")
        brief = step.get("brief")

        result = run_agent(
            conn, role, task_id, project_id,
            model=model,
            previous_output=previous_output,
            brief_override=brief,
            dry_run=dry_run,
        )
        results.append(result)

        if dry_run:
            continue

        # Accumulate stats
        total_cost += result.get("cost_usd") or 0
        total_tokens += result.get("tokens_used") or 0
        total_duration += result.get("duration_seconds") or 0

        if not result["success"]:
            # Pipeline failed — stop and mark as failed
            if pipeline:
                models.update_pipeline(
                    conn, pipeline["id"],
                    status="failed",
                    total_cost_usd=total_cost,
                    total_tokens=total_tokens,
                    total_duration_seconds=total_duration,
                )
            models.update_task(conn, task_id, status="blocked")
            return {
                "success": False,
                "error": f"Step {i + 1}/{len(steps)} ({role}) failed",
                "steps_completed": i,
                "results": results,
                "total_cost_usd": total_cost,
                "total_tokens": total_tokens,
                "total_duration_seconds": total_duration,
                "pipeline_id": pipeline["id"] if pipeline else None,
            }

        # Chain output to next step
        previous_output = result.get("raw_output") or result.get("output")
        if isinstance(previous_output, (dict, list)):
            previous_output = json.dumps(previous_output, ensure_ascii=False)

    # Pipeline completed (pipeline is None on dry runs)
    if pipeline:
        models.update_pipeline(
            conn, pipeline["id"],
            status="completed",
            total_cost_usd=total_cost,
            total_tokens=total_tokens,
            total_duration_seconds=total_duration,
        )
        models.update_task(conn, task_id, status="review")

    return {
        "success": True,
        "steps_completed": len(steps),
        "results": results,
        "total_cost_usd": total_cost,
        "total_tokens": total_tokens,
        "total_duration_seconds": total_duration,
        "pipeline_id": pipeline["id"] if pipeline else None,
        "dry_run": dry_run,
    }
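The fallback chain in `_try_parse_json` (direct parse, then fenced block, then balanced-brace scan) is the piece models most often exercise, since they tend to wrap JSON in markdown fences despite instructions. A standalone sketch of the first two stages, independent of the Kin codebase:

```python
import json
import re


def extract_json(text: str):
    """Miniature of the fallback chain: direct parse, then fenced-block parse."""
    text = text.strip()
    # Stage 1: the whole reply is JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Stage 2: JSON wrapped in a ```json ... ``` (or bare ```) fence
    m = re.search(r"```(?:json)?\s*\n(.*?)\n```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    return None


reply = 'Here is the plan:\n```json\n{"pipeline": [{"role": "tester"}]}\n```'
print(extract_json(reply))  # fenced JSON is still recovered
```

The third stage (scanning for the first balanced `{...}`/`[...]` span) only matters when the model emits JSON embedded in free prose with no fence at all.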
agents/specialists.yaml (new file, 104 lines)
# Kin specialist pool — roles available for pipeline construction.
# PM selects from this pool based on task type.

specialists:
  pm:
    name: "Project Manager"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Decomposes tasks, selects specialists, builds pipelines"
    permissions: read_only
    context_rules:
      decisions: all
      modules: all

  architect:
    name: "Software Architect"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Designs solutions, reviews structure, writes specs"
    permissions: read_only
    context_rules:
      decisions: all
      modules: all

  debugger:
    name: "Debugger"
    model: sonnet
    tools: [Read, Grep, Glob, Bash]
    description: "Finds root causes, reads logs, traces execution"
    permissions: read_bash
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround]

  frontend_dev:
    name: "Frontend Developer"
    model: sonnet
    tools: [Read, Write, Edit, Bash, Glob, Grep]
    description: "Implements UI: Vue, CSS, components, composables"
    permissions: full
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround, convention]

  backend_dev:
    name: "Backend Developer"
    model: sonnet
    tools: [Read, Write, Edit, Bash, Glob, Grep]
    description: "Implements API, services, database, business logic"
    permissions: full
    working_dir: project
    context_rules:
      decisions: [gotcha, workaround, convention]

  tester:
    name: "Tester"
    model: sonnet
    tools: [Read, Write, Bash, Glob, Grep]
    description: "Writes and runs tests, verifies fixes"
    permissions: full
    working_dir: project
    context_rules:
      decisions: []

  reviewer:
    name: "Code Reviewer"
    model: sonnet
    tools: [Read, Grep, Glob]
    description: "Reviews code for quality, conventions, bugs"
    permissions: read_only
    context_rules:
      decisions: [convention]

  security:
    name: "Security Engineer"
    model: sonnet
    tools: [Read, Grep, Glob, Bash]
    description: "OWASP audit, auth checks, secrets scan, vulnerability analysis"
    permissions: read_bash
    working_dir: project
    context_rules:
      decisions_category: security

# Route templates — PM uses these to build pipelines
routes:
  debug:
    steps: [debugger, tester, frontend_dev, tester]
    description: "Find bug → verify → fix → verify fix"

  feature:
    steps: [architect, frontend_dev, tester, reviewer]
    description: "Design → implement → test → review"

  refactor:
    steps: [architect, frontend_dev, tester, reviewer]
    description: "Plan refactor → implement → test → review"

  hotfix:
    steps: [debugger, frontend_dev, tester]
    description: "Find → fix → verify (fast track)"

  security_audit:
    steps: [security, architect]
    description: "Audit → remediation plan"
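The `context_rules` entries above amount to a role-to-decision-type filter. A minimal sketch of that lookup table in Python (role names taken from the YAML; the filtering function itself is an illustrative assumption, not the Kin implementation):

```python
# Decision-type visibility per role, mirroring context_rules in specialists.yaml.
# None means "all decision types"; an empty set means "no decisions at all".
CONTEXT_RULES = {
    "pm": None,
    "architect": None,
    "debugger": {"gotcha", "workaround"},
    "frontend_dev": {"gotcha", "workaround", "convention"},
    "backend_dev": {"gotcha", "workaround", "convention"},
    "tester": set(),
    "reviewer": {"convention"},
}


def visible_decisions(role: str, decisions: list[dict]) -> list[dict]:
    """Filter a decision list down to what the given role is allowed to see."""
    allowed = CONTEXT_RULES.get(role)
    if allowed is None:
        return list(decisions)
    return [d for d in decisions if d["type"] in allowed]


decisions = [
    {"type": "gotcha", "title": "Safari bug"},
    {"type": "convention", "title": "Use WAL mode"},
]
print(visible_decisions("reviewer", decisions))  # keeps only the convention entry
```

Note that `security` is the odd one out: it filters by decision *category* (`decisions_category: security`) rather than by type, so it does not fit this table.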
cli/main.py (modified, +82 lines, hunk @ -408,6 +408,88 @@ def cost(ctx, period))
    click.echo(f"\nTotal: ${total:.4f}")


# ===========================================================================
# run
# ===========================================================================

@cli.command("run")
@click.argument("task_id")
@click.option("--dry-run", is_flag=True, help="Show pipeline plan without executing")
@click.pass_context
def run_task(ctx, task_id, dry_run):
    """Run a task through the agent pipeline.

    PM decomposes the task into specialist steps, then the pipeline executes.
    With --dry-run, shows the plan without running agents.
    """
    from agents.runner import run_agent, run_pipeline

    conn = ctx.obj["conn"]
    task = models.get_task(conn, task_id)
    if not task:
        click.echo(f"Task '{task_id}' not found.", err=True)
        raise SystemExit(1)

    project_id = task["project_id"]
    click.echo(f"Task: {task['id']} — {task['title']}")

    # Step 1: PM decomposes
    click.echo("Running PM to decompose task...")
    pm_result = run_agent(
        conn, "pm", task_id, project_id,
        model="sonnet", dry_run=dry_run,
    )

    if dry_run:
        click.echo("\n--- PM Prompt (dry-run) ---")
        click.echo(pm_result.get("prompt", "")[:2000])
        click.echo("\n(Dry-run: PM would produce a pipeline JSON)")
        return

    if not pm_result["success"]:
        click.echo(f"PM failed: {pm_result.get('output', 'unknown error')}", err=True)
        raise SystemExit(1)

    # Parse PM output for the pipeline
    output = pm_result.get("output")
    if isinstance(output, str):
        try:
            output = json.loads(output)
        except json.JSONDecodeError:
            click.echo(f"PM returned non-JSON output:\n{output[:500]}", err=True)
            raise SystemExit(1)

    if not isinstance(output, dict) or "pipeline" not in output:
        click.echo(
            f"PM output missing 'pipeline' key:\n{json.dumps(output, indent=2)[:500]}",
            err=True,
        )
        raise SystemExit(1)

    pipeline_steps = output["pipeline"]
    analysis = output.get("analysis", "")

    click.echo(f"\nAnalysis: {analysis}")
    click.echo(f"Pipeline ({len(pipeline_steps)} steps):")
    for i, step in enumerate(pipeline_steps, 1):
        click.echo(f"  {i}. {step['role']} ({step.get('model', 'sonnet')}): {step.get('brief', '')}")

    if not click.confirm("\nExecute pipeline?"):
        click.echo("Aborted.")
        return

    # Step 2: Execute pipeline
    click.echo("\nExecuting pipeline...")
    result = run_pipeline(conn, task_id, pipeline_steps)

    if result["success"]:
        click.echo(f"\nPipeline completed: {result['steps_completed']} steps")
    else:
        # The error string already names the failing step and role
        click.echo(f"\nPipeline failed: {result.get('error', 'unknown')}", err=True)

    if result.get("total_cost_usd"):
        click.echo(f"Cost: ${result['total_cost_usd']:.4f}")
    if result.get("total_duration_seconds"):
        click.echo(f"Duration: {result['total_duration_seconds']}s")


# ===========================================================================
# bootstrap
# ===========================================================================
core/context_builder.py (new file, 212 lines)
"""
Kin context builder — assembles role-specific context from DB for agent prompts.
Each role gets only the information it needs, keeping prompts focused.
"""

import json
import sqlite3
from pathlib import Path

from core import models

PROMPTS_DIR = Path(__file__).parent.parent / "agents" / "prompts"
SPECIALISTS_PATH = Path(__file__).parent.parent / "agents" / "specialists.yaml"


def _load_specialists() -> dict:
    """Load specialists.yaml (lazy yaml import, so pyyaml is only needed when this runs)."""
    path = SPECIALISTS_PATH
    if not path.exists():
        return {}
    import yaml
    return yaml.safe_load(path.read_text())


def build_context(
    conn: sqlite3.Connection,
    task_id: str,
    role: str,
    project_id: str,
) -> dict:
    """Build role-specific context from DB.

    Returns a dict with keys: task, project, and role-specific data.
    """
    task = models.get_task(conn, task_id)
    project = models.get_project(conn, project_id)

    ctx = {
        "task": _slim_task(task) if task else None,
        "project": _slim_project(project) if project else None,
        "role": role,
    }

    if role == "pm":
        ctx["modules"] = models.get_modules(conn, project_id)
        ctx["decisions"] = models.get_decisions(conn, project_id)
        ctx["active_tasks"] = models.list_tasks(conn, project_id=project_id, status="in_progress")
        try:
            specs = _load_specialists()
            ctx["available_specialists"] = list(specs.get("specialists", {}).keys())
            ctx["routes"] = specs.get("routes", {})
        except Exception:
            ctx["available_specialists"] = []
            ctx["routes"] = {}

    elif role == "architect":
        ctx["modules"] = models.get_modules(conn, project_id)
        ctx["decisions"] = models.get_decisions(conn, project_id)

    elif role == "debugger":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["gotcha", "workaround"],
        )
        ctx["module_hint"] = _extract_module_hint(task)

    elif role in ("frontend_dev", "backend_dev"):
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["gotcha", "workaround", "convention"],
        )

    elif role == "reviewer":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, types=["convention"],
        )

    elif role == "tester":
        # Minimal context — just the task spec
        pass

    elif role == "security":
        ctx["decisions"] = models.get_decisions(
            conn, project_id, category="security",
        )

    else:
        # Unknown role — give decisions as a fallback
        ctx["decisions"] = models.get_decisions(conn, project_id, limit=20)

    return ctx


def _slim_task(task: dict) -> dict:
    """Extract only relevant fields from a task for the prompt."""
    return {
        "id": task["id"],
        "title": task["title"],
        "status": task["status"],
        "priority": task["priority"],
        "assigned_role": task.get("assigned_role"),
        "brief": task.get("brief"),
        "spec": task.get("spec"),
    }


def _slim_project(project: dict) -> dict:
    """Extract only relevant fields from a project."""
    return {
        "id": project["id"],
        "name": project["name"],
        "path": project["path"],
        "tech_stack": project.get("tech_stack"),
    }


def _extract_module_hint(task: dict | None) -> str | None:
    """Try to extract a module name from the task brief."""
    if not task:
        return None
    brief = task.get("brief")
    if isinstance(brief, dict):
        return brief.get("module")
    return None


def format_prompt(context: dict, role: str, prompt_template: str | None = None) -> str:
    """Format a prompt by injecting context into a role template.

    If prompt_template is None, loads from agents/prompts/{role}.md.
    """
    if prompt_template is None:
        prompt_path = PROMPTS_DIR / f"{role}.md"
        if prompt_path.exists():
            prompt_template = prompt_path.read_text()
        else:
            prompt_template = f"You are a {role}. Complete the task described below."

    sections = [prompt_template, ""]

    # Project info
    proj = context.get("project")
    if proj:
        sections.append(f"## Project: {proj['id']} — {proj['name']}")
        if proj.get("tech_stack"):
            sections.append(f"Tech stack: {', '.join(proj['tech_stack'])}")
        sections.append(f"Path: {proj['path']}")
        sections.append("")

    # Task info
    task = context.get("task")
    if task:
        sections.append(f"## Task: {task['id']} — {task['title']}")
        sections.append(f"Status: {task['status']}, Priority: {task['priority']}")
        if task.get("brief"):
            sections.append(f"Brief: {json.dumps(task['brief'], ensure_ascii=False)}")
        if task.get("spec"):
            sections.append(f"Spec: {json.dumps(task['spec'], ensure_ascii=False)}")
        sections.append("")

    # Decisions
    decisions = context.get("decisions")
    if decisions:
        sections.append(f"## Known decisions ({len(decisions)}):")
        for d in decisions[:30]:  # Cap at 30 to avoid token bloat
            tags = f" [{', '.join(d['tags'])}]" if d.get("tags") else ""
            sections.append(f"- #{d['id']} [{d['type']}] {d['title']}{tags}")
        sections.append("")

    # Modules
    modules = context.get("modules")
    if modules:
        sections.append(f"## Modules ({len(modules)}):")
        for m in modules:
            sections.append(f"- {m['name']} ({m['type']}) — {m['path']}")
        sections.append("")

    # Active tasks (PM)
    active = context.get("active_tasks")
    if active:
        sections.append(f"## Active tasks ({len(active)}):")
        for t in active:
            sections.append(f"- {t['id']}: {t['title']} [{t['status']}]")
        sections.append("")

    # Available specialists (PM)
    specialists = context.get("available_specialists")
    if specialists:
        sections.append(f"## Available specialists: {', '.join(specialists)}")
        sections.append("")

    # Routes (PM)
    routes = context.get("routes")
    if routes:
        sections.append("## Route templates:")
        for name, route in routes.items():
            steps = " → ".join(route.get("steps", []))
            sections.append(f"- {name}: {steps}")
        sections.append("")

    # Module hint (debugger)
    hint = context.get("module_hint")
    if hint:
        sections.append(f"## Target module: {hint}")
        sections.append("")

    # Previous step output (pipeline chaining)
    prev = context.get("previous_output")
    if prev:
        sections.append("## Previous step output:")
        sections.append(prev if isinstance(prev, str) else json.dumps(prev, ensure_ascii=False))
        sections.append("")

    return "\n".join(sections)
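`format_prompt` uses one pattern throughout: start from the role template, conditionally append "## ..." sections as flat lists of lines, and join once at the end. The pattern in miniature (function name hypothetical, reduced to two sections):

```python
def assemble_prompt(template: str, context: dict) -> str:
    # Collect lines in a flat list; a single join avoids repeated string concatenation.
    sections = [template, ""]

    proj = context.get("project")
    if proj:
        sections.append(f"## Project: {proj['id']} ({proj['name']})")
        sections.append("")

    prev = context.get("previous_output")
    if prev:
        sections.append("## Previous step output:")
        sections.append(prev)
        sections.append("")

    return "\n".join(sections)


prompt = assemble_prompt("You are a tester.", {"project": {"id": "vdol", "name": "Demo"}})
print(prompt.splitlines()[0])  # the role template always comes first
```

Because sections are only appended when the corresponding context key is present, the same function serves every role: a tester's prompt simply has fewer sections than a PM's.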
tests/test_context_builder.py (new file, 133 lines)
"""Tests for core/context_builder.py — context assembly per role."""

import pytest

from core.db import init_db
from core import models
from core.context_builder import build_context, format_prompt


@pytest.fixture
def conn():
    c = init_db(":memory:")
    # Seed project, modules, decisions, tasks
    models.create_project(c, "vdol", "ВДОЛЬ и ПОПЕРЕК", "~/projects/vdolipoperek",
                          tech_stack=["vue3", "typescript", "nodejs"])
    models.add_module(c, "vdol", "search", "frontend", "src/search/")
    models.add_module(c, "vdol", "api", "backend", "src/api/")
    models.add_decision(c, "vdol", "gotcha", "Safari bug",
                        "position:fixed breaks", category="ui", tags=["ios"])
    models.add_decision(c, "vdol", "workaround", "API rate limit",
                        "10 req/s max", category="api")
    models.add_decision(c, "vdol", "convention", "Use WAL mode",
                        "Always use WAL for SQLite", category="architecture")
    models.add_decision(c, "vdol", "decision", "Auth required",
                        "All endpoints need auth", category="security")
    models.create_task(c, "VDOL-001", "vdol", "Fix search filters",
                       brief={"module": "search", "route_type": "debug"})
    models.create_task(c, "VDOL-002", "vdol", "Add payments",
                       status="in_progress")
    yield c
    c.close()


class TestBuildContext:
    def test_pm_gets_everything(self, conn):
        ctx = build_context(conn, "VDOL-001", "pm", "vdol")
        assert ctx["task"]["id"] == "VDOL-001"
        assert ctx["project"]["id"] == "vdol"
        assert len(ctx["modules"]) == 2
        assert len(ctx["decisions"]) == 4  # all decisions
        assert len(ctx["active_tasks"]) == 1  # VDOL-002 in_progress
        assert "pm" in ctx["available_specialists"]

    def test_architect_gets_all_decisions_and_modules(self, conn):
        ctx = build_context(conn, "VDOL-001", "architect", "vdol")
        assert len(ctx["modules"]) == 2
        assert len(ctx["decisions"]) == 4

    def test_debugger_gets_only_gotcha_workaround(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types <= {"gotcha", "workaround"}
        assert "convention" not in types
        assert "decision" not in types
        assert ctx["module_hint"] == "search"

    def test_frontend_dev_gets_gotcha_workaround_convention(self, conn):
        ctx = build_context(conn, "VDOL-001", "frontend_dev", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert "gotcha" in types
        assert "workaround" in types
        assert "convention" in types
        assert "decision" not in types  # plain decisions excluded

    def test_backend_dev_same_as_frontend(self, conn):
        ctx = build_context(conn, "VDOL-001", "backend_dev", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types == {"gotcha", "workaround", "convention"}

    def test_reviewer_gets_only_conventions(self, conn):
        ctx = build_context(conn, "VDOL-001", "reviewer", "vdol")
        types = {d["type"] for d in ctx["decisions"]}
        assert types == {"convention"}

    def test_tester_gets_minimal_context(self, conn):
        ctx = build_context(conn, "VDOL-001", "tester", "vdol")
        assert ctx["task"] is not None
        assert ctx["project"] is not None
        assert "decisions" not in ctx
        assert "modules" not in ctx

    def test_security_gets_security_decisions(self, conn):
        ctx = build_context(conn, "VDOL-001", "security", "vdol")
        categories = {d.get("category") for d in ctx["decisions"]}
        assert categories == {"security"}

    def test_unknown_role_gets_fallback(self, conn):
        ctx = build_context(conn, "VDOL-001", "unknown_role", "vdol")
        assert "decisions" in ctx
        assert len(ctx["decisions"]) > 0


class TestFormatPrompt:
    def test_format_with_template(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        prompt = format_prompt(ctx, "debugger", "You are a debugger. Find bugs.")
        assert "You are a debugger" in prompt
        assert "VDOL-001" in prompt
        assert "Fix search filters" in prompt
        assert "vdol" in prompt
        assert "vue3" in prompt

    def test_format_includes_decisions(self, conn):
        ctx = build_context(conn, "VDOL-001", "debugger", "vdol")
        prompt = format_prompt(ctx, "debugger", "Debug this.")
|
||||
assert "Safari bug" in prompt
|
||||
assert "API rate limit" in prompt
|
||||
# Convention should NOT be here (debugger doesn't get it)
|
||||
assert "WAL mode" not in prompt
|
||||
|
||||
def test_format_pm_includes_specialists(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "pm", "vdol")
|
||||
prompt = format_prompt(ctx, "pm", "You are PM.")
|
||||
assert "Available specialists" in prompt
|
||||
assert "debugger" in prompt
|
||||
assert "Active tasks" in prompt
|
||||
assert "VDOL-002" in prompt
|
||||
|
||||
def test_format_with_previous_output(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "tester", "vdol")
|
||||
ctx["previous_output"] = "Found race condition in useSearch.ts"
|
||||
prompt = format_prompt(ctx, "tester", "Write tests.")
|
||||
assert "Previous step output" in prompt
|
||||
assert "race condition" in prompt
|
||||
|
||||
def test_format_loads_prompt_file(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "pm", "vdol")
|
||||
prompt = format_prompt(ctx, "pm") # Should load from agents/prompts/pm.md
|
||||
assert "decompose" in prompt.lower() or "pipeline" in prompt.lower()
|
||||
|
||||
def test_format_missing_prompt_file(self, conn):
|
||||
ctx = build_context(conn, "VDOL-001", "analyst", "vdol")
|
||||
prompt = format_prompt(ctx, "analyst") # No analyst.md exists
|
||||
assert "analyst" in prompt.lower()
|
||||
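The role rules exercised above (PM and architect see everything, debugger only gotchas and workarounds, developers add conventions, reviewer conventions only, unknown roles fall back to everything) could be encoded as a small lookup table. This is a hypothetical sketch for illustration, not the actual `build_context` internals; `ROLE_DECISION_TYPES` and `filter_decisions` are invented names:

```python
# Decision types visible to each role; None means no filtering
# (the role receives every decision). Hypothetical table.
ROLE_DECISION_TYPES = {
    "pm": None,
    "architect": None,
    "debugger": {"gotcha", "workaround"},
    "frontend_dev": {"gotcha", "workaround", "convention"},
    "backend_dev": {"gotcha", "workaround", "convention"},
    "reviewer": {"convention"},
}


def filter_decisions(decisions, role):
    """Return the subset of decision dicts visible to `role`.

    Unknown roles fall back to seeing everything, matching
    test_unknown_role_gets_fallback above. (Sketch only; the
    security role would additionally filter by category.)
    """
    allowed = ROLE_DECISION_TYPES.get(role)
    if allowed is None:
        return list(decisions)
    return [d for d in decisions if d["type"] in allowed]
```

A table like this keeps the role policy declarative, so adding a ninth specialist is a one-line change rather than another branch.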
234
tests/test_runner.py
Normal file
@@ -0,0 +1,234 @@
"""Tests for agents/runner.py — agent execution with mocked claude CLI."""

import json
import pytest
from unittest.mock import patch, MagicMock
from core.db import init_db
from core import models
from agents.runner import run_agent, run_pipeline, _try_parse_json


@pytest.fixture
def conn():
    c = init_db(":memory:")
    models.create_project(c, "vdol", "ВДОЛЬ", "~/projects/vdolipoperek",
                          tech_stack=["vue3"])
    models.create_task(c, "VDOL-001", "vdol", "Fix bug",
                       brief={"route_type": "debug"})
    yield c
    c.close()


def _mock_claude_success(output_data):
    """Create a mock subprocess result with successful claude output."""
    mock = MagicMock()
    mock.stdout = json.dumps(output_data) if isinstance(output_data, dict) else output_data
    mock.stderr = ""
    mock.returncode = 0
    return mock


def _mock_claude_failure(error_msg):
    mock = MagicMock()
    mock.stdout = ""
    mock.stderr = error_msg
    mock.returncode = 1
    return mock


# ---------------------------------------------------------------------------
# run_agent
# ---------------------------------------------------------------------------

class TestRunAgent:
    @patch("agents.runner.subprocess.run")
    def test_successful_agent_run(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({
            "result": "Found race condition in useSearch.ts",
            "usage": {"total_tokens": 5000},
            "cost_usd": 0.015,
        })

        result = run_agent(conn, "debugger", "VDOL-001", "vdol")

        assert result["success"] is True
        assert result["role"] == "debugger"
        assert result["model"] == "sonnet"
        assert result["duration_seconds"] >= 0

        # Verify claude was called with the right args
        call_args = mock_run.call_args
        cmd = call_args[0][0]
        assert "claude" in cmd[0]
        assert "-p" in cmd
        assert "--output-format" in cmd
        assert "json" in cmd

    @patch("agents.runner.subprocess.run")
    def test_failed_agent_run(self, mock_run, conn):
        mock_run.return_value = _mock_claude_failure("API error")

        result = run_agent(conn, "debugger", "VDOL-001", "vdol")

        assert result["success"] is False

        # Should be logged in agent_logs
        logs = conn.execute("SELECT * FROM agent_logs WHERE task_id='VDOL-001'").fetchall()
        assert len(logs) == 1
        assert logs[0]["success"] == 0

    def test_dry_run_returns_prompt(self, conn):
        result = run_agent(conn, "debugger", "VDOL-001", "vdol", dry_run=True)

        assert result["dry_run"] is True
        assert result["prompt"] is not None
        assert "VDOL-001" in result["prompt"]
        assert result["output"] is None

    @patch("agents.runner.subprocess.run")
    def test_agent_logs_to_db(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "ok"})

        run_agent(conn, "tester", "VDOL-001", "vdol")

        logs = conn.execute("SELECT * FROM agent_logs WHERE agent_role='tester'").fetchall()
        assert len(logs) == 1
        assert logs[0]["project_id"] == "vdol"

    @patch("agents.runner.subprocess.run")
    def test_previous_output_passed(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "tests pass"})

        run_agent(conn, "tester", "VDOL-001", "vdol",
                  previous_output="Found bug in line 42")

        call_args = mock_run.call_args
        prompt = call_args[0][0][2]  # -p argument
        assert "line 42" in prompt


# ---------------------------------------------------------------------------
# run_pipeline
# ---------------------------------------------------------------------------

class TestRunPipeline:
    @patch("agents.runner.subprocess.run")
    def test_successful_pipeline(self, mock_run, conn):
        mock_run.return_value = _mock_claude_success({"result": "done"})

        steps = [
            {"role": "debugger", "brief": "find bug"},
            {"role": "tester", "depends_on": "debugger", "brief": "verify"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps)

        assert result["success"] is True
        assert result["steps_completed"] == 2
        assert len(result["results"]) == 2

        # Pipeline created in DB
        pipe = conn.execute("SELECT * FROM pipelines WHERE task_id='VDOL-001'").fetchone()
        assert pipe is not None
        assert pipe["status"] == "completed"

        # Task updated to review
        task = models.get_task(conn, "VDOL-001")
        assert task["status"] == "review"

    @patch("agents.runner.subprocess.run")
    def test_pipeline_fails_on_step(self, mock_run, conn):
        # First step succeeds, second fails
        mock_run.side_effect = [
            _mock_claude_success({"result": "found bug"}),
            _mock_claude_failure("compilation error"),
        ]

        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "frontend_dev", "brief": "fix"},
            {"role": "tester", "brief": "test"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps)

        assert result["success"] is False
        assert result["steps_completed"] == 1  # Only debugger completed
        assert "frontend_dev" in result["error"]

        # Pipeline marked as failed
        pipe = conn.execute("SELECT * FROM pipelines WHERE task_id='VDOL-001'").fetchone()
        assert pipe["status"] == "failed"

        # Task marked as blocked
        task = models.get_task(conn, "VDOL-001")
        assert task["status"] == "blocked"

    def test_pipeline_dry_run(self, conn):
        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "tester", "brief": "verify"},
        ]
        result = run_pipeline(conn, "VDOL-001", steps, dry_run=True)

        assert result["dry_run"] is True
        assert result["success"] is True
        assert result["steps_completed"] == 2

        # No pipeline created in DB
        pipes = conn.execute("SELECT * FROM pipelines").fetchall()
        assert len(pipes) == 0

    @patch("agents.runner.subprocess.run")
    def test_pipeline_chains_output(self, mock_run, conn):
        """Output from step N is passed as previous_output to step N+1."""
        call_count = [0]

        def side_effect(*args, **kwargs):
            call_count[0] += 1
            if call_count[0] == 1:
                return _mock_claude_success({"result": "bug is in line 42"})
            return _mock_claude_success({"result": "test written"})

        mock_run.side_effect = side_effect

        steps = [
            {"role": "debugger", "brief": "find"},
            {"role": "tester", "brief": "write test"},
        ]
        run_pipeline(conn, "VDOL-001", steps)

        # Second call should include first step's output in prompt
        second_call = mock_run.call_args_list[1]
        prompt = second_call[0][0][2]  # -p argument
        assert "line 42" in prompt or "bug" in prompt

    def test_pipeline_task_not_found(self, conn):
        result = run_pipeline(conn, "NONEXISTENT", [{"role": "debugger"}])
        assert result["success"] is False
        assert "not found" in result["error"]


# ---------------------------------------------------------------------------
# JSON parsing
# ---------------------------------------------------------------------------

class TestTryParseJson:
    def test_direct_json(self):
        assert _try_parse_json('{"a": 1}') == {"a": 1}

    def test_json_in_code_fence(self):
        text = 'Some text\n```json\n{"a": 1}\n```\nMore text'
        assert _try_parse_json(text) == {"a": 1}

    def test_json_embedded_in_text(self):
        text = 'Here is the result: {"status": "ok", "count": 42} and more'
        result = _try_parse_json(text)
        assert result == {"status": "ok", "count": 42}

    def test_empty_string(self):
        assert _try_parse_json("") is None

    def test_no_json(self):
        assert _try_parse_json("just plain text") is None

    def test_json_array(self):
        assert _try_parse_json('[1, 2, 3]') == [1, 2, 3]
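test_pipeline_chains_output and test_pipeline_fails_on_step together pin down the chaining contract: each step's output feeds the next, and the pipeline stops at the first failure. A minimal sketch of that loop, hypothetical and deliberately decoupled from the DB and subprocess details of the real run_pipeline (`run_one` is an invented callback):

```python
def run_steps(steps, run_one):
    """Run pipeline steps in order, feeding each step's output forward.

    `run_one(step, previous_output)` must return a dict with "success"
    and "output". Stops at the first failure. Sketch of the chaining
    behaviour only; the real run_pipeline also logs, updates pipeline
    records, and transitions task status.
    """
    results = []
    previous = None
    for step in steps:
        result = run_one(step, previous)
        results.append(result)
        if not result.get("success"):
            return {"success": False,
                    "steps_completed": len(results) - 1,
                    "error": f"step '{step['role']}' failed",
                    "results": results}
        # Chain: this step's output becomes the next step's context.
        previous = result.get("output")
    return {"success": True, "steps_completed": len(results),
            "results": results}
```

Keeping the chaining loop free of I/O is what makes it testable with a plain callback instead of a mocked subprocess.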