kin: auto-commit after pipeline

Gros Frumos 2026-03-17 14:03:53 +02:00
parent 04cbbc563b
commit b6f40a6ace
9 changed files with 1690 additions and 16 deletions


@@ -0,0 +1,109 @@
You are a Department Head for the Kin multi-agent orchestrator.
Your job: receive a subtask from the Project Manager, plan the work for your department, and produce a structured sub-pipeline for your workers to execute.
## Input
You receive:
- PROJECT: id, name, tech stack
- TASK: id, title, brief
- DEPARTMENT: your department name and available workers
- HANDOFF FROM PREVIOUS DEPARTMENT: artifacts and context from prior work (if any)
- PREVIOUS STEP OUTPUT: may contain a handoff summary from a preceding department
## Your responsibilities
1. Analyze the task in context of your department's domain
2. Plan the work as a short pipeline (1-4 steps) using ONLY workers from your department
3. Define a clear, detailed brief for each worker — include what to build, where, and any constraints
4. Specify what artifacts your department will produce (files changed, endpoints, schemas)
5. Write handoff notes for the next department with enough detail for them to continue
## Department-specific guidance
### Backend department (backend_head)
- Plan API design before implementation: architect → backend_dev → tester → reviewer
- Specify endpoint contracts (method, path, request/response schemas) in worker briefs
- Include database schema changes in artifacts
- Ensure tester verifies API contracts, not just happy paths
### Frontend department (frontend_head)
- Reference backend API contracts from incoming handoff
- Plan component hierarchy: frontend_dev → tester → reviewer
- Include component file paths and prop interfaces in artifacts
- Verify UI matches acceptance criteria
### QA department (qa_head)
- Focus on end-to-end verification across departments
- Reference artifacts from all preceding departments
- Plan: tester (functional tests) → reviewer (code quality)
### Security department (security_head)
- Audit scope: OWASP top 10, auth, secrets, input validation
- Plan: security (audit) → reviewer (remediation verification)
- Include vulnerability severity in artifacts
### Infrastructure department (infra_head)
- Plan: sysadmin (investigate/configure) → debugger (if issues found) → reviewer
- Include service configs, ports, versions in artifacts
### Research department (research_head)
- Plan: tech_researcher (gather data) → architect (analysis/recommendations)
- Include API docs, limitations, integration notes in artifacts
### Marketing department (marketing_head)
- Plan: tech_researcher (market research) → spec (positioning/strategy)
- Include competitor analysis, target audience in artifacts
## Rules
- ONLY use workers listed under your department's worker list
- Keep the sub-pipeline SHORT: 1-4 steps maximum
- Always end with `tester` or `reviewer` if they are in your worker list
- Do NOT include other department heads (*_head roles) in sub_pipeline — only workers
- If previous department handoff is provided, acknowledge what was already done and build on it
- Do NOT duplicate work already completed by a previous department
- Write briefs that are self-contained — each worker should understand their task without external context
## Output format
Return ONLY valid JSON (no markdown, no explanation):
```json
{
"status": "done",
"sub_pipeline": [
{
"role": "backend_dev",
"model": "sonnet",
"brief": "Implement the feature as described in the task spec. Expose POST /api/feature endpoint."
},
{
"role": "tester",
"model": "sonnet",
"brief": "Write and run tests for the backend changes. Verify POST /api/feature works correctly."
}
],
"artifacts": {
"files_changed": ["core/models.py", "web/api.py"],
"endpoints_added": ["POST /api/feature"],
"schemas": [],
"notes": "Added feature with full test coverage. All tests pass."
},
"handoff_notes": "Backend implementation complete. Tests passing. Frontend needs to call POST /api/feature with {field: value} body."
}
```
Valid values for `status`: `"done"`, `"blocked"`.
If status is "blocked", include `"blocked_reason": "..."`.
## Blocked Protocol
If you cannot plan the work (task is ambiguous, unclear requirements, outside your department's scope, or missing critical information from previous steps), return:
```json
{"status": "blocked", "blocked_reason": "<clear explanation>", "blocked_at": "<ISO-8601 datetime>"}
```
Use current datetime for `blocked_at`. Do NOT guess — return blocked immediately.
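The output contract above can be checked mechanically before the sub-pipeline is executed. A minimal validator sketch (field names follow the format shown; `max_steps=4` mirrors the 1-4 step rule; the function itself is illustrative, not part of the codebase):

```python
import json

def validate_head_output(raw: str, workers: list[str], max_steps: int = 4) -> list[str]:
    """Return a list of contract violations (empty list means valid)."""
    errors = []
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(out, dict):
        return ["output is not a JSON object"]
    status = out.get("status")
    if status not in ("done", "blocked"):
        errors.append(f"invalid status: {status!r}")
    if status == "blocked" and not out.get("blocked_reason"):
        errors.append("blocked status requires blocked_reason")
    if status == "done":
        steps = out.get("sub_pipeline")
        if not isinstance(steps, list) or not 1 <= len(steps) <= max_steps:
            errors.append(f"sub_pipeline must have 1-{max_steps} steps")
        else:
            for s in steps:
                role = s.get("role", "")
                # Department heads may never appear in a sub_pipeline.
                if role.endswith("_head"):
                    errors.append(f"department head {role!r} not allowed in sub_pipeline")
                elif role not in workers:
                    errors.append(f"unknown worker: {role!r}")
    return errors
```

Running this against the example output above with `workers=["backend_dev", "tester"]` yields an empty list.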


@@ -32,6 +32,31 @@ You receive:
- If a task is blocked or unclear, say so — don't guess.
- If `acceptance_criteria` is provided, include it in the brief for the last pipeline step (tester or reviewer) so they can verify the result against it. Do NOT use acceptance_criteria to describe current task state.
## Department routing
For **complex tasks** that span multiple domains, use department heads instead of direct specialists. Department heads (model=opus) plan their own internal sub-pipelines and coordinate their workers.
**Use department heads when:**
- Task requires 3+ specialists across different areas
- Work is clearly cross-domain (backend + frontend + QA, or security + QA, etc.)
- You want intelligent coordination within each domain
**Use direct specialists when:**
- Simple bug fix, hotfix, or single-domain task
- Research or audit tasks
- Pipeline would be 1-2 steps
**Available department heads:**
- `backend_head` — coordinates backend work (architect, backend_dev, tester, reviewer)
- `frontend_head` — coordinates frontend work (frontend_dev, tester, reviewer)
- `qa_head` — coordinates QA (tester, reviewer)
- `security_head` — coordinates security (security, reviewer)
- `infra_head` — coordinates infrastructure (sysadmin, debugger, reviewer)
- `research_head` — coordinates research (tech_researcher, architect)
- `marketing_head` — coordinates marketing (tech_researcher, spec)
Department heads always run with model=opus. Each department head receives the brief for its domain and automatically orchestrates its workers, with structured handoffs between departments.
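The routing rules above reduce to a small heuristic. A sketch, assuming the PM can estimate the set of domains a task touches and the pipeline length (the function name and signature are illustrative, not from the codebase):

```python
def choose_route(domains: set[str], estimated_steps: int) -> str:
    """Decide between department heads and direct specialists."""
    # Simple fixes, 1-2 step pipelines, and single-domain work go to specialists.
    if estimated_steps <= 2 or len(domains) <= 1:
        return "specialists"
    # Cross-domain work needing 3+ specialists gets department-head coordination.
    return "department_heads"
```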
## Project type routing
**If project_type == "operations":**


@@ -27,6 +27,14 @@ _EXTRA_PATH_DIRS = [
"/usr/local/sbin",
]
# Default timeouts per model (seconds). Override globally with KIN_AGENT_TIMEOUT
# or per role via timeout_seconds in specialists.yaml.
_MODEL_TIMEOUTS = {
"opus": 1800, # 30 min
"sonnet": 1200, # 20 min
"haiku": 600, # 10 min
}
def _build_claude_env() -> dict:
"""Return an env dict with an extended PATH that includes common CLI tool locations.
@@ -182,10 +190,22 @@ def run_agent(
if project_path.is_dir():
working_dir = str(project_path)
# Determine timeout: role-specific (specialists.yaml) > model-based > default
role_timeout = None
try:
from core.context_builder import _load_specialists
specs = _load_specialists().get("specialists", {})
role_spec = specs.get(role, {})
if role_spec.get("timeout_seconds"):
role_timeout = int(role_spec["timeout_seconds"])
except Exception:
pass
# Run claude subprocess
start = time.monotonic()
result = _run_claude(prompt, model=model, working_dir=working_dir,
- allow_write=allow_write, noninteractive=noninteractive)
+ allow_write=allow_write, noninteractive=noninteractive,
+ timeout=role_timeout)
duration = int(time.monotonic() - start)
# Parse output — ensure output_text is always a string for DB storage
@@ -247,7 +267,11 @@ def _run_claude(
is_noninteractive = noninteractive or os.environ.get("KIN_NONINTERACTIVE") == "1"
if timeout is None:
- timeout = int(os.environ.get("KIN_AGENT_TIMEOUT") or 600)
+ env_timeout = os.environ.get("KIN_AGENT_TIMEOUT")
+ if env_timeout:
+ timeout = int(env_timeout)
+ else:
+ timeout = _MODEL_TIMEOUTS.get(model, _MODEL_TIMEOUTS["sonnet"])
env = _build_claude_env()
try:
@@ -961,6 +985,187 @@ def _run_learning_extraction(
return {"added": added, "skipped": skipped}
# ---------------------------------------------------------------------------
# Department head detection
# ---------------------------------------------------------------------------
# Cache of roles with execution_type=department_head from specialists.yaml
_DEPT_HEAD_ROLES: set[str] | None = None
def _is_department_head(role: str) -> bool:
"""Check if a role is a department head.
Uses execution_type from specialists.yaml as primary check,
falls back to role.endswith('_head') convention.
"""
global _DEPT_HEAD_ROLES
if _DEPT_HEAD_ROLES is None:
try:
from core.context_builder import _load_specialists
specs = _load_specialists()
all_specs = specs.get("specialists", {})
_DEPT_HEAD_ROLES = {
name for name, spec in all_specs.items()
if spec.get("execution_type") == "department_head"
}
except Exception:
_DEPT_HEAD_ROLES = set()
return role in _DEPT_HEAD_ROLES or role.endswith("_head")
# ---------------------------------------------------------------------------
# Department head sub-pipeline execution
# ---------------------------------------------------------------------------
def _execute_department_head_step(
conn: sqlite3.Connection,
task_id: str,
project_id: str,
parent_pipeline_id: int | None,
step: dict,
dept_head_result: dict,
allow_write: bool = False,
noninteractive: bool = False,
next_department: str | None = None,
) -> dict:
"""Execute sub-pipeline planned by a department head.
Parses the dept head's JSON output, validates the sub_pipeline,
creates a child pipeline in DB, runs it, and saves a handoff record.
Returns dict with success, output, cost_usd, tokens_used, duration_seconds.
"""
raw = dept_head_result.get("raw_output") or dept_head_result.get("output") or ""
if isinstance(raw, (dict, list)):
raw = json.dumps(raw, ensure_ascii=False)
parsed = _try_parse_json(raw)
if not isinstance(parsed, dict):
return {
"success": False,
"output": "Department head returned non-JSON output",
"cost_usd": 0, "tokens_used": 0, "duration_seconds": 0,
}
# Blocked status from dept head
if parsed.get("status") == "blocked":
reason = parsed.get("blocked_reason", "Department head reported blocked")
return {
"success": False,
"output": json.dumps(parsed, ensure_ascii=False),
"blocked": True,
"blocked_reason": reason,
"cost_usd": 0, "tokens_used": 0, "duration_seconds": 0,
}
sub_pipeline = parsed.get("sub_pipeline", [])
if not isinstance(sub_pipeline, list) or not sub_pipeline:
return {
"success": False,
"output": "Department head returned empty or invalid sub_pipeline",
"cost_usd": 0, "tokens_used": 0, "duration_seconds": 0,
}
# Recursion guard: no department head roles allowed in sub_pipeline
for sub_step in sub_pipeline:
if isinstance(sub_step, dict) and _is_department_head(str(sub_step.get("role", ""))):
return {
"success": False,
"output": f"Recursion blocked: sub_pipeline contains _head role '{sub_step['role']}'",
"cost_usd": 0, "tokens_used": 0, "duration_seconds": 0,
}
role = step["role"]
dept_name = role.replace("_head", "")
# Create child pipeline in DB
child_pipeline = models.create_pipeline(
conn, task_id, project_id,
route_type="dept_sub",
steps=sub_pipeline,
parent_pipeline_id=parent_pipeline_id,
department=dept_name,
)
# Build initial context for workers: dept head's plan + artifacts
dept_plan_context = json.dumps({
"department_head_plan": {
"department": dept_name,
"artifacts": parsed.get("artifacts", {}),
"handoff_notes": parsed.get("handoff_notes", ""),
},
}, ensure_ascii=False)
# Run the sub-pipeline (noninteractive=True — Opus already reviewed the plan)
sub_result = run_pipeline(
conn, task_id, sub_pipeline,
dry_run=False,
allow_write=allow_write,
noninteractive=True,
initial_previous_output=dept_plan_context,
)
# Extract decisions from sub-pipeline results for handoff
decisions_made = []
sub_results = sub_result.get("results", [])
for sr in sub_results:
output = sr.get("output") or sr.get("raw_output") or ""
if isinstance(output, str):
try:
output = json.loads(output)
except (json.JSONDecodeError, ValueError):
pass
if isinstance(output, dict):
# Reviewer/tester may include decisions or findings
for key in ("decisions", "findings", "recommendations"):
val = output.get(key)
if isinstance(val, list):
decisions_made.extend(val)
elif isinstance(val, str) and val:
decisions_made.append(val)
# Determine last worker role for auto_complete tracking
last_sub_role = sub_pipeline[-1].get("role", "") if sub_pipeline else ""
# Save handoff for inter-department context
handoff_status = "done" if sub_result.get("success") else "partial"
try:
models.create_handoff(
conn,
pipeline_id=parent_pipeline_id or child_pipeline["id"],
task_id=task_id,
from_department=dept_name,
to_department=next_department,
artifacts=parsed.get("artifacts", {}),
decisions_made=decisions_made,
blockers=[],
status=handoff_status,
)
except Exception:
pass # Handoff save errors must never block pipeline
# Build summary output for the next pipeline step
summary = {
"from_department": dept_name,
"handoff_notes": parsed.get("handoff_notes", ""),
"artifacts": parsed.get("artifacts", {}),
"sub_pipeline_summary": {
"steps_completed": sub_result.get("steps_completed", 0),
"success": sub_result.get("success", False),
},
}
return {
"success": sub_result.get("success", False),
"output": json.dumps(summary, ensure_ascii=False),
"cost_usd": sub_result.get("total_cost_usd", 0),
"tokens_used": sub_result.get("total_tokens", 0),
"duration_seconds": sub_result.get("total_duration_seconds", 0),
"last_sub_role": last_sub_role,
}
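`_try_parse_json` is called above but is not part of this diff. A minimal stand-in, assuming it tolerates JSON wrapped in prose (the brace-span fallback here is an illustrative guess, not the project's actual implementation):

```python
import json

def try_parse_json(raw: str):
    """Strict parse first; fall back to the outermost {...} span; None on failure."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        pass
    # Models often wrap their JSON in prose; try the outermost brace span.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(raw[start:end + 1])
        except (json.JSONDecodeError, ValueError):
            pass
    return None
```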
# ---------------------------------------------------------------------------
# Pipeline executor
# ---------------------------------------------------------------------------
@@ -972,6 +1177,7 @@ def run_pipeline(
dry_run: bool = False,
allow_write: bool = False,
noninteractive: bool = False,
initial_previous_output: str | None = None,
) -> dict:
"""Execute a multi-step pipeline of agents.
@@ -980,6 +1186,9 @@
{"role": "tester", "depends_on": "debugger", "brief": "..."},
]
initial_previous_output: context injected as previous_output for the first step
(used by dept head sub-pipelines to pass artifacts/plan to workers).
Returns {success, steps_completed, total_cost, total_tokens, total_duration, results}
"""
# Auth check — skip for dry_run (dry_run never calls claude CLI)
@@ -1020,7 +1229,8 @@
total_cost = 0.0
total_tokens = 0
total_duration = 0
- previous_output = None
+ previous_output = initial_previous_output
+ _last_sub_role = None # Track last worker role from dept sub-pipelines (for auto_complete)
for i, step in enumerate(steps):
role = step["role"]
@@ -1283,6 +1493,62 @@
except Exception:
pass # Never block pipeline on decomposer save errors
# Department head: execute sub-pipeline planned by the dept head
if _is_department_head(role) and result["success"] and not dry_run:
# Determine next department for handoff routing
_next_dept = None
if i + 1 < len(steps):
_next_role = steps[i + 1].get("role", "")
if _is_department_head(_next_role):
_next_dept = _next_role.replace("_head", "")
dept_result = _execute_department_head_step(
conn, task_id, project_id,
parent_pipeline_id=pipeline["id"] if pipeline else None,
step=step,
dept_head_result=result,
allow_write=allow_write,
noninteractive=noninteractive,
next_department=_next_dept,
)
# Accumulate sub-pipeline costs
total_cost += dept_result.get("cost_usd") or 0
total_tokens += dept_result.get("tokens_used") or 0
total_duration += dept_result.get("duration_seconds") or 0
if not dept_result.get("success"):
# Sub-pipeline failed — handle as blocked
results.append({"role": role, "_dept_sub": True, **dept_result})
if pipeline:
models.update_pipeline(
conn, pipeline["id"],
status="failed",
total_cost_usd=total_cost,
total_tokens=total_tokens,
total_duration_seconds=total_duration,
)
error_msg = f"Department {role} sub-pipeline failed"
models.update_task(conn, task_id, status="blocked", blocked_reason=error_msg)
return {
"success": False,
"error": error_msg,
"steps_completed": i,
"results": results,
"total_cost_usd": total_cost,
"total_tokens": total_tokens,
"total_duration_seconds": total_duration,
"pipeline_id": pipeline["id"] if pipeline else None,
}
# Track last worker role from sub-pipeline for auto_complete eligibility
if dept_result.get("last_sub_role"):
_last_sub_role = dept_result["last_sub_role"]
# Override previous_output with dept handoff summary (not raw dept head JSON)
previous_output = dept_result.get("output")
if isinstance(previous_output, (dict, list)):
previous_output = json.dumps(previous_output, ensure_ascii=False)
continue
# Project-level auto-test: run `make test` after backend_dev/frontend_dev steps.
# Enabled per project via auto_test_enabled flag (opt-in).
# On failure, loop fixer up to KIN_AUTO_TEST_MAX_ATTEMPTS times, then block.
@@ -1433,7 +1699,9 @@
changed_files = _get_changed_files(str(p_path))
last_role = steps[-1].get("role", "") if steps else ""
- auto_eligible = last_role in {"tester", "reviewer"}
+ # For dept pipelines: if last step is a _head, check the last worker in its sub-pipeline
+ effective_last_role = _last_sub_role if (_is_department_head(last_role) and _last_sub_role) else last_role
+ auto_eligible = effective_last_role in {"tester", "reviewer"}
# Guard: re-fetch current status — user may have manually changed it while pipeline ran
current_task = models.get_task(conn, task_id)


@@ -151,6 +151,126 @@ specialists:
output_schema:
tasks: "array of { title, brief, priority, category, acceptance_criteria }"
# Department heads — Opus-level coordinators that plan work within their department
# and spawn internal sub-pipelines of Sonnet workers.
backend_head:
name: "Backend Department Head"
model: opus
execution_type: department_head
department: backend
tools: [Read, Grep, Glob]
description: "Plans backend work, coordinates architect/backend_dev/tester within backend department"
permissions: read_only
context_rules:
decisions: all
modules: all
frontend_head:
name: "Frontend Department Head"
model: opus
execution_type: department_head
department: frontend
tools: [Read, Grep, Glob]
description: "Plans frontend work, coordinates frontend_dev/tester within frontend department"
permissions: read_only
context_rules:
decisions: all
modules: all
qa_head:
name: "QA Department Head"
model: opus
execution_type: department_head
department: qa
tools: [Read, Grep, Glob]
description: "Plans QA work, coordinates tester/reviewer within QA department"
permissions: read_only
context_rules:
decisions: all
security_head:
name: "Security Department Head"
model: opus
execution_type: department_head
department: security
tools: [Read, Grep, Glob]
description: "Plans security work, coordinates security engineer within security department"
permissions: read_only
context_rules:
decisions_category: security
infra_head:
name: "Infrastructure Department Head"
model: opus
execution_type: department_head
department: infra
tools: [Read, Grep, Glob]
description: "Plans infrastructure work, coordinates sysadmin/debugger within infra department"
permissions: read_only
context_rules:
decisions: all
research_head:
name: "Research Department Head"
model: opus
execution_type: department_head
department: research
tools: [Read, Grep, Glob]
description: "Plans research work, coordinates tech_researcher/architect within research department"
permissions: read_only
context_rules:
decisions: all
marketing_head:
name: "Marketing Department Head"
model: opus
execution_type: department_head
department: marketing
tools: [Read, Grep, Glob]
description: "Plans marketing work, coordinates tech_researcher/spec within marketing department"
permissions: read_only
context_rules:
decisions: all
modules: all
# Departments — PM uses these when routing complex cross-domain tasks to department heads
departments:
backend:
head: backend_head
workers: [architect, backend_dev, tester, reviewer]
description: "Backend development: API, database, business logic"
frontend:
head: frontend_head
workers: [frontend_dev, tester, reviewer]
description: "Frontend development: Vue, CSS, components, composables"
qa:
head: qa_head
workers: [tester, reviewer]
description: "Quality assurance: testing and code review"
security:
head: security_head
workers: [security, reviewer]
description: "Security: OWASP audit, vulnerability analysis, remediation"
infra:
head: infra_head
workers: [sysadmin, debugger, reviewer]
description: "Infrastructure: DevOps, deployment, server management"
research:
head: research_head
workers: [tech_researcher, architect]
description: "Technical research and architecture planning"
marketing:
head: marketing_head
workers: [tech_researcher, spec]
description: "Marketing: market research, positioning, content strategy, SEO"
# Route templates — PM uses these to build pipelines
routes:
debug:
@@ -188,3 +308,27 @@ routes:
spec_driven:
steps: [constitution, spec, architect, task_decomposer]
description: "Constitution → spec → implementation plan → decompose into tasks"
dept_feature:
steps: [backend_head, frontend_head, qa_head]
description: "Full-stack feature: backend dept → frontend dept → QA dept"
dept_fullstack:
steps: [backend_head, frontend_head]
description: "Full-stack feature without dedicated QA pass"
dept_security_audit:
steps: [security_head, qa_head]
description: "Security audit followed by QA verification"
dept_backend:
steps: [backend_head]
description: "Backend-only task routed through department head"
dept_frontend:
steps: [frontend_head]
description: "Frontend-only task routed through department head"
dept_marketing:
steps: [marketing_head]
description: "Marketing task routed through department head"
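The `departments` mapping above pairs each head with its worker list, and the executor's recursion guard assumes no worker is itself a head. A small consistency check one could run after loading the YAML (a sketch over a dict of the same shape; not the project's actual validator):

```python
def validate_departments(departments: dict) -> list[str]:
    """Check that every head follows *_head and that no worker is itself a head."""
    errors = []
    for name, dept in departments.items():
        head = dept.get("head", "")
        if not head.endswith("_head"):
            errors.append(f"{name}: head {head!r} does not follow *_head convention")
        for worker in dept.get("workers", []):
            # Mirrors the executor's recursion guard on sub-pipelines.
            if worker.endswith("_head"):
                errors.append(f"{name}: worker {worker!r} is a department head")
    return errors
```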