Generic Harness Integration Guide for Babysitter SDK

A step-by-step implementation guide for integrating the babysitter SDK orchestration loop into any AI coding harness. This document is harness-agnostic and uses pseudocode throughout. For the canonical reference implementation, see Claude Code Integration.

Prerequisites
Core Integration Points
Harness Capability Matrix
Session State Contract
Hook Equivalence Table
CLI Output Schemas
CLI Error Handling
Edge Cases
Testing the Integration
Reference Implementation

1. Prerequisites

Your harness must provide (or be able to emulate) the following capabilities before you begin integration. Each item is marked as REQUIRED or RECOMMENDED.

Checklist

REQUIRED: Shell or script execution -- The harness must be able to execute shell commands (bash, sh, cmd) or invoke Node.js scripts. The babysitter CLI is a Node.js binary invoked via shell. Every integration point depends on running babysitter <command> and reading its JSON output from stdout.
REQUIRED: Exit/stop interception -- The harness must provide a mechanism to intercept the AI agent's attempt to end its turn or exit the conversation. This is the single most critical requirement. Without it, the orchestration loop cannot function. Examples:
- A "stop hook" that fires before the agent's response is finalized
- A middleware layer that can reject an exit signal and re-inject context
- A wrapper around the agent loop that checks a condition before allowing termination
REQUIRED: Context re-injection -- After blocking an exit, the harness must be able to inject new text (a system message, user message, or tool result) into the agent's context so it continues working. The injected content comes from the babysitter CLI output.
REQUIRED: Session/conversation identity -- The harness must provide a stable identifier for the current session or conversation. This ID is used to:
- Name the session state file
- Associate the session with a babysitter run
- Track iteration count across stop-hook cycles
RECOMMENDED: Lifecycle hooks -- Pre-session, post-session, pre-turn, post-turn hooks simplify integration. If unavailable, equivalent behavior can be built by wrapping the agent's main loop.
RECOMMENDED: Transcript access -- Access to the agent's recent output text enables completion proof verification (scanning for <promise> tags). If unavailable, an alternative proof mechanism must be implemented (see Section 2d).
RECOMMENDED: Persistent environment -- Environment variables or a key-value store that persists across hook invocations within the same session. Used to carry the session ID and plugin root path.

Minimum Environment

Node.js >= 18
npm >= 8
File system access (read/write to working directory)

2. Core Integration Points

Implement these in order. Each section includes a checklist, pseudocode, and the specific CLI commands involved.

+-----------------------------------------------------------------------+
|                         YOUR HARNESS                                  |
|                                                                       |
|  +------------------+  +-----------------------+  +----------------+  |
|  | Session Lifecycle |  | Exit/Stop Interceptor |  | Agent Loop     |  |
|  | (start/end)      |  | (block or approve)    |  | (LLM turns)   |  |
|  +--------+---------+  +-----------+-----------+  +-------+--------+  |
|           |                        |                      |           |
+-----------------------------------------------------------------------+
            |                        |                      |
            v                        v                      v
   +-------------------+   +-------------------+   +------------------+
   | babysitter CLI    |   | Session State     |   | Run Directory    |
   | (npm package)     |   | {stateDir}/       |   | .a5c/runs/{id}/  |
   |                   |   |   {sessionId}.md  |   |   journal/       |
   | session:init      |   +-------------------+   |   tasks/         |
   | run:create        |                           |   state/         |
   | session:associate |                           +------------------+
   | run:iterate       |
   | task:list         |
   | task:post         |
   | session:check-    |
   |   iteration       |
   | session:iteration-|
   |   message         |
   | run:status        |
   +-------------------+

2a. SDK Installation

Goal: Ensure the babysitter CLI binary is available on PATH.

Checklist

Determine the SDK version to install (from plugin manifest or pinned version)
Attempt global install; fall back to local prefix install; fall back to npx
Gate installation behind a marker file to avoid repeated install attempts
Verify the CLI is callable: babysitter version --json

Pseudocode

function ensureBabysitterCLI(sdkVersion):
    markerFile = "{pluginRoot}/.babysitter-install-attempted"

    if commandExists("babysitter"):
        return "babysitter"

    if fileExists(markerFile):
        // Already tried installing; fall through to npx
    else:
        // Attempt global install
        result = shell("npm install -g @a5c-ai/babysitter-sdk@{sdkVersion}")
        if result.exitCode != 0:
            // Fallback: install with local prefix
            result = shell("npm install -g @a5c-ai/babysitter-sdk@{sdkVersion} --prefix $HOME/.local")
        writeFile(markerFile, "attempted")

    if commandExists("babysitter"):
        return "babysitter"

    // Final fallback: use npx on every invocation
    return "npx -y @a5c-ai/babysitter-sdk@{sdkVersion} babysitter"

CLI Command

# Verify installation
babysitter version --json
# Expected: { "version": "x.y.z", "sdkVersion": "..." }

2b. Session Initialization

Goal: Create a baseline session state file so the orchestration loop can track iterations from the very start of the session, even before any run is created.

When to Call

At session/conversation start -- before the user has issued any commands. This is typically wired into a "session start" lifecycle hook or called at the top of the agent's main loop.

Checklist

Obtain or generate a unique session ID
Determine the state directory (typically {pluginRoot}/skills/babysit/state/)
Call babysitter session:init
Persist the session ID in the harness environment for later hook invocations

Pseudocode

function onSessionStart(sessionId, pluginRoot):
    stateDir = "{pluginRoot}/skills/babysit/state"
    ensureDirectoryExists(stateDir)

    result = shell(
        "babysitter session:init" +
        " --session-id {sessionId}" +
        " --state-dir {stateDir}" +
        " --json"
    )

    if result.exitCode != 0:
        log("WARNING: session init failed, orchestration may not work")
        return

    // Persist session ID for use by the stop interceptor
    setEnv("AGENT_SESSION_ID", sessionId)
    setEnv("BABYSITTER_PLUGIN_ROOT", pluginRoot)

What This Creates

A session state file at {stateDir}/{sessionId}.md in BASELINE state (empty run_id, iteration: 1). See Section 4: Session State Contract for the full file format, field definitions, and state transition diagram.

2c. Run Creation and Session Binding

Goal: Create a babysitter run and bind it to the current session so the stop interceptor knows which run to check.

When to Call

After the user requests a task that should be orchestrated. Typically triggered by a skill or command within the harness (e.g., the user says "babysit this task").

Checklist

Prepare the process definition (entry point, process ID, inputs)
Call babysitter run:create with harness and session parameters
Call babysitter session:associate to bind the run to the session
Verify the session state file now has a non-empty run_id

Pseudocode

function createAndBindRun(processId, entryPoint, inputs, prompt, sessionId, pluginRoot):
    // Step 1: Create the run
    createResult = shell(
        "babysitter run:create" +
        " --process-id {processId}" +
        " --entry {entryPoint}" +
        " --inputs {inputsFilePath}" +
        " --prompt \"{prompt}\"" +
        " --json"
    )
    runId = parseJson(createResult.stdout).runId
    runDir = ".a5c/runs/{runId}"

    // Step 2: Bind session to run
    shell(
        "babysitter session:associate" +
        " --session-id {sessionId}" +
        " --run-id {runId}" +
        " --state-dir {pluginRoot}/skills/babysit/state" +
        " --json"
    )

    return { runId, runDir }

Re-entrant Run Prevention

If the session is already bound to a different run, session:associate will fail. The harness must either:

Complete or clean up the existing run first
Remove the old session state file manually
Present an error to the user

2d. The Orchestration Loop Driver

Goal: Convert the agent's single-turn execution into a multi-iteration orchestration loop by intercepting exit signals, checking run status, and re-injecting context.

This is the most critical and complex integration point.

Architecture

Agent executes turn
     |
     v
Agent signals "done" (stop/exit)
     |
     v
+--[EXIT INTERCEPTOR]----------------------------------------------+
|  1. Read session state file                                      |
|  2. Check guards (max iterations, runaway detect, no run bound)  |
|  3. Load run status via run:status                               |
|  4. Check completion proof                                       |
|  5. Decision: APPROVE or BLOCK                                   |
+--------+-------------------+-------------------------------------+
         |                   |
    [APPROVE]           [BLOCK]
         |                   |
         v                   v
    Session ends     Re-inject context ------> Agent continues
                     (iteration message)       (back to top)

The Decision Algorithm

function onAgentStop(sessionId, pluginRoot, runsDir, lastAgentOutput):
    stateDir = "{pluginRoot}/skills/babysit/state"
    stateFile = "{stateDir}/{sessionId}.md"

    // --- Guard 1: No state file means no active loop ---
    if not fileExists(stateFile):
        return APPROVE

    state = parseSessionState(stateFile)

    // --- Guard 2: Max iterations ---
    if state.iteration >= state.maxIterations:
        cleanupSessionFile(stateFile)
        return APPROVE

    // --- Guard 3: Runaway loop detection ---
    if state.iteration >= 5:
        avgTime = average(state.iterationTimes)  // last 3 durations
        if avgTime <= 15:  // seconds
            cleanupSessionFile(stateFile)
            return APPROVE

    // --- Guard 4: No run bound ---
    if state.runId == "":
        cleanupSessionFile(stateFile)
        return APPROVE

    // --- Check run status ---
    statusResult = shell(
        "babysitter run:status .a5c/runs/{state.runId} --json"
    )
    runStatus = parseJson(statusResult.stdout)

    // --- Guard 5: Unknown or unreadable run ---
    if statusResult.exitCode != 0:
        cleanupSessionFile(stateFile)
        return APPROVE

    // --- Guard 6: Completion proof ---
    if runStatus.state == "completed":
        proof = runStatus.completionProof
        promiseTag = extractPromiseTag(lastAgentOutput)
        if promiseTag == proof:
            cleanupSessionFile(stateFile)
            return APPROVE

    // --- BLOCK: Continue the loop ---
    // Advance session state from BOUND/ACTIVE to next iteration.
    // See Section 4 (Session State Contract) for field update rules
    // and the atomic write protocol.
    newIteration = state.iteration + 1
    updateSessionState(stateFile, {
        iteration: newIteration,
        lastIterationAt: now()
    })

    // Build the context message to re-inject
    // NOTE: session:iteration-message uses --iteration, --run-id,
    //       --runs-dir, and --plugin-root (NOT --session-id or --state-dir)
    iterationMessage = shell(
        "babysitter session:iteration-message" +
        " --iteration {newIteration}" +
        " --run-id {state.runId}" +
        " --runs-dir {runsDir}" +
        " --plugin-root {pluginRoot}" +
        " --json"
    )

    return BLOCK {
        reason: parseJson(iterationMessage.stdout).systemMessage,
        systemMessage: "Babysitter iteration {newIteration}/{state.maxIterations}"
    }

Intercepting Exit Signals

The mechanism depends entirely on your harness. Common patterns:

Harness Type	Interception Mechanism
Hook-based (Claude Code, etc.)	Register a `Stop` hook that receives agent output and returns block/approve
Middleware-based	Wrap the agent loop's exit check in a middleware that calls the decision algorithm
Event-based	Listen for "agent_turn_complete" events, cancel and re-queue if BLOCK
Loop-based	Replace the `while (running)` loop condition with the decision algorithm
API-based	Between API calls, run the check and decide whether to make another call

Re-injecting Context

After blocking, the harness must feed the orchestration context back to the agent. The mechanism depends on your harness:

Harness Type	Re-injection Mechanism
System message injection	Append the `reason` as a system message before the next turn
User message simulation	Insert a synthetic user message containing the iteration context
Tool result injection	Return the context as a tool call result
Context window prepend	Prepend the context to the agent's next input

The content to inject comes from the systemMessage field of the session:iteration-message output. It typically contains:

Iteration number and status
What to do next (run:iterate, execute effects, extract proof, etc.)
Pending effect kinds if the run is in "waiting" state

Detecting the Completion Proof

The completion proof is a SHA-256 hash that the agent must output inside <promise>...</promise> tags. The harness must:

Scan the agent's last output for <promise>VALUE</promise>
Compare VALUE against the completionProof from run:status --json
If they match, allow exit

function extractPromiseTag(text):
    match = regex_search(text, "<promise>([\\s\\S]*?)</promise>")
    if match is null:
        return null
    return trim(match.group(1)).replace(/\\s+/, " ")

If the harness cannot access the agent's output text (no transcript), alternative approaches:

Have the agent call a special "complete" tool that the harness intercepts
Use a dedicated CLI command that the agent calls to signal completion
Implement a "completion callback" webhook

2e. Effect Execution

Goal: Execute the pending tasks that the babysitter run has requested, then post their results.

The Effect Execution Cycle

babysitter run:iterate .a5c/runs/{runId} --json
        |
        v
  Returns: { status, pendingActions[], ... }
        |
        v
babysitter task:list .a5c/runs/{runId} --pending --json
        |
        v
  Returns: { tasks: [{ effectId, taskId, kind, status, label, ... }] }
        |
        v
  For each pending task:
        |
        +--[kind = "node"]----------> Execute Node.js script
        |
        +--[kind = "breakpoint"]----> Present to user for approval
        |
        +--[kind = "sleep"]---------> Wait until specified time
        |
        +--[kind = "orchestrator_  -> Delegate to a sub-agent or
        |    task"]                   orchestrator within your harness
        |
        +--[kind = "agent"]---------> Delegate to an agent subprocess
        |
        +--[custom kind]------------> Handle per your harness capabilities
        |
        v
  Post result via task:post (Section 2f)

Effect Result Type

All effect handlers must return a result conforming to this structure (or the sentinel DEFERRED for effects that will be resolved later):

EffectResult = {
    status: "ok" | "error",
    value: object          // Payload specific to the effect kind
}

// For node tasks:
//   { status: "ok", value: <return value of the Node.js function> }
//   { status: "error", value: { message: string, stack?: string } }

// For breakpoints:
//   { status: "ok", value: { approved: boolean, approvedBy?: string, reason?: string } }

// For sleep:
//   { status: "ok", value: { wokeAt: string (ISO 8601), reason: string } }

// For orchestrator_task:
//   { status: "ok", value: { output: any, completedAt: string } }
//   { status: "error", value: { message: string, phase?: string } }

// For agent:
//   { status: "ok", value: { response: string, tokensUsed?: number } }
//   { status: "error", value: { message: string, exitCode?: number } }

Pseudocode

function executeEffects(runId):
    runDir = ".a5c/runs/{runId}"

    // Step 1: Iterate to discover pending effects
    iterResult = shell("babysitter run:iterate {runDir} --json")
    if iterResult.exitCode != 0:
        handleCLIError("run:iterate", iterResult)
        return

    iterData = parseJson(iterResult.stdout)

    if iterData.status == "completed":
        // Run is done -- extract proof and output it
        proof = iterData.completionProof
        agentOutput("<promise>{proof}</promise>")
        return

    if iterData.status == "failed":
        // Inspect error, attempt recovery
        return

    // Step 2: List pending tasks
    listResult = shell("babysitter task:list {runDir} --pending --json")
    if listResult.exitCode != 0:
        handleCLIError("task:list", listResult)
        return

    tasks = parseJson(listResult.stdout).tasks

    // Step 3: Execute each task
    for task in tasks:
        taskDir = "{runDir}/tasks/{task.effectId}"
        taskDef = readJson("{taskDir}/task.json")

        switch task.kind:
            case "node":
                result = executeNodeTask(taskDef)
            case "breakpoint":
                result = handleBreakpoint(taskDef)
            case "sleep":
                result = handleSleep(taskDef)
            case "orchestrator_task":
                result = handleOrchestratorTask(taskDef)
            case "agent":
                result = handleAgentTask(taskDef)
            default:
                result = handleCustomKind(task.kind, taskDef)

        // Step 4: Post result (skip deferred effects like long sleeps)
        if result != DEFERRED:
            postResult(runId, task.effectId, result)

Breakpoint Effect Handler

Breakpoints are human approval gates. The process pauses until a human explicitly approves or rejects the breakpoint. Never auto-approve breakpoints -- they exist specifically to require human judgment.

function handleBreakpoint(taskDef):
    // taskDef.args schema:
    //   {
    //     message?: string,         // Human-readable description of what needs approval
    //     description?: string,     // Alternative to message (checked as fallback)
    //     context?: {
    //       changedFiles?: string[],  // Files modified since last breakpoint
    //       summary?: string,         // Summary of work done so far
    //       risks?: string[],         // Identified risks requiring human review
    //       [key: string]: unknown    // Additional context from the process
    //     },
    //     requireExplicitApproval?: boolean,  // If true, never auto-approve (default: true)
    //     blocking?: boolean          // If true, the run cannot proceed without resolution (default: true)
    //   }

    message = taskDef.args.message or taskDef.args.description or "Approval required"
    context = taskDef.args.context or {}
    requireExplicit = taskDef.args.requireExplicitApproval != false  // default true

    // Present to user via your harness's interactive prompt mechanism
    if harnessSupportsInteractivePrompt():
        // Build a rich prompt with context if available
        promptBody = message
        if context.summary:
            promptBody += "\n\nSummary: " + context.summary
        if context.risks and length(context.risks) > 0:
            promptBody += "\n\nRisks:\n" + join(context.risks, "\n- ")
        if context.changedFiles and length(context.changedFiles) > 0:
            promptBody += "\n\nChanged files:\n" + join(context.changedFiles, "\n- ")

        userDecision = promptUser(
            title: "Babysitter Breakpoint",
            message: promptBody,
            options: ["approve", "reject"]
        )

        if userDecision == "approve":
            return { status: "ok", value: { approved: true, approvedBy: "user" } }
        else:
            return { status: "ok", value: { approved: false, reason: "User rejected" } }

    // Non-interactive fallback: reject with explanation
    // The agent will see this and can inform the user
    return {
        status: "ok",
        value: {
            approved: false,
            reason: "Non-interactive environment; breakpoint requires manual approval"
        }
    }

Sleep Effect Handler

Sleep effects pause execution until a specified time. The harness must decide whether to block (wait inline) or defer (post result later).

function handleSleep(taskDef):
    // taskDef.args schema:
    //   {
    //     until?: string,          // ISO 8601 timestamp to sleep until
    //     sleepUntil?: string,     // Alias for 'until'
    //     durationMs?: number,     // Duration in milliseconds (alternative to until)
    //     reason?: string          // Human-readable reason for the sleep
    //   }
    //
    // Exactly one of (until | sleepUntil) or durationMs should be provided.
    // If both are present, the absolute timestamp (until/sleepUntil) takes precedence.

    sleepUntil = taskDef.args.until or taskDef.args.sleepUntil
    durationMs = taskDef.args.durationMs

    if sleepUntil:
        targetTime = parseISO8601(sleepUntil)
    else if durationMs:
        targetTime = now() + durationMs
    else:
        // No target time specified; resolve immediately
        return { status: "ok", value: { wokeAt: now(), reason: "no_target_time" } }

    remainingMs = targetTime - now()

    if remainingMs <= 0:
        // Sleep time already passed
        return { status: "ok", value: { wokeAt: now(), reason: "already_elapsed" } }

    if remainingMs <= 60000:  // 1 minute or less
        // Short sleep: block inline
        sleep(remainingMs)
        return { status: "ok", value: { wokeAt: now(), reason: "waited" } }

    // Long sleep: post a deferred result
    // Option A: Schedule a timer/cron to post the result later
    scheduleDelayedPost(runId, effectId, targetTime)
    // Do NOT post result now -- let the orchestration loop handle it
    // on the next iteration after the timer fires
    return DEFERRED  // signal to caller: do not post result yet

Orchestrator Task Effect Handler

Orchestrator tasks delegate a sub-process to an orchestrator or sub-agent within your harness. The task definition contains a prompt, optional inputs, and configuration for the sub-process.

function handleOrchestratorTask(taskDef):
    // taskDef.args schema:
    //   {
    //     prompt: string,           // The instruction for the sub-agent
    //     processId?: string,       // Optional sub-process ID to invoke
    //     inputs?: object,          // Inputs to pass to the sub-process
    //     constraints?: {
    //       maxIterations?: number, // Iteration limit for the sub-process
    //       timeout?: number        // Timeout in ms for the sub-process
    //     }
    //   }

    prompt = taskDef.args.prompt
    inputs = taskDef.args.inputs or {}
    constraints = taskDef.args.constraints or {}
    timeout = constraints.timeout or 900000  // default 15 min

    if harnessSupportsSubAgentDelegation():
        // Delegate to a sub-agent or internal orchestrator
        subResult = delegateToSubAgent({
            prompt: prompt,
            inputs: inputs,
            maxIterations: constraints.maxIterations or 50,
            timeout: timeout
        })

        if subResult.success:
            return {
                status: "ok",
                value: {
                    output: subResult.output,
                    completedAt: now()
                }
            }
        else:
            return {
                status: "error",
                value: {
                    message: subResult.error,
                    phase: subResult.failedPhase or "execution"
                }
            }

    // Fallback: execute as a simple prompt-response if no sub-agent support
    // This is a degraded mode -- the harness loses multi-step orchestration
    response = executePromptSingleTurn(prompt, inputs)
    return {
        status: "ok",
        value: {
            output: response,
            completedAt: now()
        }
    }

Agent Effect Handler

Agent effects delegate work to a standalone agent subprocess. Unlike orchestrator_task, the agent effect expects a self-contained agent invocation (typically a CLI tool or API call) that runs to completion.

function handleAgentTask(taskDef):
    // taskDef.args schema:
    //   {
    //     command: string,          // Agent command or prompt
    //     workingDir?: string,      // Working directory for the agent
    //     env?: Record<string, string>,  // Environment variables
    //     timeout?: number,         // Timeout in ms (default: 900000)
    //     captureOutput?: boolean   // Whether to capture stdout/stderr (default: true)
    //   }

    command = taskDef.args.command
    workingDir = taskDef.args.workingDir or getCwd()
    env = taskDef.args.env or {}
    timeout = taskDef.args.timeout or 900000  // default 15 min

    if harnessSupportsAgentSubprocess():
        // Spawn agent subprocess
        agentResult = spawnAgent({
            command: command,
            workingDir: workingDir,
            env: env,
            timeout: timeout
        })

        if agentResult.exitCode == 0:
            return {
                status: "ok",
                value: {
                    response: agentResult.stdout,
                    tokensUsed: agentResult.tokensUsed or null
                }
            }
        else:
            return {
                status: "error",
                value: {
                    message: agentResult.stderr or "Agent exited with code {agentResult.exitCode}",
                    exitCode: agentResult.exitCode
                }
            }

    // Fallback: treat as a shell command
    shellResult = shell(command, { cwd: workingDir, env: env, timeout: timeout })
    if shellResult.exitCode == 0:
        return { status: "ok", value: { response: shellResult.stdout } }
    else:
        return {
            status: "error",
            value: { message: shellResult.stderr, exitCode: shellResult.exitCode }
        }

Reading Task Definitions

Each pending task has a task.json in its effect directory:

.a5c/runs/{runId}/tasks/{effectId}/task.json

The task definition contains the task kind, arguments, labels, and other metadata needed to execute it. Read it with:

babysitter task:show .a5c/runs/{runId} {effectId} --json

2f. Result Posting

Goal: Record effect execution results back into the run journal.

IMPORTANT

Always post results through the CLI. Never write result.json directly. The CLI command handles:

Writing result.json with the correct schema version
Appending an EFFECT_RESOLVED event to the journal
Updating the state cache

Pseudocode

function postResult(runId, effectId, result):
    runDir = ".a5c/runs/{runId}"
    taskDir = "{runDir}/tasks/{effectId}"

    // Write the result value to a temporary file
    valueFile = "{taskDir}/output.json"
    writeJson(valueFile, result.value)

    // Post through the CLI
    shell(
        "babysitter task:post {runDir} {effectId}" +
        " --status {result.status}" +   // "ok" or "error"
        " --value {valueFile}" +
        " --json"
    )

CLI Command

# Success case
babysitter task:post .a5c/runs/{runId} {effectId} \
  --status ok \
  --value tasks/{effectId}/output.json \
  --json

# Error case
babysitter task:post .a5c/runs/{runId} {effectId} \
  --status error \
  --value tasks/{effectId}/error.json \
  --json

Result Status Values

Status	Meaning
`ok`	Task completed successfully; value contains the result
`error`	Task failed; value contains error details

2g. Iteration Guards

Goal: Prevent infinite loops and detect runaway behavior.

CLI Command

babysitter session:check-iteration \
  --session-id {sessionId} \
  --state-dir {stateDir} \
  --json

Output

See Section 6: CLI Output Schemas for the full schema. Summary:

shouldContinue: true -- safe to proceed; nextIteration indicates the next number
shouldContinue: false -- stop the loop; reason explains why (e.g., max_iterations_reached, iteration_too_fast, session_not_found)

Guard Logic

function checkIterationGuards(sessionId, stateDir):
    result = shell(
        "babysitter session:check-iteration" +
        " --session-id {sessionId}" +
        " --state-dir {stateDir}" +
        " --json"
    )
    data = parseJson(result.stdout)

    if not data.found:
        return { shouldContinue: false, reason: "no_session" }

    if not data.shouldContinue:
        return { shouldContinue: false, reason: data.reason }

    return { shouldContinue: true, nextIteration: data.nextIteration }

Two Detection Mechanisms

1. Max Iterations Guard

IF iteration >= maxIterations (default 256):
    STOP -- allow exit, clean up state file

2. Runaway Speed Guard

IF iteration >= 5:
    avgDuration = average(last 3 iteration durations)
    IF avgDuration <= 15 seconds:
        STOP -- iterations are too fast, likely a runaway loop

The iteration duration is measured as the wall-clock time between consecutive stop-hook invocations. Durations below 15 seconds on average (after at least 5 iterations) indicate the agent is not doing meaningful work.

Threshold justifications:

Why iteration >= 5: The first few iterations are often fast because the agent is reading instructions, creating the run, and performing lightweight setup. A minimum of 5 iterations avoids false positives during this bootstrap phase while still catching runaways before significant resource waste. Empirical testing across Claude Code sessions showed that legitimate fast iterations (setup, binding, first iterate) are consistently complete within 3-4 cycles.
Why average <= 15 seconds: A meaningful agent iteration -- one that reads files, calls an LLM, writes code, or executes tests -- typically takes 30-120 seconds. The 15-second threshold provides a 2x safety margin below the minimum expected productive iteration time. Iterations under 15 seconds typically indicate the agent is stuck in a loop where it reads the iteration message, does no substantive work, and immediately signals completion. The 3-iteration rolling average (rather than a single iteration) smooths out one-off fast iterations caused by cached replay or quick task:post calls.
Tuning: Both thresholds can be adjusted for specific harness environments. If your agent performs very lightweight iterations (e.g., posting pre-computed results), lower the speed threshold. If your setup phase is longer, raise the minimum iteration count. The session:check-iteration CLI command applies these same thresholds internally.

3. Harness Capability Matrix

Required vs Optional Capabilities

Capability	Required	Purpose
Shell command execution	YES	All CLI interactions
Exit/stop interception	YES	Core loop driver
Context re-injection	YES	Continue agent after BLOCK
Session identity	YES	State file naming, run binding
File system read/write	YES	State files, task artifacts
Transcript access	NO *	Completion proof via `<promise>` tag
Lifecycle hooks	NO	Simplifies wiring; can be emulated
Persistent environment	NO	Convenience; can pass via files
Interactive user prompts	NO	Breakpoint handling (non-interactive mode is fallback)
Sub-agent delegation	NO	orchestrator_task / agent effects

* If transcript access is unavailable, an alternative completion signaling mechanism must be implemented.

Integration Tiers

Tier 1: Minimum Viable Integration

Supports basic orchestration with node tasks and completion detection.

Tier 2: Robust Integration

Adds safety guards and breakpoint support.

Everything in Tier 1
Runaway loop detection (iteration speed guard)
session:check-iteration calls
Interactive breakpoint handling
Sleep effect handling
Journal event recording for debugging

Tier 3: Full Integration

Complete feature parity with the Claude Code reference implementation.

Everything in Tier 2
Native lifecycle hooks (on-run-start, on-task-complete, etc.)
Hook discovery (per-repo, per-user, plugin directories)
Orchestrator task delegation
Agent task delegation
Quality scoring via on-score hooks
Skill discovery and injection
Non-interactive breakpoint auto-resolution

4. Session State Contract

File Format

Session state files use Markdown with YAML frontmatter. The frontmatter stores machine-readable state. The Markdown body stores the user's original prompt.

Path convention: {stateDir}/{sessionId}.md

Example

---
active: true
iteration: 3
max_iterations: 256
run_id: "my-run-abc123"
started_at: "2026-03-02T10:00:00Z"
last_iteration_at: "2026-03-02T10:05:30Z"
iteration_times: 45,62,58
---

Build a REST API with authentication and rate limiting for the user service.

Required Fields

Field	Type	Default	Description
`active`	boolean	`true`	Whether the orchestration loop is active
`iteration`	number	`1`	Current iteration (1-based)
`max_iterations`	number	`256`	Maximum iterations (0 = unlimited)
`run_id`	string	`""`	Bound run ID (empty before run:create)
`started_at`	string (ISO 8601)	now	Session start timestamp
`last_iteration_at`	string (ISO 8601)	now	Last iteration timestamp
`iteration_times`	string (CSV)	(empty)	Last 3 iteration durations in seconds

State Transitions

    CREATE                BIND               ITERATE (x N)        COMPLETE
  (session:init)    (session:associate)    (stop hook BLOCK)    (stop hook APPROVE)
       |                   |                     |                    |
       v                   v                     v                    v
  +-----------+     +-----------+          +-----------+       +------------+
  |  BASELINE |     |   BOUND   |          |  ACTIVE   |       |  CLEANED   |
  |           |---->|           |--------->|           |------>|   UP       |
  | runId=""  |     | runId=X   |          | iter=N+1  |       | file       |
  | iter=1    |     | iter=1    |          | times=[.] |       | deleted    |
  +-----------+     +-----------+          +-----------+       +------------+

Atomic Write Protocol

Session state files must be written atomically to prevent corruption from concurrent reads during stop-hook evaluation:

Write content to temp file: {filePath}.tmp.{pid}
Atomic rename: rename(tempFile, targetFile)
On error: delete temp file

Timing Calculation

function updateIterationTimes(existingTimes, lastIterationAt, currentTime):
    durationSeconds = (currentTime - lastIterationAt) / 1000
    if durationSeconds <= 0:
        return existingTimes
    newTimes = append(existingTimes, durationSeconds)
    return lastN(newTimes, 3)   // keep only last 3

5. Hook Equivalence Table

The babysitter SDK and harness integration involve two categories of hooks:

SDK Hooks (13 `KnownHookType` values)

These are dispatched by the SDK runtime during orchestration. They are defined in packages/sdk/src/hooks/types.ts and fired via callHook(hookType, payload).

SDK Hook	Tier	Description
`on-run-start`	3	Fires after `run:create` completes
`on-run-complete`	3	Fires when `run:iterate` returns status=completed
`on-run-fail`	3	Fires when `run:iterate` returns status=failed
`on-task-start`	3	Fires before executing each pending effect
`on-task-complete`	3	Fires after `task:post` completes
`on-step-dispatch`	3	Fires when `run:iterate` discovers a new effect
`on-iteration-start`	2	Fires before calling `run:iterate`
`on-iteration-end`	2	Fires after all effects for an iteration are posted
`on-breakpoint`	2	Fires when a breakpoint effect is pending; present to user for approval
`on-score`	3	Fires when a quality score is posted to the run
`pre-commit`	3	Fires before the agent creates a git commit
`pre-branch`	3	Fires before the agent creates a new git branch
`post-planning`	3	Fires after the planning phase produces output

Harness-Level Concepts (not SDK KnownHookType values)

These are integration points that your harness must implement. They are NOT SDK hook types -- they are harness-specific lifecycle events that drive the orchestration loop.

Harness Concept	Tier	Generic Equivalent
session-start	1	Session/conversation start callback. Call `session:init` to create the baseline state file. This maps to your harness's "on conversation begin" event.
stop (exit interceptor)	1	Exit/turn-end interceptor. Run the decision algorithm (Section 2d) to BLOCK or APPROVE the agent's exit attempt. This is the core loop driver.
session-end	1	Session cleanup. Delete the session state file when the conversation ends normally.

Hook Discovery Directories

If implementing Tier 3, hook scripts are searched in this priority order:

Per-repo:   {REPO_ROOT}/.a5c/hooks/{hookType}/*.sh     (highest)
Per-user:   ~/.config/babysitter/hooks/{hookType}/*.sh  (medium)
Plugin:     {PLUGIN_ROOT}/hooks/{hookType}/*.sh         (lowest)

Scripts within each directory are sorted alphabetically and executed sequentially.

6. CLI Output Schemas

This section documents the JSON output schemas for the most frequently used CLI commands. All examples assume --json is passed.

`run:status` Output

{
  "state": "waiting",
  "lastEvent": {
    "type": "EFFECT_REQUESTED",
    "recordedAt": "2026-03-02T10:05:00Z",
    "data": { "..." : "..." }
  },
  "pendingByKind": {
    "node": 2,
    "breakpoint": 1
  },
  "pendingEffectsSummary": {
    "totalPending": 3,
    "countsByKind": { "node": 2, "breakpoint": 1 },
    "autoRunnableCount": 2
  },
  "needsMoreIterations": true,
  "metadata": null,
  "completionProof": null
}

Field	Type	Description
`state`	`"created" \| "waiting" \| "completed" \| "failed"`	Derived run lifecycle state
`lastEvent`	object or null	The most recent journal event (serialized)
`pendingByKind`	`Record<string, number>`	Count of pending effects grouped by kind
`pendingEffectsSummary.totalPending`	number	Total pending effects
`pendingEffectsSummary.autoRunnableCount`	number	Effects that can be auto-executed (kind=node)
`needsMoreIterations`	boolean	True if state=waiting and autoRunnableCount > 0
`completionProof`	string or null	SHA-256 proof hash (only when state=completed)

`session:check-iteration` Output

The output always includes found, shouldContinue, iteration, maxIterations, runId, and prompt. When shouldContinue is false, reason and stopMessage explain why.

// shouldContinue: true
{ "found": true, "shouldContinue": true, "nextIteration": 4,
  "updatedIterationTimes": [45, 62, 58], "iteration": 3,
  "maxIterations": 256, "runId": "my-run-abc123", "prompt": "Build the API..." }

// shouldContinue: false -- possible reason values:
//   "iteration_too_fast"     (+ averageTime, threshold, stopMessage)
//   "max_iterations_reached" (+ stopMessage)
//   "session_not_found"      (found=false, all counters zero)

`reason` value	Trigger condition	Extra fields
`iteration_too_fast`	iteration >= 5 and avg(last 3) <= 15s	`averageTime`, `threshold`
`max_iterations_reached`	iteration >= maxIterations	--
`session_not_found`	State file does not exist	`found: false`

`task:list` Output

{
  "tasks": [
    {
      "effectId": "E001-abc",
      "taskId": "greet",
      "stepId": "S000001",
      "status": "pending",
      "kind": "node",
      "label": "Greet user",
      "labels": ["greeting"],
      "taskDefRef": "tasks/E001-abc/task.json",
      "inputsRef": null,
      "resultRef": null,
      "stdoutRef": null,
      "stderrRef": null,
      "requestedAt": "2026-03-02T10:01:00Z",
      "resolvedAt": null
    }
  ]
}

Field	Type	Description
`effectId`	string	Unique effect identifier
`taskId`	string	Task type identifier (from `defineTask`)
`stepId`	string	Sequential step ID (e.g., `S000001`)
`status`	`"pending" \| "resolved" \| "unknown"`	Current effect status
`kind`	string	Task kind: `node`, `breakpoint`, `sleep`, `orchestrator_task`, or custom
`label`	string or null	Human-readable label
`taskDefRef`	string or null	Relative path to task.json
`resultRef`	string or null	Relative path to result.json (null if pending)

`session:iteration-message` Output

Command signature:

babysitter session:iteration-message \
  --iteration <n> \
  [--run-id <id>] \
  [--runs-dir <dir>] \
  [--plugin-root <dir>] \
  --json

Note: This command does NOT accept --session-id or --state-dir. It operates on run data directly via --run-id and --runs-dir.

{
  "systemMessage": "Babysitter iteration 3 | Waiting on: node. Check if pending effects are resolved, then call run:iterate.",
  "runState": "waiting",
  "completionProof": null,
  "pendingKinds": "node",
  "skillContext": null,
  "iteration": 3
}

Field	Type	Description
`systemMessage`	string	The formatted message to re-inject into the agent's context
`runState`	`"created" \| "waiting" \| "completed" \| "failed"` or null	Derived run state
`completionProof`	string or null	Proof hash if run is completed
`pendingKinds`	string or null	Comma-separated list of pending effect kinds
`skillContext`	string or null	Discovered skill context (when `--plugin-root` is provided)
`iteration`	number	The iteration number passed in

7. CLI Error Handling

All CLI commands can fail. The harness must handle these failures gracefully rather than crashing or silently ignoring them. This section provides a unified error handling strategy.

Error Categories

Category	Symptom	Recovery Strategy
Timeout	CLI command exceeds expected duration	Kill the process, log the timeout, retry once with a longer timeout. If the retry also times out, APPROVE exit and log a diagnostic warning.
JSON parse error	stdout is empty or contains non-JSON text (e.g., stack traces, warnings)	Check stderr for error details. Strip any non-JSON prefix from stdout (some environments prepend warnings). If still unparseable, treat as a command failure.
Lock conflict	`run:iterate` or `task:post` fails because another process holds `run.lock`	Retry after 250ms, up to 40 retries (matching the SDK's internal retry behavior). If all retries fail, log the conflict and APPROVE exit.
Missing run directory	`run:status` or `run:iterate` returns non-zero with ENOENT-style error	The run was deleted or never created. Clean up the session state file and APPROVE exit.
Permission error	EACCES or EPERM on file operations	Check file ownership and permissions. This usually indicates a misconfigured `BABYSITTER_RUNS_DIR`.
Non-zero exit, valid JSON	CLI returns exit code != 0 but stdout contains valid JSON with an `error` field	Parse the JSON error object for structured diagnostics. The `error.code` field often contains a machine-readable error type.

Unified Error Handler

function handleCLIError(commandName, shellResult):
    // Step 1: Try to parse structured error from stdout
    if shellResult.stdout != "":
        try:
            parsed = parseJson(shellResult.stdout)
            if parsed.error:
                log("CLI error in {commandName}: {parsed.error.message} (code: {parsed.error.code})")
                return { category: "structured", error: parsed.error }
        catch parseError:
            // stdout is not valid JSON -- fall through
            pass

    // Step 2: Check for known error patterns in stderr
    stderr = shellResult.stderr or ""

    if contains(stderr, "ENOENT") or contains(stderr, "no such file"):
        return { category: "missing_path", message: stderr }

    if contains(stderr, "run.lock") or contains(stderr, "EBUSY"):
        return { category: "lock_conflict", message: stderr, retryable: true }

    if contains(stderr, "EACCES") or contains(stderr, "EPERM"):
        return { category: "permission", message: stderr }

    if shellResult.timedOut:
        return { category: "timeout", message: "Command {commandName} timed out after {shellResult.timeoutMs}ms" }

    // Step 3: Generic failure
    return {
        category: "unknown",
        exitCode: shellResult.exitCode,
        message: stderr or "Command {commandName} failed with exit code {shellResult.exitCode}"
    }

Recommended Timeouts by Command

Command	Recommended Timeout	Notes
`version --json`	5s	Should be near-instant
`session:init`	5s	File creation only
`session:associate`	5s	File update only
`run:create`	10s	Creates directory structure and journal
`run:iterate`	120s	May execute process function; uses `BABYSITTER_TIMEOUT` env var
`run:status`	10s	Reads journal and derives state
`task:list`	10s	Reads task directories
`task:post`	15s	Writes result + appends journal event
`session:check-iteration`	5s	Reads and parses state file
`session:iteration-message`	10s	Reads run state, discovers skills
`run:repair-journal`	30s	Scans and repairs journal files

8. Edge Cases

Note: For CLI command failures (timeouts, parse errors, lock conflicts), see Section 7: CLI Error Handling.

Stale Session State File

If the harness crashes or the agent is forcefully terminated, a session state file may be left behind. On the next session start, session:init will fail with SESSION_EXISTS. Handling:

function handleStaleSession(sessionId, stateDir):
    stateFile = "{stateDir}/{sessionId}.md"
    existing = parseSessionState(stateFile)

    // If the run is completed or failed, clean up and re-init
    if existing.runId != "":
        statusResult = shell("babysitter run:status .a5c/runs/{existing.runId} --json")
        if statusResult.exitCode == 0:
            runStatus = parseJson(statusResult.stdout)
            if runStatus.state in ["completed", "failed"]:
                deleteFile(stateFile)
                return shell("babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json")

    // Otherwise, offer to resume
    return { action: "resume_or_cleanup", existingRunId: existing.runId }

Run Directory Missing or Corrupted

If the run directory is deleted or journal files are corrupted, run:status and run:iterate will return non-zero exit codes. The stop interceptor should APPROVE exit in this case (Guard 5 in the decision algorithm).

Concurrent Sessions on Same Run

The SDK uses file-based run locking (run.lock with PID). If two sessions try to iterate the same run concurrently, one will fail to acquire the lock. The harness should retry after a short delay (250ms, up to 40 retries) or report the conflict.

Effect Posted but Journal Not Updated

If the harness crashes between writing result.json and the CLI appending the EFFECT_RESOLVED journal event, the run may appear stuck. Use run:repair-journal to detect and fix such inconsistencies:

babysitter run:repair-journal .a5c/runs/{runId} --json

Zero Pending Tasks After Iterate

If run:iterate returns status=waiting but task:list --pending returns zero tasks, this indicates all effects were resolved during the iterate call itself (e.g., via replay). Simply call run:iterate again on the next iteration.

9. Testing the Integration

Smoke Test Checklist

Run these tests in order. Each builds on the previous.

Test 1: CLI Availability

babysitter version --json

Exit code is 0
Output contains "version" field

Test 2: Session Initialization

babysitter session:init \
  --session-id test-session-001 \
  --state-dir /tmp/babysitter-test/state \
  --json

Exit code is 0
State file created at /tmp/babysitter-test/state/test-session-001.md
File contains YAML frontmatter with active: true, iteration: 1, run_id: ""

Test 3: Run Creation

# Create a minimal process file
cat > /tmp/babysitter-test/process.js << 'EOF'
exports.process = async function(inputs, ctx) {
  const result = await ctx.task('greet', { name: inputs.name });
  return { greeting: result };
};
EOF

# Create inputs file
echo '{"name": "World"}' > /tmp/babysitter-test/inputs.json

# Create the run
babysitter run:create \
  --process-id test-process \
  --entry /tmp/babysitter-test/process.js#process \
  --inputs /tmp/babysitter-test/inputs.json \
  --prompt "Test run" \
  --json

Exit code is 0
Output contains "runId" field
Directory .a5c/runs/{runId}/ exists with run.json and journal/

Test 4: Session Binding

babysitter session:associate \
  --session-id test-session-001 \
  --run-id {runId} \
  --state-dir /tmp/babysitter-test/state \
  --json

Exit code is 0
State file now has run_id: "{runId}"

Test 5: Run Iterate

babysitter run:iterate .a5c/runs/{runId} --json

Exit code is 0
Output contains "status" field

Test 6: Task List

babysitter task:list .a5c/runs/{runId} --pending --json

Exit code is 0
Output contains "tasks" array

Test 7: Iteration Guard

babysitter session:check-iteration \
  --session-id test-session-001 \
  --state-dir /tmp/babysitter-test/state \
  --json

Exit code is 0
Output contains "shouldContinue": true

Test 8: Iteration Message

babysitter session:iteration-message \
  --iteration 2 \
  --run-id {runId} \
  --runs-dir .a5c/runs \
  --json

Exit code is 0
Output contains "systemMessage" field
Output contains "iteration": 2

Common Failure Modes

Symptom	Likely Cause	Fix
`babysitter: command not found`	SDK not installed or not on PATH	Re-run installation (Section 2a)
Stop hook always APPROVEs	No session state file, or `run_id` is empty	Check session:init and session:associate ran
Infinite loop (never exits)	Completion proof not detected	Check transcript scanning for `<promise>` tags
Exits after 1 iteration	Stop interceptor not wired correctly	Verify BLOCK decision re-injects context
`Session already associated`	Re-entrant run on same session	Clean up old state file or complete old run
Iterations very fast, exits early	Runaway detection triggers (avg <= 15s)	Agent is not doing meaningful work per iteration; check effect execution
State file corrupt	Non-atomic write or concurrent access	Use atomic write protocol (temp + rename)
`task:post` fails	Writing result.json directly instead of via CLI	Always use `babysitter task:post` command
Run stuck in "waiting"	Effects executed but results not posted	Check task:post calls after each effect

End-to-End Integration Test

The following pseudocode validates the complete loop:

function testEndToEnd():
    sessionId = "e2e-test-" + randomId()
    pluginRoot = "/tmp/babysitter-e2e"
    stateDir = "{pluginRoot}/skills/babysit/state"
    runsDir = ".a5c/runs"

    // 1. Init
    ensureBabysitterCLI()
    shell("babysitter session:init --session-id {sessionId} --state-dir {stateDir} --json")

    // 2. Create and bind
    result = shell("babysitter run:create --process-id test --entry ./process.js#process --inputs inputs.json --json")
    runId = parseJson(result.stdout).runId
    shell("babysitter session:associate --session-id {sessionId} --run-id {runId} --state-dir {stateDir} --json")

    // 3. Iterate
    iterResult = shell("babysitter run:iterate .a5c/runs/{runId} --json")
    assert parseJson(iterResult.stdout).status in ["executed", "waiting", "completed"]

    // 4. Execute effects
    listResult = shell("babysitter task:list .a5c/runs/{runId} --pending --json")
    tasks = parseJson(listResult.stdout).tasks
    for task in tasks:
        // Execute task, write output
        shell("babysitter task:post .a5c/runs/{runId} {task.effectId} --status ok --value output.json --json")

    // 5. Check iteration guard
    guardResult = shell("babysitter session:check-iteration --session-id {sessionId} --state-dir {stateDir} --json")
    assert parseJson(guardResult.stdout).shouldContinue == true

    // 6. Get iteration message (correct params: --iteration, --run-id, --runs-dir)
    msgResult = shell("babysitter session:iteration-message --iteration 2 --run-id {runId} --runs-dir {runsDir} --json")
    assert parseJson(msgResult.stdout).systemMessage is not null

    // 7. Re-iterate until completed
    iterResult = shell("babysitter run:iterate .a5c/runs/{runId} --json")
    if parseJson(iterResult.stdout).status == "completed":
        proof = parseJson(iterResult.stdout).completionProof
        assert proof is not null and proof is not ""

    print("END-TO-END TEST PASSED")

10. Reference Implementation

The canonical reference implementation is the Claude Code harness adapter, documented at:

docs/assimilation/harness/claude-code-integration.md

Key files in the reference implementation:

File	Role
`packages/sdk/src/harness/types.ts`	`HarnessAdapter` interface definition
`packages/sdk/src/harness/claudeCode.ts`	Claude Code adapter (stop hook, session-start, binding)
`packages/sdk/src/harness/nullAdapter.ts`	No-op fallback adapter (useful as a starting template)
`packages/sdk/src/harness/registry.ts`	Adapter auto-detection and lookup
`packages/sdk/src/session/`	Session state parsing, writing, and types
`artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-stop.sh`	Generated Claude Code stop hook entry
`artifacts/generated-plugins/claude-code/hooks/babysitter-proxied-session-start.sh`	Generated Claude Code session-start hook entry

Writing a New Harness Adapter

To add first-class SDK support for your harness, implement the HarnessAdapter interface:

interface HarnessAdapter {
  readonly name: string;
  isActive(): boolean;
  resolveSessionId(parsed: { sessionId?: string }): string | undefined;
  resolveStateDir(args: { stateDir?: string; pluginRoot?: string }): string | undefined;
  resolvePluginRoot(args: { pluginRoot?: string }): string | undefined;
  bindSession(opts: SessionBindOptions): Promise<SessionBindResult>;
  handleStopHook(args: HookHandlerArgs): Promise<number>;
  handleSessionStartHook(args: HookHandlerArgs): Promise<number>;
  findHookDispatcherPath(startCwd: string): string | null;
}

Register your adapter in packages/sdk/src/harness/registry.ts and it will be auto-detected when its isActive() method returns true.

For harnesses that cannot modify the SDK source, the entire integration can be built externally by calling the babysitter CLI commands documented in this guide.

Full-Code Example: Minimal Node.js Harness

The following is a complete, runnable Node.js implementation (not pseudocode) of a minimal harness that drives a single babysitter run to completion. It covers session initialization, run creation, the orchestration loop, effect execution for node tasks, and completion proof extraction.

#!/usr/bin/env node
// minimal-harness.js -- A complete minimal babysitter harness implementation.
// Usage: node minimal-harness.js <process-file>#<export> <inputs.json> [prompt]

const { execSync } = require('child_process');
const { readFileSync, writeFileSync, mkdirSync, existsSync } = require('fs');
const { join } = require('path');
const crypto = require('crypto');

// --- Configuration ---
const RUNS_DIR = '.a5c/runs';
const STATE_DIR = join(process.cwd(), '.harness-state');
const MAX_ITERATIONS = 256;
const RUNAWAY_THRESHOLD_ITERATIONS = 5;
const RUNAWAY_THRESHOLD_SECONDS = 15;
const CLI_TIMEOUT_MS = 120_000;

// --- Helpers ---
function cli(command, timeoutMs = CLI_TIMEOUT_MS) {
  try {
    const stdout = execSync(`babysitter ${command}`, {
      encoding: 'utf8',
      timeout: timeoutMs,
      stdio: ['pipe', 'pipe', 'pipe'],
    });
    return { exitCode: 0, stdout, stderr: '' };
  } catch (err) {
    return {
      exitCode: err.status ?? 1,
      stdout: err.stdout ?? '',
      stderr: err.stderr ?? '',
      timedOut: err.killed === true,
    };
  }
}

function cliJson(command, timeoutMs) {
  const result = cli(`${command} --json`, timeoutMs);
  if (result.exitCode !== 0) {
    console.error(`CLI error (${command}): ${result.stderr}`);
    return null;
  }
  try {
    return JSON.parse(result.stdout);
  } catch {
    console.error(`JSON parse error for ${command}: ${result.stdout.slice(0, 200)}`);
    return null;
  }
}

// --- Main ---
async function main() {
  const [,, entryPoint, inputsFile, prompt = 'Run process'] = process.argv;
  if (!entryPoint || !inputsFile) {
    console.error('Usage: node minimal-harness.js <entry>#<export> <inputs.json> [prompt]');
    process.exit(1);
  }

  const sessionId = `harness-${crypto.randomUUID().slice(0, 8)}`;
  mkdirSync(STATE_DIR, { recursive: true });

  // Step 1: Verify CLI
  const version = cliJson('version');
  if (!version) { console.error('babysitter CLI not available'); process.exit(1); }
  console.log(`Using babysitter SDK v${version.sdkVersion || version.version}`);

  // Step 2: Session init
  const initResult = cliJson(
    `session:init --session-id ${sessionId} --state-dir ${STATE_DIR}`
  );
  if (!initResult) { console.error('Session init failed'); process.exit(1); }

  // Step 3: Create run
  const processId = entryPoint.split('#')[0].replace(/[^a-zA-Z0-9-_]/g, '-');
  const createResult = cliJson(
    `run:create --process-id ${processId} --entry ${entryPoint}` +
    ` --inputs ${inputsFile} --prompt "${prompt.replace(/"/g, '\\"')}"`
  );
  if (!createResult) { console.error('Run creation failed'); process.exit(1); }
  const { runId } = createResult;
  const runDir = join(RUNS_DIR, runId);
  console.log(`Created run: ${runId}`);

  // Step 4: Bind session
  cliJson(
    `session:associate --session-id ${sessionId} --run-id ${runId} --state-dir ${STATE_DIR}`
  );

  // Step 5: Orchestration loop
  const iterationTimes = [];
  let iteration = 0;

  while (iteration < MAX_ITERATIONS) {
    iteration++;
    const iterStart = Date.now();
    console.log(`\n--- Iteration ${iteration} ---`);

    // 5a: Iterate
    const iterData = cliJson(`run:iterate ${runDir}`, CLI_TIMEOUT_MS);
    if (!iterData) { console.error('run:iterate failed'); break; }

    if (iterData.status === 'completed') {
      console.log(`Run completed. Proof: ${iterData.completionProof}`);
      break;
    }
    if (iterData.status === 'failed') {
      console.error('Run failed:', JSON.stringify(iterData, null, 2));
      break;
    }

    // 5b: List and execute pending tasks
    const listData = cliJson(`task:list ${runDir} --pending`);
    if (!listData || !listData.tasks || listData.tasks.length === 0) {
      console.log('No pending tasks; re-iterating...');
      continue;
    }

    for (const task of listData.tasks) {
      const taskDir = join(runDir, 'tasks', task.effectId);
      const taskDefPath = join(taskDir, 'task.json');
      if (!existsSync(taskDefPath)) {
        console.error(`task.json missing for ${task.effectId}`);
        continue;
      }
      const taskDef = JSON.parse(readFileSync(taskDefPath, 'utf8'));
      let result;

      switch (task.kind) {
        case 'node': {
          // Execute the node task's script
          try {
            const mod = require(taskDef.args.scriptPath);
            const fn = taskDef.args.exportName ? mod[taskDef.args.exportName] : mod.default || mod;
            const output = await fn(taskDef.args.input);
            result = { status: 'ok', value: output };
          } catch (err) {
            result = { status: 'error', value: { message: err.message, stack: err.stack } };
          }
          break;
        }
        case 'breakpoint':
          // Minimal harness: auto-reject breakpoints (non-interactive)
          result = { status: 'ok', value: { approved: false, reason: 'Non-interactive harness' } };
          break;
        case 'sleep': {
          const until = taskDef.args.until || taskDef.args.sleepUntil;
          const durationMs = taskDef.args.durationMs;
          const target = until ? new Date(until).getTime() : (Date.now() + (durationMs || 0));
          const remaining = target - Date.now();
          if (remaining > 0 && remaining <= 60000) {
            await new Promise(r => setTimeout(r, remaining));
          }
          result = { status: 'ok', value: { wokeAt: new Date().toISOString(), reason: 'waited' } };
          break;
        }
        default:
          result = { status: 'error', value: { message: `Unsupported task kind: ${task.kind}` } };
      }

      // Post result
      const outputPath = join(taskDir, 'output.json');
      writeFileSync(outputPath, JSON.stringify(result.value));
      cli(`task:post ${runDir} ${task.effectId} --status ${result.status} --value ${outputPath} --json`);
    }

    // 5c: Runaway detection
    const iterDuration = (Date.now() - iterStart) / 1000;
    iterationTimes.push(iterDuration);
    if (iteration >= RUNAWAY_THRESHOLD_ITERATIONS) {
      const recent = iterationTimes.slice(-3);
      const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
      if (avg <= RUNAWAY_THRESHOLD_SECONDS) {
        console.error(`Runaway detected: avg ${avg.toFixed(1)}s <= ${RUNAWAY_THRESHOLD_SECONDS}s threshold`);
        break;
      }
    }
  }

  console.log(`\nHarness finished after ${iteration} iterations.`);
}

main().catch(err => { console.error(err); process.exit(1); });

Appendix: Complete CLI Command Reference

Command	Purpose	Section
`babysitter version --json`	Verify CLI installation	2a
`babysitter session:init --session-id ID --state-dir DIR --json`	Create baseline session state	2b
`babysitter run:create --process-id PID --entry FILE --inputs FILE --json`	Create a new run	2c
`babysitter session:associate --session-id ID --run-id RID --state-dir DIR --json`	Bind session to run	2c
`babysitter run:iterate RUNDIR --json`	Advance orchestration, discover effects	2d, 2e
`babysitter run:status RUNDIR --json`	Read run status and completion proof	2d
`babysitter session:iteration-message --iteration N [--run-id ID] [--runs-dir DIR] [--plugin-root DIR] --json`	Get context to re-inject after BLOCK	2d
`babysitter task:list RUNDIR --pending --json`	List pending effects	2e
`babysitter task:show RUNDIR EFFECTID --json`	Read task definition	2e
`babysitter task:post RUNDIR EFFECTID --status STATUS --value FILE --json`	Post effect result	2f
`babysitter session:check-iteration --session-id ID --state-dir DIR --json`	Check iteration guards	2g
`babysitter run:repair-journal RUNDIR --json`	Repair inconsistent journal	7
`babysitter hook:run --hook-type TYPE --harness NAME --json`	Dispatch a lifecycle hook	5

Table of Contents​

1. Prerequisites​

Checklist​

Minimum Environment​

2. Core Integration Points​

2a. SDK Installation​

Checklist​

Pseudocode​

CLI Command​

2b. Session Initialization​

When to Call​

Checklist​

Pseudocode​

What This Creates​

2c. Run Creation and Session Binding​

When to Call​

Checklist​

Pseudocode​

Re-entrant Run Prevention​

2d. The Orchestration Loop Driver​

Architecture​

The Decision Algorithm​

Intercepting Exit Signals​

Re-injecting Context​

Detecting the Completion Proof​

2e. Effect Execution​

The Effect Execution Cycle​

Effect Result Type​

Pseudocode​

Breakpoint Effect Handler​

Sleep Effect Handler​

Orchestrator Task Effect Handler​

Agent Effect Handler​

Reading Task Definitions​

2f. Result Posting​

IMPORTANT​

Pseudocode​

CLI Command​

Result Status Values​

2g. Iteration Guards​

CLI Command​

Output​

Guard Logic​

Two Detection Mechanisms​

3. Harness Capability Matrix​

Required vs Optional Capabilities​

Integration Tiers​

Tier 1: Minimum Viable Integration​

Tier 2: Robust Integration​

Tier 3: Full Integration​

4. Session State Contract​

File Format​

Example​

Required Fields​

State Transitions​

Atomic Write Protocol​

Timing Calculation​

5. Hook Equivalence Table​

SDK Hooks (13 KnownHookType values)​

Harness-Level Concepts (not SDK KnownHookType values)​

Hook Discovery Directories​

6. CLI Output Schemas​

run:status Output​

session:check-iteration Output​

task:list Output​

session:iteration-message Output​

7. CLI Error Handling​

Error Categories​

Unified Error Handler​

Recommended Timeouts by Command​

8. Edge Cases​

Stale Session State File​

Run Directory Missing or Corrupted​

Concurrent Sessions on Same Run​

Effect Posted but Journal Not Updated​

Zero Pending Tasks After Iterate​

9. Testing the Integration​

Smoke Test Checklist​

Test 1: CLI Availability​

Test 2: Session Initialization​

Table of Contents

1. Prerequisites

Checklist

Minimum Environment

2. Core Integration Points

2a. SDK Installation

Checklist

Pseudocode

CLI Command

2b. Session Initialization

When to Call

Checklist

Pseudocode

What This Creates

2c. Run Creation and Session Binding

When to Call

Checklist

Pseudocode

Re-entrant Run Prevention

2d. The Orchestration Loop Driver

Architecture

The Decision Algorithm

Intercepting Exit Signals

Re-injecting Context

Detecting the Completion Proof

2e. Effect Execution

The Effect Execution Cycle

Effect Result Type

Pseudocode

Breakpoint Effect Handler

Sleep Effect Handler

Orchestrator Task Effect Handler

Agent Effect Handler

Reading Task Definitions

2f. Result Posting

IMPORTANT

Pseudocode

CLI Command

Result Status Values

2g. Iteration Guards

CLI Command

Output

Guard Logic

Two Detection Mechanisms

3. Harness Capability Matrix

Required vs Optional Capabilities

Integration Tiers

Tier 1: Minimum Viable Integration

Tier 2: Robust Integration

Tier 3: Full Integration

4. Session State Contract

File Format

Example

Required Fields

State Transitions

Atomic Write Protocol

Timing Calculation

5. Hook Equivalence Table

SDK Hooks (13 `KnownHookType` values)

Harness-Level Concepts (not SDK KnownHookType values)

Hook Discovery Directories

6. CLI Output Schemas

`run:status` Output

`session:check-iteration` Output

`task:list` Output

`session:iteration-message` Output

7. CLI Error Handling

Error Categories

Unified Error Handler

Recommended Timeouts by Command

8. Edge Cases

Stale Session State File

Run Directory Missing or Corrupted

Concurrent Sessions on Same Run

Effect Posted but Journal Not Updated

Zero Pending Tasks After Iterate

9. Testing the Integration

Smoke Test Checklist

Test 1: CLI Availability

Test 2: Session Initialization