
Two-Loops Architecture: Understanding Hybrid Agentic Systems

Version: 1.1 | Last Updated: 2026-01-26 | Category: Feature Guide


TL;DR - What You Need to Know

Skip this section if you just want to USE babysitter. This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.

The key insight: Babysitter separates "what must happen" (deterministic rules) from "how to do it" (AI reasoning). This makes AI workflows reliable and debuggable.

┌──────────────────────────────────────────────────┐
│ LOOP 1: The Boss (Orchestrator)                  │
│   - "You must pass tests before deploying"       │
│   - "You have max 10 attempts"                   │
│   - "Stop and ask for approval at this point"    │
│                                                  │
│ LOOP 2: The Worker (AI Agent)                    │
│   - "Figure out how to make these tests pass"    │
│   - "Find and fix the bugs"                      │
│   - "Write the code that solves the problem"     │
└──────────────────────────────────────────────────┘

When to read this document:

  • You're building custom processes
  • You want to understand guardrails and safety
  • You're debugging why a run behaves a certain way
  • You're an architect evaluating babysitter for your team

When to skip this document:

  • You just want to run existing processes
  • You're following a tutorial
  • You're a beginner (start with Quality Convergence instead)

Overview

Babysitter implements a Two-Loops Control Plane architecture that combines:

  1. Symbolic Orchestration (Process Engine): Deterministic, code-defined control
  2. Agentic Harness (LLM Runtime): Adaptive, AI-powered work execution

This hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.

Why Two Loops?

| Single-Loop AI         | Two-Loops Hybrid                |
| ---------------------- | ------------------------------- |
| Unpredictable behavior | Bounded, testable autonomy      |
| Hard to debug          | Journaled, replayable execution |
| No safety guarantees   | Enforced guardrails and gates   |
| "It seems done"        | Evidence-driven completion      |
| Context degradation    | Fresh context per task          |

The Core Building Blocks

A) Symbolic Orchestrator (Process Engine)

The orchestrator is the code-defined process that enforces:

| Responsibility     | Example                           |
| ------------------ | --------------------------------- |
| Ground truth state | Run is in "implementation" phase  |
| Progression rules  | Must pass tests before deployment |
| Invariants         | Never modify production directly  |
| Budgets            | Max 10 iterations, 30 min timeout |
| Permissions        | Only write to src/ directory      |
| Quality gates      | Tests, lint, security must pass   |
| Journaling         | Every event recorded for replay   |
| Time travel        | Fork from any point, compare runs |

The orchestrator owns making execution dependable.

B) Agent Harness (LLM Runtime)

The harness is not "just an LLM call." Modern harnesses include:

| Capability           | Description                             |
| -------------------- | --------------------------------------- |
| Iterative planning   | Plan → Execute → Replan                 |
| Tool calling         | Files, terminal, search, code execution |
| Command execution    | Parse results, handle errors            |
| Incremental fixes    | Iterate until checks pass               |
| Structured artifacts | Plans, diffs, summaries                 |
| Multi-step reasoning | With constraints                        |
| Sub-agents           | Delegation inside the harness           |

The harness owns solving fuzzy parts and adapting to feedback.

C) Symbolic Logic Surfaces (Shared Capabilities)

Symbolic logic appears in multiple places, all consistent:

  1. Inside orchestrator (stage transitions, invariants, gates, budgets)
  2. As symbolic tools callable by the harness (policy checks, gate evaluation)
  3. As symbolic tasks callable by orchestration (validators, analyzers)

// Symbolic logic as orchestrator rule (using a loop for retry)
let testResults;
for (let iteration = 0; iteration < maxIterations; iteration++) {
  // Pass failures from the previous iteration back as feedback
  const impl = await ctx.task(implementTask, { feature, feedback: testResults?.failures });
  testResults = await ctx.task(runTestsTask, { impl });

  if (testResults.passed) break; // Success - exit loop
}

// Symbolic logic as tool callable by harness
const allowed = await ctx.task(policyCheckTask, {
  action: 'modifyFile',
  path: '/etc/config.json'
});

// Symbolic logic as validation task
const gateResult = await ctx.task(securityGateTask, {
  files: impl.filesModified
});

The Two Loops in Detail

Loop 1: Orchestration Loop (Symbolic)

A process stepper that progresses a run through explicit stages.

Typical Cycle:

1. Reconstruct "what is true" from the journal
2. Determine what stage the run is in
3. Check gates/constraints/budgets
4. Choose the next allowed transition
5. Emit the next effect (or wait)
6. Record results back into the journal

This loop is about: control, safety, repeatability, traceability.
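
As a mental model only, that cycle might look like the sketch below; reconstructState, checkGates, nextTransition, and emitEffect are hypothetical helpers invented for illustration, not part of the babysitter API.

// Hypothetical orchestration stepper (all helper names are invented)
async function step(journal) {
  const state = reconstructState(journal.events);       // 1. rebuild "what is true"
  const stage = state.currentStage;                     // 2. which stage the run is in
  const gate = checkGates(stage, state);                // 3. gates/constraints/budgets
  if (!gate.allowed) {
    journal.record({ type: 'blocked', reason: gate.reason });
    return;
  }
  const transition = nextTransition(stage, state);      // 4. next allowed transition
  const result = await emitEffect(transition);          // 5. emit the effect (or wait)
  journal.record({ type: 'result', transition, result }); // 6. record into the journal
}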

Loop 2: Agentic Loop (Harness)

A tool-using reasoning loop that iterates until reaching a local objective.

Typical Cycle:

1. Read current objective + constraints
2. Decide what evidence is needed
3. Call tools, inspect results
4. Update plan or actions
5. Produce an output (patch, plan, answer, report)

This loop is about: solving the task when information is incomplete.
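
Sketched as pseudocode (every helper here is hypothetical; real harnesses are considerably more involved):

// Hypothetical harness reasoning loop (all helpers are invented for illustration)
async function agentLoop(objective, constraints, tools) {
  let plan = await draftPlan(objective, constraints);     // 1-2. objective + needed evidence
  while (!plan.complete) {
    const observation = await tools.call(plan.nextAction); // 3. call tools, inspect results
    plan = await revisePlan(plan, observation);           // 4. update plan or actions
  }
  return plan.output;                                     // 5. patch, plan, answer, report
}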


What Goes Where?

The design challenge is deciding which execution decisions are deterministic/symbolic and which are adaptive/agentic.

Put in Symbolic Logic When...

These decisions must be stable, enforceable, and auditable:

| Decision Type        | Examples                     |
| -------------------- | ---------------------------- |
| Safety/permissions   | What actions are allowed     |
| Budgets/limits       | Time, cost, tool call limits |
| State transitions    | What stage you're in         |
| Concurrency rules    | What can run in parallel     |
| Retry/timeout policy | What happens on failure      |
| Idempotency          | Avoid double execution       |
| Quality gates        | What proof is required       |
| Compliance/audit     | Logging requirements         |

Put in Agent Harness When...

These decisions benefit from flexible reasoning:

| Decision Type          | Examples                     |
| ---------------------- | ---------------------------- |
| Ambiguous instructions | "Make it better"             |
| Uncertain approach     | Multiple valid solutions     |
| Search/discovery       | Find relevant files          |
| Drafting               | Code, docs, analyses         |
| Debugging              | Iterate against tool results |
| Summarizing            | Compress evidence            |
| Proposing              | Candidate solutions          |

The Mixed Zone

Many tasks are mixed. The pattern is:

  • Symbolic logic defines the envelope (constraints + gates + budgets)
  • Harness explores inside that envelope (implements, debugs, refines)
  • Both can invoke symbolic rules (nothing is guesswork)

// Mixed: Harness works, orchestrator validates (loop-based retry)
let securityPassed = false;
let lastSecurityResult = null;
for (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {
  const impl = await ctx.task(implementTask, {
    feature,
    constraints: {
      allowedPaths: ['src/**'],
      forbiddenPatterns: ['eval(', 'exec('],
      maxFilesModified: 10
    },
    // Pass previous feedback on retry iterations
    feedback: iteration > 0 ? lastSecurityResult.recommendations : null
  });

  // Orchestrator enforces gate
  const securityResult = await ctx.task(securityGateTask, { impl });
  securityPassed = securityResult.passed;
  lastSecurityResult = securityResult;
}

The Four Guardrail Layers

Guardrails are a layered approach, not a single feature.

Layer A: Capability Guardrails (What's Possible)

Define what tools and actions exist.

const capabilityConfig = {
  allowedTools: ['read', 'write', 'shell', 'search'],
  pathRestrictions: ['src/**', 'tests/**'],
  networkAccess: 'none',
  permissions: 'read-write',
  destructiveActions: 'require-confirmation'
};

Layer B: Budget Guardrails (How Far)

Prevent runaway execution.

const budgetConfig = {
  maxToolCalls: 100,
  maxWallClockMinutes: 30,
  maxTokenSpend: 50000,
  maxIterations: 10,
  rateLimits: { apiCalls: '10/minute' }
};

Layer C: Policy Guardrails (What's Allowed)

Rules that define acceptable behavior.

const policyConfig = {
  rules: [
    'never exfiltrate secrets',
    'never modify production directly',
    'always run tests before merge',
    'security scans required for dependencies'
  ]
};

Layer D: Behavioral Guardrails (How Decisions Are Made)

Structural consistency in outputs.

const behavioralConfig = {
  requireStructuredOutputs: true,
  requireEvidenceCitations: true,
  requireUncertaintyDeclaration: true,
  outputSchemas: { /* JSON schemas */ }
};
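
How the four layers compose at runtime is an orchestrator concern. A minimal sketch, assuming a hypothetical enforceGuardrails check (and a hypothetical violatesPolicy helper) that are not part of the babysitter API:

// Hypothetical composition of the four layers (illustration only)
function enforceGuardrails(action, state) {
  // Layer A: is the tool even available?
  if (!capabilityConfig.allowedTools.includes(action.tool))
    return { allowed: false, layer: 'capability' };
  // Layer B: are we within budget?
  if (state.toolCalls >= budgetConfig.maxToolCalls)
    return { allowed: false, layer: 'budget' };
  // Layer C: does any policy rule forbid this? (violatesPolicy is hypothetical)
  if (violatesPolicy(action, policyConfig.rules))
    return { allowed: false, layer: 'policy' };
  // Layer D: does the output meet structural requirements?
  if (behavioralConfig.requireStructuredOutputs && !action.outputSchema)
    return { allowed: false, layer: 'behavioral' };
  return { allowed: true };
}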

Quality Gates: Turning Agentic Work into Reliable Outcomes

Quality gates convert "it seems done" into "it is done."

The Evidence-Driven Pattern

Each phase must end with:

| Component | Description                                              |
| --------- | -------------------------------------------------------- |
| Artifact  | The work product (patch, doc, config, report)             |
| Evidence  | Proof it meets requirements (logs, test output, checks)   |

If you don't have evidence, you don't have completion.
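
One way to make that concrete is to require every phase to return both halves together. The field names below are illustrative, not prescribed by babysitter:

// Illustrative phase result: completion means artifact + evidence, together
const phaseResult = {
  artifact: { type: 'patch', path: 'implementation.patch' },
  evidence: {
    tests: { passed: 42, failed: 0, log: 'test-output.txt' },
    lint: { passed: true },
    security: { passed: true }
  }
};
// A gate can then reject "done" claims whose evidence is missing or failing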

Common Gated Steps

| Gate Type         | What It Validates         |
| ----------------- | ------------------------- |
| Unit tests        | Individual functions work |
| Integration tests | Components work together  |
| System tests      | End-to-end behavior       |
| Acceptance tests  | User requirements met     |
| Lint/formatting   | Code style compliance     |
| Type checking     | Type safety               |
| Static analysis   | Potential bugs            |
| Security scans    | Vulnerabilities           |
| Reproducibility   | Clean run in fresh env    |
| Diff review       | No forbidden file changes |
| Performance       | Meets thresholds          |

Where Gates Live (Consistent Everywhere)

// In orchestrator: loop-based retry for gate failures
let gateResults = { passed: false };
for (let i = 0; i < maxIterations && !gateResults.passed; i++) {
  const impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });
  gateResults = await ctx.task(runGatesTask, { impl });
}

// As symbolic tool: harness pre-checks during work
const gateResult = await checkGate(impl);
if (!gateResult.passed) {
  // Harness can immediately attempt repair
  await repairIssues(gateResult.failures);
}

// As symbolic task: verify evidence objectively
const evidence = await ctx.task(gateValidatorTask, { impl });

Human Approval Gates

For high-impact steps, include explicit checkpoints:

// Plan approval before execution
await ctx.breakpoint({
  question: 'Review the plan. Approve to proceed with implementation?',
  title: 'Plan Approval',
  context: { /* ... */ }
});

// Diff approval before merge
await ctx.breakpoint({
  question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,
  title: 'Merge Approval'
});

// Deployment approval
await ctx.breakpoint({
  question: 'Quality: 92/100. Deploy to production?',
  title: 'Production Deployment'
});

The Journal: Making Execution Testable

A journaled control plane turns agentic behavior into something you can:

| Capability | Value                     |
| ---------- | ------------------------- |
| Replay     | Debug by re-running       |
| Inspect    | See exactly what happened |
| Diff       | Compare across forks      |
| Audit      | Compliance evidence       |
| Analyze    | Failure pattern detection |

What's Journaled

| Event Type        | Example                            |
| ----------------- | ---------------------------------- |
| Inputs/signals    | Initial requirements               |
| Stage transitions | "planning" → "implementation"      |
| Requested actions | writeFile('/src/auth.ts', ...)     |
| Results           | Action succeeded, 42 lines written |
| Artifacts         | plan.md, implementation.patch      |
| Evidence          | Test results, gate outcomes        |
| Gate outcomes     | Security: PASS, Tests: PASS        |
| Approvals         | User approved at breakpoint        |
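
For intuition, a stretch of a run's journal might read like the excerpt below; the event shapes are assumptions for illustration, not the actual babysitter journal schema:

// Illustrative journal excerpt (event shapes are assumptions, not the real schema)
const journalExcerpt = [
  { seq: 17, type: 'stage-transition', from: 'planning', to: 'implementation' },
  { seq: 18, type: 'action-requested', action: 'writeFile', path: '/src/auth.ts' },
  { seq: 19, type: 'action-result', ok: true, linesWritten: 42 },
  { seq: 20, type: 'gate-outcome', gate: 'security', passed: true },
  { seq: 21, type: 'approval', breakpoint: 'Merge Approval', decision: 'approved' }
];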

Prompt Quality is Determinism Engineering

In a two-loop system, prompts are configuration for the harness.

Why Prompt Quality Matters

Better prompts reduce:

  • Output variance
  • Tool misuse
  • Hidden assumptions
  • Inconsistent formatting
  • Unpredictable branching

Better prompts improve:

  • Repeatability
  • Debuggability
  • Fork comparisons
  • Safe automation

The Real Goal: Structural Consistency

You don't need identical wording. You need consistent:

  • Decision formats
  • Priorities
  • Stop/ask conditions
  • Evidence standards

Prompt Versioning

Treat harness prompts like engineering surfaces:

const promptVersion = '2.1.0';

const implementerPrompt = {
  version: promptVersion,
  role: 'senior software engineer',
  task: 'Implement feature according to specification',
  constraints: [
    'Follow existing code patterns',
    'Write tests for all public functions',
    'Document complex logic',
    'Ask for clarification if requirements are ambiguous'
  ],
  outputFormat: {
    type: 'object',
    required: ['filesModified', 'summary', 'confidence']
  }
};

Common Failure Modes and Fixes

1. Everything is Agentic

Symptom: Unpredictable behavior, hard to debug, inconsistent safety.

Fix: Move gates, budgets, and invariants into symbolic orchestration.

2. Everything is Symbolic

Symptom: Brittle workflows, poor adaptation, high maintenance.

Fix: Delegate fuzzy decisions and exploration to the harness.

3. Hidden State

Symptom: The harness "remembers" things the system never logged.

Fix: Journal what matters; the system's truth must be reconstructible.

4. Wide Tool Surface

Symptom: Tool confusion, increased risk, unpredictable results.

Fix: Keep tools small, stable, and well-described.

5. No Explicit Evidence Requirements

Symptom: "Done" claims without proof.

Fix: Define completion as artifact + evidence, enforced by gates.


The Doctrine

If you define only a few principles, make them these:

  1. The orchestrator owns run progression, journaling, and phase boundaries
  2. Symbolic logic owns constraints, permissions, budgets, and gates
  3. The harness owns adaptive work inside constraints
  4. Guardrails are enforced by symbolic checks, not informal intentions
  5. Quality is evidence-driven, not assertion-driven
  6. Prompts are versioned control surfaces for harness behavior
  7. The journal is the source of truth for replay, audit, and forking

Getting Started

If you're building from scratch:

  1. Define phases (a small symbolic process)
  2. Define effects/tools available in each phase
  3. Add budgets and permissions
  4. Decide quality gates per phase
  5. Add a harness that can do real work
  6. Journal everything needed for replay and audit
  7. Add fork + time travel as first-class operations

If you do only one thing: make completion require evidence.
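
Assembled into a single skeleton, the steps above might look like this. Only ctx.task, ctx.breakpoint, and ctx.log come from the SDK reference below; the task definitions and overall shape are assumptions:

// Minimal process skeleton for steps 1-6 (task definitions are hypothetical)
async function run(ctx, feature) {
  const budgets = { maxIterations: 5 };                          // 3. budgets
  const plan = await ctx.task(planTask, { feature });            // 1-2. phase + its effects
  await ctx.breakpoint({ question: 'Approve the plan?', title: 'Plan Approval' });

  let gates = { passed: false };
  for (let i = 0; i < budgets.maxIterations && !gates.passed; i++) {
    const impl = await ctx.task(implementTask, { plan, feedback: gates.failures }); // 5. harness work
    gates = await ctx.task(runGatesTask, { impl });              // 4. quality gates
    ctx.log('Gate outcome', gates);                              // 6. journaled for replay
  }
  return gates; // completion requires evidence, not assertion
}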


Process Library Examples

Spec-Driven Development

methodologies/spec-driven-development.js

Implements the full two-loops pattern:

  • Symbolic: Constitution validation, plan-constitution alignment, consistency analysis
  • Agentic: Specification writing, planning, implementation
  • Gates: Every phase has approval breakpoints

V-Model

methodologies/v-model.js

Heavy on symbolic verification:

  • Four test levels designed before implementation
  • Traceability matrix ensures complete coverage
  • Safety levels adjust rigor

GSD Iterative Convergence

gsd/iterative-convergence.js

Feedback-driven quality loop:

  • Implement → Score → Feedback → Repeat
  • Breakpoints at quality thresholds
  • Plateau detection for early exit
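
Plateau detection can be as simple as checking whether recent scores have stopped improving; a hedged sketch (the actual process file may implement this differently):

// Simple plateau detection: exit early when improvement stalls (illustrative)
function hasPlateaued(scores, window = 3, minDelta = 2) {
  if (scores.length < window + 1) return false;
  const recent = scores.slice(-(window + 1));
  const improvement = recent[recent.length - 1] - recent[0];
  return improvement < minDelta; // under minDelta points gained across the window
}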


Summary

The Two-Loops architecture enables bounded, testable autonomy:

  • Orchestration Loop provides control, safety, and traceability
  • Agentic Loop provides capability, adaptation, and problem-solving
  • Quality Gates turn "seems done" into "is done" with evidence
  • Guardrails enforce rules at capability, budget, policy, and behavioral levels
  • Journaling makes everything replayable and auditable

When done well, you get autonomy that is bounded, testable, and steadily improvable.


SDK API Quick Reference

The complete list of SDK intrinsics (functions available on ctx):

| Function                  | Purpose                          | Example                                                          |
| ------------------------- | -------------------------------- | ---------------------------------------------------------------- |
| ctx.task(taskDef, args)   | Execute a task                   | await ctx.task(buildTask, { target: 'dist' })                    |
| ctx.breakpoint(opts)      | Pause for human approval         | await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' }) |
| ctx.parallel.all([...])   | Run tasks in parallel            | await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)])   |
| ctx.parallel.map(arr, fn) | Map over array in parallel       | await ctx.parallel.map(files, f => ctx.task(lint, { file: f }))  |
| ctx.sleepUntil(iso8601)   | Pause until a specific time      | await ctx.sleepUntil('2026-01-27T10:00:00Z')                     |
| ctx.log(msg, data?)       | Log message to journal           | ctx.log('Quality score', { score: 85 })                          |
| ctx.now()                 | Get current time (deterministic) | const ts = ctx.now().getTime()                                   |
| ctx.runId                 | Current run identifier           | const id = ctx.runId                                             |
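
A short sketch combining several intrinsics; services, nightlyAuditTask, and reportTask are hypothetical, while the ctx calls are the documented intrinsics from the table:

// Combining intrinsics (services, nightlyAuditTask, reportTask are hypothetical)
ctx.log('Run started', { runId: ctx.runId, at: ctx.now().toISOString() });
await ctx.sleepUntil('2026-01-27T02:00:00Z');         // wait for the audit window
const audits = await ctx.parallel.map(services, s =>
  ctx.task(nightlyAuditTask, { service: s })          // fan out one task per service
);
await ctx.task(reportTask, { audits });               // aggregate the results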

Important: There is NO ctx.retry(). Use loops for retry logic:

// Correct: Loop-based retry
let passed = false;
let feedback = null;
for (let i = 0; i < maxIterations && !passed; i++) {
  const result = await ctx.task(implementTask, { feedback });
  passed = result.testsPass;
  feedback = result.errors;
}

What To Do Next

Based on your role, here's your next step:

| If you are...       | Do this next                                             |
| ------------------- | -------------------------------------------------------- |
| Beginner            | Read Quality Convergence for the core iteration pattern  |
| Building processes  | Study Best Practices for workflow design                 |
| Debugging a run     | Check Journal System to understand event sourcing        |
| Adding approvals    | See Breakpoints for human-in-the-loop patterns           |
| Evaluating for team | Review the Four Guardrail Layers section above           |