
Two-Loops Architecture: Understanding Hybrid Agentic Systems

Version: 1.1 | Last Updated: 2026-01-26 | Category: Feature Guide


TL;DR - What You Need to Know

Skip this section if you just want to USE babysitter. This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.

The key insight: Babysitter separates "what must happen" (deterministic rules) from "how to do it" (AI reasoning). This makes AI workflows reliable and debuggable.

┌──────────────────────────────────────────────────┐
│ LOOP 1: The Boss (Orchestrator)                  │
│   - "You must pass tests before deploying"       │
│   - "You have max 10 attempts"                   │
│   - "Stop and ask for approval at this point"    │
│                                                  │
│ LOOP 2: The Worker (AI Agent)                    │
│   - "Figure out how to make these tests pass"    │
│   - "Find and fix the bugs"                      │
│   - "Write the code that solves the problem"     │
└──────────────────────────────────────────────────┘

When to read this document:

  • You're building custom processes
  • You want to understand guardrails and safety
  • You're debugging why a run behaves a certain way
  • You're an architect evaluating babysitter for your team

When to skip this document:

  • You just want to run existing processes
  • You're following a tutorial
  • You're a beginner (start with Quality Convergence instead)

Overview

Babysitter implements a Two-Loops Control Plane architecture that combines:

  1. Symbolic Orchestration (Process Engine): Deterministic, code-defined control
  2. Agentic Harness (LLM Runtime): Adaptive, AI-powered work execution

This hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.

Why Two Loops?

| Single-Loop AI         | Two-Loops Hybrid                |
| ---------------------- | ------------------------------- |
| Unpredictable behavior | Bounded, testable autonomy      |
| Hard to debug          | Journaled, replayable execution |
| No safety guarantees   | Enforced guardrails and gates   |
| "It seems done"        | Evidence-driven completion      |
| Context degradation    | Fresh context per task          |

The Core Building Blocks

A) Symbolic Orchestrator (Process Engine)

The orchestrator is the code-defined process that enforces:

| Responsibility     | Example                           |
| ------------------ | --------------------------------- |
| Ground truth state | Run is in "implementation" phase  |
| Progression rules  | Must pass tests before deployment |
| Invariants         | Never modify production directly  |
| Budgets            | Max 10 iterations, 30 min timeout |
| Permissions        | Only write to src/ directory      |
| Quality gates      | Tests, lint, security must pass   |
| Journaling         | Every event recorded for replay   |
| Time travel        | Fork from any point, compare runs |

The orchestrator owns making execution dependable.

B) Agent Harness (LLM Runtime)

The harness is not "just an LLM call." Modern harnesses include:

| Capability           | Description                             |
| -------------------- | --------------------------------------- |
| Iterative planning   | Plan → Execute → Replan                 |
| Tool calling         | Files, terminal, search, code execution |
| Command execution    | Parse results, handle errors            |
| Incremental fixes    | Iterate until checks pass               |
| Structured artifacts | Plans, diffs, summaries                 |
| Multi-step reasoning | With constraints                        |
| Sub-agents           | Delegation inside the harness           |

The harness owns solving fuzzy parts and adapting to feedback.

C) Symbolic Logic Surfaces (Shared Capabilities)

Symbolic logic appears in multiple places, all consistent:

  1. Inside orchestrator (stage transitions, invariants, gates, budgets)
  2. As symbolic tools callable by the harness (policy checks, gate evaluation)
  3. As symbolic tasks callable by orchestration (validators, analyzers)

// Symbolic logic as orchestrator rule (using a loop for retry)
let testResults;
for (let iteration = 0; iteration < maxIterations; iteration++) {
  // Pass failures from the previous iteration back as feedback
  const impl = await ctx.task(implementTask, { feature, feedback: testResults?.failures });
  testResults = await ctx.task(runTestsTask, { impl });

  if (testResults.passed) break; // Success - exit loop
}

// Symbolic logic as tool callable by harness
const allowed = await ctx.task(policyCheckTask, {
  action: 'modifyFile',
  path: '/etc/config.json'
});

// Symbolic logic as validation task
const gateResult = await ctx.task(securityGateTask, {
  files: impl.filesModified
});

The Two Loops in Detail

Loop 1: Orchestration Loop (Symbolic)

A process stepper that progresses a run through explicit stages.

Typical Cycle:

1. Reconstruct "what is true" from the journal
2. Determine what stage the run is in
3. Check gates/constraints/budgets
4. Choose the next allowed transition
5. Emit the next effect (or wait)
6. Record results back into the journal

This loop is about: control, safety, repeatability, traceability.
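
As a mental model only, that cycle might look like the sketch below; reconstructState, checkGates, nextTransition, and emitEffect are hypothetical helpers invented for illustration, not part of the babysitter API.

// Hypothetical orchestration stepper (all helper names are invented)
async function step(journal) {
  const state = reconstructState(journal.events);       // 1. rebuild "what is true"
  const stage = state.currentStage;                     // 2. which stage the run is in
  const gate = checkGates(stage, state);                // 3. gates/constraints/budgets
  if (!gate.allowed) {
    journal.record({ type: 'blocked', reason: gate.reason });
    return;
  }
  const transition = nextTransition(stage, state);      // 4. next allowed transition
  const result = await emitEffect(transition);          // 5. emit the effect (or wait)
  journal.record({ type: 'result', transition, result }); // 6. record into the journal
}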

Loop 2: Agentic Loop (Harness)

A tool-using reasoning loop that iterates until reaching a local objective.

Typical Cycle:

1. Read current objective + constraints
2. Decide what evidence is needed
3. Call tools, inspect results
4. Update plan or actions
5. Produce an output (patch, plan, answer, report)

This loop is about: solving the task when information is incomplete.
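
Sketched as pseudocode (every helper here is hypothetical; real harnesses are considerably more involved):

// Hypothetical harness reasoning loop (all helpers are invented for illustration)
async function agentLoop(objective, constraints, tools) {
  let plan = await draftPlan(objective, constraints);     // 1-2. objective + needed evidence
  while (!plan.complete) {
    const observation = await tools.call(plan.nextAction); // 3. call tools, inspect results
    plan = await revisePlan(plan, observation);           // 4. update plan or actions
  }
  return plan.output;                                     // 5. patch, plan, answer, report
}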


What Goes Where?

The design challenge is deciding which execution decisions are deterministic/symbolic and which are adaptive/agentic.

Put in Symbolic Logic When...

These decisions must be stable, enforceable, and auditable:

| Decision Type        | Examples                     |
| -------------------- | ---------------------------- |
| Safety/permissions   | What actions are allowed     |
| Budgets/limits       | Time, cost, tool call limits |
| State transitions    | What stage you're in         |
| Concurrency rules    | What can run in parallel     |
| Retry/timeout policy | What happens on failure      |
| Idempotency          | Avoid double execution       |
| Quality gates        | What proof is required       |
| Compliance/audit     | Logging requirements         |

Put in Agent Harness When...

These decisions benefit from flexible reasoning:

| Decision Type          | Examples                     |
| ---------------------- | ---------------------------- |
| Ambiguous instructions | "Make it better"             |
| Uncertain approach     | Multiple valid solutions     |
| Search/discovery       | Find relevant files          |
| Drafting               | Code, docs, analyses         |
| Debugging              | Iterate against tool results |
| Summarizing            | Compress evidence            |
| Proposing              | Candidate solutions          |

The Mixed Zone

Many tasks are mixed. The pattern is:

  • Symbolic logic defines the envelope (constraints + gates + budgets)
  • Harness explores inside that envelope (implements, debugs, refines)
  • Both can invoke symbolic rules (nothing is guesswork)

// Mixed: Harness works, orchestrator validates (loop-based retry)
let securityPassed = false;
let lastSecurityResult = null;
for (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {
  const impl = await ctx.task(implementTask, {
    feature,
    constraints: {
      allowedPaths: ['src/**'],
      forbiddenPatterns: ['eval(', 'exec('],
      maxFilesModified: 10
    },
    // Pass previous feedback on retry iterations
    feedback: iteration > 0 ? lastSecurityResult.recommendations : null
  });

  // Orchestrator enforces gate
  const securityResult = await ctx.task(securityGateTask, { impl });
  securityPassed = securityResult.passed;
  lastSecurityResult = securityResult;
}

The Four Guardrail Layers

Guardrails are a layered approach, not a single feature.

Layer A: Capability Guardrails (What's Possible)

Define what tools and actions exist.

const capabilityConfig = {
  allowedTools: ['read', 'write', 'shell', 'search'],
  pathRestrictions: ['src/**', 'tests/**'],
  networkAccess: 'none',
  permissions: 'read-write',
  destructiveActions: 'require-confirmation'
};

Layer B: Budget Guardrails (How Far)

Prevent runaway execution.

const budgetConfig = {
  maxToolCalls: 100,
  maxWallClockMinutes: 30,
  maxTokenSpend: 50000,
  maxIterations: 10,
  rateLimits: { apiCalls: '10/minute' }
};

Layer C: Policy Guardrails (What's Allowed)

Rules that define acceptable behavior.

const policyConfig = {
  rules: [
    'never exfiltrate secrets',
    'never modify production directly',
    'always run tests before merge',
    'security scans required for dependencies'
  ]
};

Layer D: Behavioral Guardrails (How Decisions Are Made)

Structural consistency in outputs.

const behavioralConfig = {
  requireStructuredOutputs: true,
  requireEvidenceCitations: true,
  requireUncertaintyDeclaration: true,
  outputSchemas: { /* JSON schemas */ }
};
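
How the four layers compose at runtime is an orchestrator concern. A minimal sketch, assuming a hypothetical enforceGuardrails check (and a hypothetical violatesPolicy helper) that are not part of the babysitter API:

// Hypothetical composition of the four layers (illustration only)
function enforceGuardrails(action, state) {
  // Layer A: is the tool even available?
  if (!capabilityConfig.allowedTools.includes(action.tool))
    return { allowed: false, layer: 'capability' };
  // Layer B: are we within budget?
  if (state.toolCalls >= budgetConfig.maxToolCalls)
    return { allowed: false, layer: 'budget' };
  // Layer C: does any policy rule forbid this? (violatesPolicy is hypothetical)
  if (violatesPolicy(action, policyConfig.rules))
    return { allowed: false, layer: 'policy' };
  // Layer D: does the output meet structural requirements?
  if (behavioralConfig.requireStructuredOutputs && !action.outputSchema)
    return { allowed: false, layer: 'behavioral' };
  return { allowed: true };
}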

Quality Gates: Turning Agentic Work into Reliable Outcomes

Quality gates convert "it seems done" into "it is done."

The Evidence-Driven Pattern

Each phase must end with:

| Component | Description                                              |
| --------- | -------------------------------------------------------- |
| Artifact  | The work product (patch, doc, config, report)             |
| Evidence  | Proof it meets requirements (logs, test output, checks)   |

If you don't have evidence, you don't have completion.
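
One way to make that concrete is to require every phase to return both halves together. The field names below are illustrative, not prescribed by babysitter:

// Illustrative phase result: completion means artifact + evidence, together
const phaseResult = {
  artifact: { type: 'patch', path: 'implementation.patch' },
  evidence: {
    tests: { passed: 42, failed: 0, log: 'test-output.txt' },
    lint: { passed: true },
    security: { passed: true }
  }
};
// A gate can then reject "done" claims whose evidence is missing or failing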

Common Gated Steps

| Gate Type         | What It Validates         |
| ----------------- | ------------------------- |
| Unit tests        | Individual functions work |
| Integration tests | Components work together  |
| System tests      | End-to-end behavior       |
| Acceptance tests  | User requirements met     |
| Lint/formatting   | Code style compliance     |
| Type checking     | Type safety               |
| Static analysis   | Potential bugs            |
| Security scans    | Vulnerabilities           |
| Reproducibility   | Clean run in fresh env    |
| Diff review       | No forbidden file changes |
| Performance       | Meets thresholds          |

Where Gates Live (Consistent Everywhere)

// In orchestrator: loop-based retry for gate failures
let gateResults = { passed: false };
for (let i = 0; i < maxIterations && !gateResults.passed; i++) {
  const impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });
  gateResults = await ctx.task(runGatesTask, { impl });
}

// As symbolic tool: harness pre-checks during work
const gateResult = await checkGate(impl);
if (!gateResult.passed) {
  // Harness can immediately attempt repair
  await repairIssues(gateResult.failures);
}

// As symbolic task: verify evidence objectively
const evidence = await ctx.task(gateValidatorTask, { impl });

Human Approval Gates

For high-impact steps, include explicit checkpoints:

// Plan approval before execution
await ctx.breakpoint({
  question: 'Review the plan. Approve to proceed with implementation?',
  title: 'Plan Approval',
  context: { /* ... */ }
});

// Diff approval before merge
await ctx.breakpoint({
  question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,
  title: 'Merge Approval'
});

// Deployment approval
await ctx.breakpoint({
  question: 'Quality: 92/100. Deploy to production?',
  title: 'Production Deployment'
});

The Journal: Making Execution Testable

A journaled control plane turns agentic behavior into something you can:

| Capability | Value                     |
| ---------- | ------------------------- |
| Replay     | Debug by re-running       |
| Inspect    | See exactly what happened |
| Diff       | Compare across forks      |
| Audit      | Compliance evidence       |
| Analyze    | Failure pattern detection |

What's Journaled

| Event Type        | Example                            |
| ----------------- | ---------------------------------- |
| Inputs/signals    | Initial requirements               |
| Stage transitions | "planning" → "implementation"      |
| Requested actions | writeFile('/src/auth.ts', ...)     |
| Results           | Action succeeded, 42 lines written |
| Artifacts         | plan.md, implementation.patch      |
| Evidence          | Test results, gate outcomes        |
| Gate outcomes     | Security: PASS, Tests: PASS        |
| Approvals         | User approved at breakpoint        |
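
For intuition, a stretch of a run's journal might read like the excerpt below; the event shapes are assumptions for illustration, not the actual babysitter journal schema:

// Illustrative journal excerpt (event shapes are assumptions, not the real schema)
const journalExcerpt = [
  { seq: 17, type: 'stage-transition', from: 'planning', to: 'implementation' },
  { seq: 18, type: 'action-requested', action: 'writeFile', path: '/src/auth.ts' },
  { seq: 19, type: 'action-result', ok: true, linesWritten: 42 },
  { seq: 20, type: 'gate-outcome', gate: 'security', passed: true },
  { seq: 21, type: 'approval', breakpoint: 'Merge Approval', decision: 'approved' }
];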

Prompt Quality is Determinism Engineering

In a two-loop system, prompts are configuration for the harness.

Why Prompt Quality Matters

Better prompts reduce:

  • Output variance
  • Tool misuse
  • Hidden assumptions
  • Inconsistent formatting
  • Unpredictable branching

Better prompts improve:

  • Repeatability
  • Debuggability
  • Fork comparisons
  • Safe automation

The Real Goal: Structural Consistency

You don't need identical wording. You need consistent:

  • Decision formats
  • Priorities
  • Stop/ask conditions
  • Evidence standards

Prompt Versioning

Treat harness prompts like engineering surfaces:

const promptVersion = '2.1.0';

const implementerPrompt = {
  version: promptVersion,
  role: 'senior software engineer',
  task: 'Implement feature according to specification',
  constraints: [
    'Follow existing code patterns',
    'Write tests for all public functions',
    'Document complex logic',
    'Ask for clarification if requirements are ambiguous'
  ],
  outputFormat: {
    type: 'object',
    required: ['filesModified', 'summary', 'confidence']
  }
};

Common Failure Modes and Fixes

1. Everything is Agentic

Symptom: Unpredictable behavior, hard to debug, inconsistent safety.

Fix: Move gates, budgets, and invariants into symbolic orchestration.

2. Everything is Symbolic

Symptom: Brittle workflows, poor adaptation, high maintenance.

Fix: Delegate fuzzy decisions and exploration to the harness.

3. Hidden State

Symptom: The harness "remembers" things the system never logged.

Fix: Journal what matters; the system's truth must be reconstructible.

4. Wide Tool Surface

Symptom: Tool confusion, increased risk, unpredictable results.

Fix: Keep tools small, stable, and well-described.

5. No Explicit Evidence Requirements

Symptom: "Done" claims without proof.

Fix: Define completion as artifact + evidence, enforced by gates.


The Doctrine

If you define only a few principles, make them these:

  1. The orchestrator owns run progression, journaling, and phase boundaries
  2. Symbolic logic owns constraints, permissions, budgets, and gates
  3. The harness owns adaptive work inside constraints
  4. Guardrails are enforced by symbolic checks, not informal intentions
  5. Quality is evidence-driven, not assertion-driven
  6. Prompts are versioned control surfaces for harness behavior
  7. The journal is the source of truth for replay, audit, and forking

Getting Started

If you're building from scratch:

  1. Define phases (a small symbolic process)
  2. Define effects/tools available in each phase
  3. Add budgets and permissions
  4. Decide quality gates per phase
  5. Add a harness that can do real work
  6. Journal everything needed for replay and audit
  7. Add fork + time travel as first-class operations

If you do only one thing: make completion require evidence.
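
Assembled into a single skeleton, the steps above might look like this. Only ctx.task, ctx.breakpoint, and ctx.log come from the SDK reference below; the task definitions and overall shape are assumptions:

// Minimal process skeleton for steps 1-6 (task definitions are hypothetical)
async function run(ctx, feature) {
  const budgets = { maxIterations: 5 };                          // 3. budgets
  const plan = await ctx.task(planTask, { feature });            // 1-2. phase + its effects
  await ctx.breakpoint({ question: 'Approve the plan?', title: 'Plan Approval' });

  let gates = { passed: false };
  for (let i = 0; i < budgets.maxIterations && !gates.passed; i++) {
    const impl = await ctx.task(implementTask, { plan, feedback: gates.failures }); // 5. harness work
    gates = await ctx.task(runGatesTask, { impl });              // 4. quality gates
    ctx.log('Gate outcome', gates);                              // 6. journaled for replay
  }
  return gates; // completion requires evidence, not assertion
}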


Process Library Examples

Spec-Driven Development

methodologies/spec-driven-development.js

Implements the full two-loops pattern:

  • Symbolic: Constitution validation, plan-constitution alignment, consistency analysis
  • Agentic: Specification writing, planning, implementation
  • Gates: Every phase has approval breakpoints

V-Model

methodologies/v-model.js

Heavy on symbolic verification:

  • Four test levels designed before implementation
  • Traceability matrix ensures complete coverage
  • Safety levels adjust rigor

GSD Iterative Convergence

gsd/iterative-convergence.js

Feedback-driven quality loop:

  • Implement → Score → Feedback → Repeat
  • Breakpoints at quality thresholds
  • Plateau detection for early exit
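
Plateau detection can be as simple as checking whether recent scores have stopped improving; a hedged sketch (the actual process file may implement this differently):

// Simple plateau detection: exit early when improvement stalls (illustrative)
function hasPlateaued(scores, window = 3, minDelta = 2) {
  if (scores.length < window + 1) return false;
  const recent = scores.slice(-(window + 1));
  const improvement = recent[recent.length - 1] - recent[0];
  return improvement < minDelta; // under minDelta points gained across the window
}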


Summary

The Two-Loops architecture enables bounded, testable autonomy:

  • Orchestration Loop provides control, safety, and traceability
  • Agentic Loop provides capability, adaptation, and problem-solving
  • Quality Gates turn "seems done" into "is done" with evidence
  • Guardrails enforce rules at capability, budget, policy, and behavioral levels
  • Journaling makes everything replayable and auditable

When done well, you get autonomy that is bounded, testable, and steadily improvable.


SDK API Quick Reference

The complete list of SDK intrinsics (functions available on ctx):

| Function                  | Purpose                          | Example                                                          |
| ------------------------- | -------------------------------- | ---------------------------------------------------------------- |
| ctx.task(taskDef, args)   | Execute a task                   | await ctx.task(buildTask, { target: 'dist' })                    |
| ctx.breakpoint(opts)      | Pause for human approval         | await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' }) |
| ctx.parallel.all([...])   | Run tasks in parallel            | await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)])   |
| ctx.parallel.map(arr, fn) | Map over array in parallel       | await ctx.parallel.map(files, f => ctx.task(lint, { file: f }))  |
| ctx.sleepUntil(iso8601)   | Pause until a specific time      | await ctx.sleepUntil('2026-01-27T10:00:00Z')                     |
| ctx.log(msg, data?)       | Log message to journal           | ctx.log('Quality score', { score: 85 })                          |
| ctx.now()                 | Get current time (deterministic) | const ts = ctx.now().getTime()                                   |
| ctx.runId                 | Current run identifier           | const id = ctx.runId                                             |
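
A short sketch combining several intrinsics; services, nightlyAuditTask, and reportTask are hypothetical, while the ctx calls are the documented intrinsics from the table:

// Combining intrinsics (services, nightlyAuditTask, reportTask are hypothetical)
ctx.log('Run started', { runId: ctx.runId, at: ctx.now().toISOString() });
await ctx.sleepUntil('2026-01-27T02:00:00Z');         // wait for the audit window
const audits = await ctx.parallel.map(services, s =>
  ctx.task(nightlyAuditTask, { service: s })          // fan out one task per service
);
await ctx.task(reportTask, { audits });               // aggregate the results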

Important: There is NO ctx.retry(). Use loops for retry logic:

// Correct: Loop-based retry
let passed = false;
let feedback = null;
for (let i = 0; i < maxIterations && !passed; i++) {
  const result = await ctx.task(implementTask, { feedback });
  passed = result.testsPass;
  feedback = result.errors;
}

What To Do Next

Based on your role, here's your next step:

| If you are...       | Do this next                                             |
| ------------------- | -------------------------------------------------------- |
| Beginner            | Read Quality Convergence for the core iteration pattern  |
| Building processes  | Study Best Practices for workflow design                 |
| Debugging a run     | Check Journal System to understand event sourcing        |
| Adding approvals    | See Breakpoints for human-in-the-loop patterns           |
| Evaluating for team | Review the Four Guardrail Layers section above           |