Two-Loops Architecture: Understanding Hybrid Agentic Systems
Version: 1.1 | Last Updated: 2026-01-26 | Category: Feature Guide
TL;DR - What You Need to Know
Skip this section if you just want to USE babysitter. This document explains the architecture for those who want to understand WHY babysitter works the way it does, or who are building custom processes.
The key insight: Babysitter separates "what must happen" (deterministic rules) from "how to do it" (AI reasoning). This makes AI workflows reliable and debuggable.
┌─────────────────────────────────────────────────────────────────┐
│ LOOP 1: The Boss (Orchestrator)                                 │
│   - "You must pass tests before deploying"                      │
│   - "You have max 10 attempts"                                  │
│   - "Stop and ask for approval at this point"                   │
│                                                                 │
│ LOOP 2: The Worker (AI Agent)                                   │
│   - "Figure out how to make these tests pass"                   │
│   - "Find and fix the bugs"                                     │
│   - "Write the code that solves the problem"                    │
└─────────────────────────────────────────────────────────────────┘
When to read this document:
- You're building custom processes
- You want to understand guardrails and safety
- You're debugging why a run behaves a certain way
- You're an architect evaluating babysitter for your team
When to skip this document:
- You just want to run existing processes
- You're following a tutorial
- You're a beginner (start with Quality Convergence instead)
Overview
Babysitter implements a Two-Loops Control Plane architecture that combines:
- Symbolic Orchestration (Process Engine): Deterministic, code-defined control
- Agentic Harness (LLM Runtime): Adaptive, AI-powered work execution
This hybrid approach delivers the best of both worlds: the reliability of deterministic systems with the flexibility of AI reasoning.
Why Two Loops?
| Single-Loop AI | Two-Loops Hybrid |
|---|---|
| Unpredictable behavior | Bounded, testable autonomy |
| Hard to debug | Journaled, replayable execution |
| No safety guarantees | Enforced guardrails and gates |
| "It seems done" | Evidence-driven completion |
| Context degradation | Fresh context per task |
The Core Building Blocks
A) Symbolic Orchestrator (Process Engine)
The orchestrator is the code-defined process that enforces:
| Responsibility | Example |
|---|---|
| Ground truth state | Run is in "implementation" phase |
| Progression rules | Must pass tests before deployment |
| Invariants | Never modify production directly |
| Budgets | Max 10 iterations, 30 min timeout |
| Permissions | Only write to src/ directory |
| Quality gates | Tests, lint, security must pass |
| Journaling | Every event recorded for replay |
| Time travel | Fork from any point, compare runs |
The orchestrator owns making execution dependable.
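To make this concrete, here is a minimal sketch of a code-defined process. The task names (planTask, implementTask, runGatesTask) are hypothetical; the point is that stages, budgets, and gates are plain deterministic control flow:

// Sketch of a code-defined process (task names are illustrative).
// Stages, budgets, and gates are ordinary control flow, not LLM decisions.
async function featureProcess(ctx, { feature }) {
  const maxIterations = 10; // budget: hard cap on attempts

  // Stage 1: planning
  const plan = await ctx.task(planTask, { feature });

  // Stage 2: implementation, bounded and gated
  let gates = { passed: false, failures: null };
  for (let i = 0; i < maxIterations && !gates.passed; i++) {
    const impl = await ctx.task(implementTask, { plan, feedback: gates.failures });
    gates = await ctx.task(runGatesTask, { impl }); // quality gate
  }
  if (!gates.passed) throw new Error('Budget exhausted before gates passed');

  ctx.log('Feature complete', { feature }); // journaled for replay
}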
B) Agent Harness (LLM Runtime)
The harness is not "just an LLM call." Modern harnesses include:
| Capability | Description |
|---|---|
| Iterative planning | Plan → Execute → Replan |
| Tool calling | Files, terminal, search, code execution |
| Command execution | Parse results, handle errors |
| Incremental fixes | Iterate until checks pass |
| Structured artifacts | Plans, diffs, summaries |
| Multi-step reasoning | With constraints |
| Sub-agents | Delegation inside the harness |
The harness owns solving the fuzzy parts and adapting to feedback.
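For illustration, a sketch of how orchestration might hand an objective to a harness-backed task (implementTask is a hypothetical agent-backed task definition; the constraint fields are assumptions):

// Sketch: invoking a harness-backed task from orchestration.
// The harness runs its own inner plan -> tool-call -> replan loop
// and returns a structured artifact rather than free text.
const impl = await ctx.task(implementTask, {
  objective: 'Make the failing auth tests pass',
  constraints: {
    allowedPaths: ['src/**', 'tests/**'],
    maxToolCalls: 100
  }
});
ctx.log('Implementation summary', { files: impl.filesModified, summary: impl.summary });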
C) Symbolic Logic Surfaces (Shared Capabilities)
Symbolic logic appears in multiple places, all consistent:
- Inside orchestrator (stage transitions, invariants, gates, budgets)
- As symbolic tools callable by the harness (policy checks, gate evaluation)
- As symbolic tasks callable by orchestration (validators, analyzers)
// Symbolic logic as orchestrator rule (using a loop for retry)
let feedback = null;
for (let iteration = 0; iteration < maxIterations; iteration++) {
  const impl = await ctx.task(implementTask, { feature, feedback });
  const testResults = await ctx.task(runTestsTask, { impl });
  if (testResults.passed) break; // Success - exit loop
  feedback = testResults.failures; // Carry failures into the next attempt
}
// Symbolic logic as tool callable by harness
const allowed = await ctx.task(policyCheckTask, {
  action: 'modifyFile',
  path: '/etc/config.json'
});
// Symbolic logic as validation task
const gateResult = await ctx.task(securityGateTask, {
  files: impl.filesModified
});
The Two Loops in Detail
Loop 1: Orchestration Loop (Symbolic)
A process stepper that progresses a run through explicit stages.
Typical Cycle:
1. Reconstruct "what is true" from the journal
2. Determine what stage the run is in
3. Check gates/constraints/budgets
4. Choose the next allowed transition
5. Emit the next effect (or wait)
6. Record results back into the journal
This loop is about: control, safety, repeatability, traceability.
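A sketch of that cycle as code (every name here is illustrative, not babysitter API):

// Sketch of the stepper cycle (all names hypothetical).
// Each tick is deterministic given the journal contents.
async function step(journal) {
  const state = reconstructState(journal);          // 1. reconstruct what is true
  const stage = state.stage;                        // 2. determine current stage
  if (!budgetsOk(state) || !gatesOk(state)) {       // 3. check gates/constraints/budgets
    return { effect: 'wait', reason: 'constraint not satisfied' };
  }
  const transition = nextTransition(stage, state);  // 4. choose next allowed transition
  const result = await emitEffect(transition);      // 5. emit the next effect (or wait)
  journal.append({ transition, result });           // 6. record results into the journal
}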
Loop 2: Agentic Loop (Harness)
A tool-using reasoning loop that iterates until reaching a local objective.
Typical Cycle:
1. Read current objective + constraints
2. Decide what evidence is needed
3. Call tools, inspect results
4. Update plan or actions
5. Produce an output (patch, plan, answer, report)
This loop is about: solving the task when information is incomplete.
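A pseudocode sketch of the harness's inner loop (all names are illustrative; this runs inside the agent runtime, not in your process code):

// Sketch of the agentic loop (hypothetical names).
async function agentLoop(objective, constraints, tools) {
  let plan = await draftPlan(objective, constraints); // 1-2. objective -> evidence needed
  while (!plan.satisfied) {
    const action = plan.nextAction();
    const observation = await tools.call(action);     // 3. call tools, inspect results
    plan = await revise(plan, observation);           // 4. update plan or actions
  }
  return plan.artifact;                               // 5. patch, plan, answer, report
}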
What Goes Where?
The design challenge is deciding which execution decisions are deterministic/symbolic and which are adaptive/agentic.
Put in Symbolic Logic When...
These decisions must be stable, enforceable, and auditable:
| Decision Type | Examples |
|---|---|
| Safety/permissions | What actions are allowed |
| Budgets/limits | Time, cost, tool call limits |
| State transitions | What stage you're in |
| Concurrency rules | What can run in parallel |
| Retry/timeout policy | What happens on failure |
| Idempotency | Avoid double execution |
| Quality gates | What proof is required |
| Compliance/audit | Logging requirements |
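For example, a wall-clock budget can be enforced entirely in symbolic code. A sketch, assuming a hypothetical attemptTask; ctx.now() is from the SDK and stays deterministic on replay:

// Sketch: a time budget as symbolic logic (attemptTask is hypothetical).
const startedAt = ctx.now().getTime();
const maxWallClockMs = 30 * 60 * 1000; // 30-minute budget
const maxIterations = 10;
let feedback = null;
for (let i = 0; i < maxIterations; i++) {
  if (ctx.now().getTime() - startedAt > maxWallClockMs) {
    throw new Error('Wall-clock budget exceeded'); // hard stop, recorded in the journal
  }
  const result = await ctx.task(attemptTask, { feedback });
  if (result.passed) break;
  feedback = result.failures; // feed failures into the next bounded attempt
}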
Put in Agent Harness When...
These decisions benefit from flexible reasoning:
| Decision Type | Examples |
|---|---|
| Ambiguous instructions | "Make it better" |
| Uncertain approach | Multiple valid solutions |
| Search/discovery | Find relevant files |
| Drafting | Code, docs, analyses |
| Debugging | Iterate against tool results |
| Summarizing | Compress evidence |
| Proposing | Candidate solutions |
The Mixed Zone
Many tasks are mixed. The pattern is:
- Symbolic logic defines the envelope (constraints + gates + budgets)
- Harness explores inside that envelope (implements, debugs, refines)
- Both can invoke symbolic rules (nothing is guesswork)
// Mixed: Harness works, orchestrator validates (loop-based retry)
let securityPassed = false;
let lastSecurityResult = null;
for (let iteration = 0; iteration < maxIterations && !securityPassed; iteration++) {
  const impl = await ctx.task(implementTask, {
    feature,
    constraints: {
      allowedPaths: ['src/**'],
      forbiddenPatterns: ['eval(', 'exec('],
      maxFilesModified: 10
    },
    // Pass previous feedback on retry iterations
    feedback: iteration > 0 ? lastSecurityResult.recommendations : null
  });
  // Orchestrator enforces gate
  const securityResult = await ctx.task(securityGateTask, { impl });
  securityPassed = securityResult.passed;
  lastSecurityResult = securityResult;
}
The Four Guardrail Layers
Guardrails are a layered approach, not a single feature.
Layer A: Capability Guardrails (What's Possible)
Define what tools and actions exist.
const capabilityConfig = {
  allowedTools: ['read', 'write', 'shell', 'search'],
  pathRestrictions: ['src/**', 'tests/**'],
  networkAccess: 'none',
  permissions: 'read-write',
  destructiveActions: 'require-confirmation'
};
Layer B: Budget Guardrails (How Far)
Prevent runaway execution.
const budgetConfig = {
  maxToolCalls: 100,
  maxWallClockMinutes: 30,
  maxTokenSpend: 50000,
  maxIterations: 10,
  rateLimits: { apiCalls: '10/minute' }
};
Layer C: Policy Guardrails (What's Allowed)
Rules that define acceptable behavior.
const policyConfig = {
  rules: [
    'never exfiltrate secrets',
    'never modify production directly',
    'always run tests before merge',
    'security scans required for dependencies'
  ]
};
Layer D: Behavioral Guardrails (How Decisions Are Made)
Structural consistency in outputs.
const behavioralConfig = {
  requireStructuredOutputs: true,
  requireEvidenceCitations: true,
  requireUncertaintyDeclaration: true,
  outputSchemas: { /* JSON schemas */ }
};
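One way the four layers can compose is a single pre-action check. A sketch using the configs above; deny, allow, and violatesPolicy are illustrative helpers, not SDK API:

// Sketch: composing the four guardrail layers into one pre-action check.
function checkAction(action, state) {
  // Layer A: capability - is this tool available at all?
  if (!capabilityConfig.allowedTools.includes(action.tool)) return deny('tool not allowed');
  // Layer B: budget - are we still within limits?
  if (state.toolCalls >= budgetConfig.maxToolCalls) return deny('budget exhausted');
  // Layer C: policy - does a rule forbid this action?
  if (violatesPolicy(action, policyConfig.rules)) return deny('policy violation');
  // Layer D: behavior is enforced on outputs (schemas), not on requests
  return allow();
}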
Quality Gates: Turning Agentic Work into Reliable Outcomes
Quality gates convert "it seems done" into "it is done."
The Evidence-Driven Pattern
Each phase must end with:
| Component | Description |
|---|---|
| Artifact | The work product (patch, doc, config, report) |
| Evidence | Proof it meets requirements (logs, test output, checks) |
If you don't have evidence, you don't have completion.
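A sketch of what an artifact-plus-evidence result could look like (the field names are assumptions, not a fixed schema):

// Sketch: a phase result that pairs the artifact with its evidence.
const phaseResult = {
  artifact: {
    type: 'patch',
    path: 'implementation.patch',
    filesModified: ['src/auth.ts']
  },
  evidence: {
    tests: { passed: 42, failed: 0, log: 'test-output.txt' },
    lint: { passed: true },
    securityScan: { vulnerabilities: 0 }
  }
};
// A gate accepts the phase only if the evidence supports the claim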
Common Gated Steps
| Gate Type | What It Validates |
|---|---|
| Unit tests | Individual functions work |
| Integration tests | Components work together |
| System tests | End-to-end behavior |
| Acceptance tests | User requirements met |
| Lint/formatting | Code style compliance |
| Type checking | Type safety |
| Static analysis | Potential bugs |
| Security scans | Vulnerabilities |
| Reproducibility | Clean run in fresh env |
| Diff review | No forbidden file changes |
| Performance | Meets thresholds |
Where Gates Live (Consistent Everywhere)
// In orchestrator: loop-based retry for gate failures
let gateResults = { passed: false, failures: null };
for (let i = 0; i < maxIterations && !gateResults.passed; i++) {
  const impl = await ctx.task(implementTask, { feature, feedback: gateResults.failures });
  gateResults = await ctx.task(runGatesTask, { impl });
}
// As symbolic tool: harness pre-checks during work
const gateResult = await checkGate(impl);
if (!gateResult.passed) {
  // Harness can immediately attempt repair
  await repairIssues(gateResult.failures);
}
// As symbolic task: verify evidence objectively
const evidence = await ctx.task(gateValidatorTask, { impl });
Human Approval Gates
For high-impact steps, include explicit checkpoints:
// Plan approval before execution
await ctx.breakpoint({
  question: 'Review the plan. Approve to proceed with implementation?',
  title: 'Plan Approval',
  context: { /* ... */ }
});
// Diff approval before merge
await ctx.breakpoint({
  question: `Review the diff (${diff.linesChanged} lines). Approve to merge?`,
  title: 'Merge Approval'
});
// Deployment approval
await ctx.breakpoint({
  question: 'Quality: 92/100. Deploy to production?',
  title: 'Production Deployment'
});
The Journal: Making Execution Testable
A journaled control plane turns agentic behavior into something you can:
| Capability | Value |
|---|---|
| Replay | Debug by re-running |
| Inspect | See exactly what happened |
| Diff | Compare across forks |
| Audit | Compliance evidence |
| Analyze | Failure pattern detection |
What's Journaled
| Event Type | Example |
|---|---|
| Inputs/signals | Initial requirements |
| Stage transitions | "planning" → "implementation" |
| Requested actions | writeFile('/src/auth.ts', ...) |
| Results | Action succeeded, 42 lines written |
| Artifacts | plan.md, implementation.patch |
| Evidence | Test results, gate outcomes |
| Gate outcomes | Security: PASS, Tests: PASS |
| Approvals | User approved at breakpoint |
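For intuition, a journal event record might look roughly like this (field names are illustrative, not the actual journal schema):

// Sketch: one journaled event (shape is an assumption).
const event = {
  runId: 'run-2026-01-26-001',
  seq: 147,
  type: 'stage.transition',
  at: '2026-01-26T14:32:05Z',
  data: { from: 'planning', to: 'implementation' }
};
// Replay = reduce over the ordered event list to reconstruct state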
Prompt Quality is Determinism Engineering
In a two-loop system, prompts are configuration for the harness.
Why Prompt Quality Matters
Better prompts reduce:
- Output variance
- Tool misuse
- Hidden assumptions
- Inconsistent formatting
- Unpredictable branching
Better prompts improve:
- Repeatability
- Debuggability
- Fork comparisons
- Safe automation
The Real Goal: Structural Consistency
You don't need identical wording. You need consistent:
- Decision formats
- Priorities
- Stop/ask conditions
- Evidence standards
Prompt Versioning
Treat harness prompts like engineering surfaces:
const promptVersion = '2.1.0';
const implementerPrompt = {
  version: promptVersion,
  role: 'senior software engineer',
  task: 'Implement feature according to specification',
  constraints: [
    'Follow existing code patterns',
    'Write tests for all public functions',
    'Document complex logic',
    'Ask for clarification if requirements are ambiguous'
  ],
  outputFormat: {
    type: 'object',
    required: ['filesModified', 'summary', 'confidence']
  }
};
Common Failure Modes and Fixes
1. Everything is Agentic
Symptom: Unpredictable behavior, hard to debug, inconsistent safety.
Fix: Move gates, budgets, and invariants into symbolic orchestration.
2. Everything is Symbolic
Symptom: Brittle workflows, poor adaptation, high maintenance.
Fix: Delegate fuzzy decisions and exploration to the harness.
3. Hidden State
Symptom: The harness "remembers" things the system never logged.
Fix: Journal what matters; the system's truth must be reconstructible.
4. Wide Tool Surface
Symptom: Tool confusion, increased risk, unpredictable results.
Fix: Keep tools small, stable, and well-described.
5. No Explicit Evidence Requirements
Symptom: "Done" claims without proof.
Fix: Define completion as artifact + evidence, enforced by gates.
The Doctrine
If you define only a few principles, make them these:
- The orchestrator owns run progression, journaling, and phase boundaries
- Symbolic logic owns constraints, permissions, budgets, and gates
- The harness owns adaptive work inside constraints
- Guardrails are enforced by symbolic checks, not informal intentions
- Quality is evidence-driven, not assertion-driven
- Prompts are versioned control surfaces for harness behavior
- The journal is the source of truth for replay, audit, and forking
Getting Started
If you're building from scratch:
- Define phases (a small symbolic process)
- Define effects/tools available in each phase
- Add budgets and permissions
- Decide quality gates per phase
- Add a harness that can do real work
- Journal everything needed for replay and audit
- Add fork + time travel as first-class operations
If you do only one thing: make completion require evidence.
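As a minimal sketch of that one rule (names illustrative):

// Sketch: completion requires artifact + evidence.
function assertComplete(result) {
  if (!result.artifact) throw new Error('No artifact: nothing was produced');
  if (!result.evidence) throw new Error('No evidence: completion is unproven');
  if (!result.evidence.gates?.passed) {
    throw new Error('Evidence does not support completion');
  }
}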
Process Library Examples
Spec-Driven Development
methodologies/spec-driven-development.js
Implements the full two-loops pattern:
- Symbolic: Constitution validation, plan-constitution alignment, consistency analysis
- Agentic: Specification writing, planning, implementation
- Gates: Every phase has approval breakpoints
V-Model
methodologies/v-model.js
Heavy on symbolic verification:
- Four test levels designed before implementation
- Traceability matrix ensures complete coverage
- Safety levels adjust rigor
GSD Iterative Convergence
gsd/iterative-convergence.js
Feedback-driven quality loop:
- Implement → Score → Feedback → Repeat
- Breakpoints at quality thresholds
- Plateau detection for early exit
Related Documentation
- Quality Convergence - Five quality gate types and 90-score pattern
- Best Practices - Workflow design and guardrail patterns
- Process Definitions - Creating your own processes
- Journal System - Event sourcing and replay
- Breakpoints - Human-in-the-loop approval
Summary
The Two-Loops architecture enables bounded, testable autonomy:
- Orchestration Loop provides control, safety, and traceability
- Agentic Loop provides capability, adaptation, and problem-solving
- Quality Gates turn "seems done" into "is done" with evidence
- Guardrails enforce rules at capability, budget, policy, and behavioral levels
- Journaling makes everything replayable and auditable
When done well, you get autonomy that is bounded, testable, and steadily improvable.
SDK API Quick Reference
The complete list of SDK intrinsics (functions available on ctx):
| Function | Purpose | Example |
|---|---|---|
| ctx.task(taskDef, args) | Execute a task | await ctx.task(buildTask, { target: 'dist' }) |
| ctx.breakpoint(opts) | Pause for human approval | await ctx.breakpoint({ question: 'Deploy?', title: 'Approval' }) |
| ctx.parallel.all([...]) | Run tasks in parallel | await ctx.parallel.all([() => ctx.task(a), () => ctx.task(b)]) |
| ctx.parallel.map(arr, fn) | Map over array in parallel | await ctx.parallel.map(files, f => ctx.task(lint, { file: f })) |
| ctx.sleepUntil(iso8601) | Pause until a specific time | await ctx.sleepUntil('2026-01-27T10:00:00Z') |
| ctx.log(msg, data?) | Log message to journal | ctx.log('Quality score', { score: 85 }) |
| ctx.now() | Get current time (deterministic) | const ts = ctx.now().getTime() |
| ctx.runId | Current run identifier | const id = ctx.runId |
Important: There is NO ctx.retry(). Use loops for retry logic:
// Correct: Loop-based retry
let passed = false;
let feedback = null;
for (let i = 0; i < maxIterations && !passed; i++) {
  const result = await ctx.task(implementTask, { feedback });
  passed = result.testsPass;
  feedback = result.errors;
}
What To Do Next
Based on your role, here's your next step:
| If you are... | Do this next |
|---|---|
| Beginner | Read Quality Convergence for the core iteration pattern |
| Building processes | Study Best Practices for workflow design |
| Debugging a run | Check Journal System to understand event sourcing |
| Adding approvals | See Breakpoints for human-in-the-loop patterns |
| Evaluating for team | Review the Four Guardrail Layers section above |