Frequently Asked Questions (FAQ)
Version: 1.1 Last Updated: 2026-01-26 Last refreshed: 2026-04-25 Category: Reference
Top 5 Questions (Start Here)
New to Babysitter? These are the most common questions from first-time users:
| Question | Quick Answer |
|---|---|
| What does Babysitter actually do? | It automates the "try → test → fix → repeat" loop until your code meets quality targets |
| How do I start? | Just type /babysitter:call build a login page in Claude Code |
| Do I need to write code? | No - you use natural language. Babysitter handles the rest |
| What if something goes wrong? | Everything is saved automatically. You can always resume or debug |
| Is it free? | Babysitter is included with Claude Code - no additional cost |
Still have questions? Browse the full FAQ below, or check the Troubleshooting guide.
Table of Contents
- Getting Started
- Installation and Setup
- Using Babysitter
- Quality Convergence
- Sessions and Resumption
- Breakpoints and Approval
- Process Definitions
- Performance and Optimization
- Security and Compliance
- Troubleshooting
Getting Started
What is Babysitter?
Babysitter is an event-sourced orchestration framework for Claude Code that enables deterministic, resumable, and human-in-the-loop workflow management. It allows you to build complex, multi-step development processes with built-in quality gates, human approval checkpoints, and automatic iteration until quality targets are met.
Key features:
- Structured multi-step workflows
- Human approval checkpoints (breakpoints)
- Iterative quality convergence
- Complete audit trails
- Session persistence and resumability
See: README
What is the difference between Babysitter and regular Claude Code?
| Feature | Regular Claude Code | With Babysitter |
|---|---|---|
| Session persistence | Lost on restart | Event-sourced, resumable |
| Quality iteration | Manual prompting | Automated convergence |
| Approval gates | Chat-based | Structured breakpoints |
| Parallel execution | Sequential only | Built-in parallelism |
| Audit trail | Chat history | Structured journal |
Babysitter adds orchestration capabilities, enabling deterministic workflows with full traceability.
Do I need programming knowledge to use Babysitter?
No. You interact with Babysitter using natural language. Simply ask Claude to use the babysitter skill:
Use the babysitter skill to implement user authentication with TDD
Or use the slash command:
/babysitter:call implement user authentication with TDD
However, creating custom process definitions does require JavaScript/TypeScript knowledge.
See: Getting Started
Can I use Babysitter with other AI tools?
Babysitter is specifically designed for Claude Code. The orchestration framework integrates with Claude Code's plugin system and skill infrastructure. While the underlying concepts could be adapted, Babysitter is not currently directly compatible with other AI coding assistants.
What types of tasks is Babysitter best suited for?
Babysitter excels at:
- Feature development with TDD and quality gates
- Code refactoring with iterative improvement
- Multi-phase workflows requiring human approval
- Complex tasks spanning multiple files or components
- Team workflows requiring audit trails and approvals
For simple, one-off tasks, using Claude Code directly may be faster.
Installation and Setup
What are the prerequisites for Babysitter?
Required:
- Node.js 20.0.0+ (recommend 22.x LTS)
- Claude Code (latest version)
- npm 8.0.0+
Optional:
- Git (for version control)
- jq (for CLI output parsing)
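A quick way to confirm the required tooling is installed:
node --version
npm --version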
See: Installation Guide
Why are there multiple Babysitter npm packages?
Babysitter exposes a few public packages with different roles:
- @a5c-ai/babysitter - Recommended end-user install for the babysitter CLI
- @a5c-ai/babysitter-sdk - SDK/library package and the implementation behind the core CLI
- @a5c-ai/babysitter-agent - Optional runtime CLI for call, resume, plan, doctor, start-server, and tui
Most users install:
npm install -g @a5c-ai/babysitter@latest
Add the optional runtime CLI only if you need those agent runtime commands:
npm install -g @a5c-ai/babysitter-agent@latest
How much disk space does Babysitter use?
The .a5c/runs/ directory stores all run data:
- Light usage: 1-5 MB per run
- Heavy usage: 50-100 MB per run (with large artifacts)
Monitor disk usage:
du -sh .a5c/
du -h .a5c/runs/* | sort -h
You can safely delete old runs to reclaim space:
rm -rf .a5c/runs/<old-run-id>
How do I update Babysitter?
Update SDK packages:
npm update -g @a5c-ai/babysitter @a5c-ai/babysitter-agent
Update Claude Code plugin:
claude plugin marketplace update a5c.ai
claude plugin update babysitter@a5c.ai
Tip: Update regularly (daily or weekly) for the latest features and fixes.
Why doesn't the babysit skill appear in /skills?
Common causes and solutions:
- Plugin not installed:
  claude plugin marketplace add a5c-ai/babysitter-claude
  claude plugin install --scope user babysitter@a5c.ai
- Plugin not enabled:
  claude plugin enable --scope user babysitter@a5c.ai
- Claude Code not restarted:
  - Close all Claude Code windows
  - Reopen Claude Code
- Verify installation:
  claude plugin list | grep babysitter
See: Installation Troubleshooting
Using Babysitter
How do I start a new babysitter run?
Via natural language:
Use the babysitter skill to implement user authentication
Via slash command:
/babysitter:call implement user authentication with TDD
With options:
/babysitter:call implement user authentication --max-iterations 10
See: Quickstart
How do I pause a babysitter run?
Simply close Claude Code. The run is automatically saved to the event-sourced journal and can be resumed later.
Babysitter is designed to be resumable at any point.
How do I resume a paused run?
Via natural language:
Resume the babysitter run for the authentication feature
Via slash command with run ID:
/babysitter:call resume --run-id 01KFFTSF8TK8C9GT3YM9QYQ6WG
Find your run ID:
ls -lt .a5c/runs/ | head -10
See: Run Resumption
How do I find my run ID?
The run ID is displayed when you start a workflow:
Run ID: 01KFFTSF8TK8C9GT3YM9QYQ6WG
Run Directory: .a5c/runs/01KFFTSF8TK8C9GT3YM9QYQ6WG/
Ask Claude to find recent runs:
What babysitter runs have I done recently?
Check run status:
What's the status of my babysitter run?
Can I run multiple babysitter processes simultaneously?
Not recommended. Running multiple babysitter instances in the same directory may cause journal conflicts.
For parallel work:
- Use separate directories
- Use separate runs for independent features
- Wait for one run to complete before starting another in the same directory
What happens if a task fails?
When a task fails, Babysitter:
- Records the failure in the journal
- May retry based on configuration
- Reports the error for debugging
To investigate:
babysitter run:events <runId> --filter-type RUN_FAILED --json
To resume after fixing:
/babysitter:call resume
How do I debug a failed run?
- Check the journal:
  cat .a5c/runs/<runId>/journal/journal.jsonl | jq .
- View recent events:
  babysitter run:events <runId> --limit 10 --reverse --json
- Find the error:
  babysitter run:events <runId> --filter-type RUN_FAILED --json
- Ask Claude to analyze:
  Analyze the babysitter run error for <runId> and diagnose
Quality Convergence
What is a quality score?
Quality scores are assessments of code quality generated by Babysitter's agent tasks. Scores are based on:
- Test coverage
- Test quality
- Code quality metrics (lint, types)
- Security analysis
- Requirements alignment
Scores range from 0-100.
See: Quality Convergence
How do I set a quality target?
Include it in your prompt:
Use babysitter with TDD and 85% quality target
Or specify in process inputs:
const { targetQuality = 85, maxIterations = 5 } = inputs;
What if quality score doesn't improve?
Common causes:
- Target is unrealistic
- Fundamental issues blocking improvement
- Scoring criteria too strict
Solutions:
- Review iteration feedback:
  What recommendations came from my quality scoring?
- Lower the target:
  /babysitter:call continue with 75% quality target
- Increase iterations:
  /babysitter:call continue with max 10 iterations
- Review blocking issues: check lint errors, test failures, etc.
Can I customize quality scoring?
Yes. Create custom agent tasks with your scoring criteria:
export const customScoringTask = defineTask('custom-scorer', (args, taskCtx) => ({
kind: 'agent',
title: 'Custom quality scoring',
agent: {
name: 'quality-assessor',
prompt: {
role: 'quality engineer',
task: 'Score based on our team standards',
instructions: [
'Your custom criteria here',
'...'
]
}
}
}));
How many iterations are typical?
| Workflow Type | Typical Iterations | Maximum Recommended |
|---|---|---|
| Simple improvement | 2-3 | 5 |
| Feature development | 3-5 | 10 |
| Complex refactoring | 5-8 | 15 |
Always set iteration limits to prevent runaway loops.
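For illustration, a minimal sketch of such a capped loop inside a process definition might look like the following (improveTask, scoringTask, and the { score } result shape are assumptions here, not the SDK's actual contract):
// improveTask and scoringTask would be defined with defineTask (hypothetical names).
export async function process(inputs, ctx) {
  // Cap iterations and stop early once the target quality is reached.
  const { targetQuality = 85, maxIterations = 5 } = inputs;
  let score = 0;
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    await ctx.task(improveTask, { iteration });                 // attempt an improvement
    const result = await ctx.task(scoringTask, { iteration });  // re-score the code
    score = result.score;                                       // assumed result shape
    if (score >= targetQuality) break;                          // converged: stop early
  }
  return { success: score >= targetQuality, score };
}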
Sessions and Resumption
How does session resumption work?
Babysitter uses event sourcing:
- Every action is recorded in the journal
- On resume, events are replayed to rebuild state
- Completed tasks return cached results
- Execution continues from the last pending point
This makes sessions fully resumable regardless of why they ended.
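As a rough conceptual sketch of that replay idea (not the SDK's internals; the journal path comes from this FAQ, while the event type and field names below are assumptions):
import { readFileSync } from 'node:fs';

function rebuildCompletedTasks(runId) {
  // Journal location as described in this FAQ: .a5c/runs/<runId>/journal/journal.jsonl
  const lines = readFileSync(`.a5c/runs/${runId}/journal/journal.jsonl`, 'utf8')
    .split('\n')
    .filter(Boolean);
  const completed = new Map();
  for (const line of lines) {
    const event = JSON.parse(line);
    // 'TASK_COMPLETED', taskId, and result are illustrative assumptions.
    if (event.type === 'TASK_COMPLETED') {
      completed.set(event.taskId, event.result);
    }
  }
  // On resume, tasks found here can return cached results instead of re-running.
  return completed;
}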
See: Run Resumption
Is progress lost if Claude Code crashes?
No. All progress is preserved in the journal. Resume with:
/babysitter:call resume --run-id <runId>
Can different team members continue a run?
Yes. Runs are stored in the file system and can be continued by anyone with access:
# Developer A starts
/babysitter:call implement feature X
# Developer B continues
/babysitter:call resume the feature X workflow
Ensure you share the .a5c/ directory (e.g., via Git or shared storage).
What happens to pending breakpoints on resume?
Pending breakpoints are preserved. On resume:
- Babysitter detects the pending breakpoint
- Checks if it has been approved
- If approved, continues; if not, waits
Approve breakpoints before resuming, or resume and check the breakpoints UI.
Breakpoints and Approval
Can I route breakpoints to specific team members?
Yes. Use the expert field to route a breakpoint to specific reviewers:
await ctx.breakpoint({
question: 'Approve the security changes?',
expert: ['security-lead', 'tech-lead'], // Route to specific experts
strategy: 'quorum', // Require majority approval
tags: ['security'],
});
Supported strategies: single (default), first-response-wins, collect-all, quorum. The result includes respondedBy (who responded) and allResponses (for multi-reviewer strategies).
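A small sketch of consuming that result for a multi-reviewer strategy (respondedBy and allResponses are described above; the per-response fields used below are assumptions for illustration):
const review = await ctx.breakpoint({
  question: 'Approve the security changes?',
  expert: ['security-lead', 'tech-lead'],
  strategy: 'quorum',
});
console.log(`Responded by: ${review.respondedBy}`);
for (const response of review.allResponses ?? []) {
  // approved/feedback per response are assumed field names.
  if (!response.approved) console.log(`Feedback: ${response.feedback}`);
}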
See: Breakpoints
What happens when a breakpoint is rejected?
Breakpoints should never fail a process. The recommended pattern is to loop and retry with the reviewer's feedback:
let approved = false;
let previousFeedback;
let attempt = 0;
while (!approved) {
attempt++;
const review = await ctx.breakpoint({
question: 'Approve the plan?',
previousFeedback,
attempt,
});
if (review.approved) {
approved = true;
} else {
previousFeedback = review.feedback;
// Refine work based on feedback before retrying
}
}
See: Breakpoints - Robust Rejection Pattern
Process Definitions
What is a process definition?
A process definition is a JavaScript function that orchestrates workflow logic. It defines:
- What tasks to run
- In what order
- With what conditions
- Where to place breakpoints
export async function process(inputs, ctx) {
const plan = await ctx.task(planTask, { feature: inputs.feature });
const review = await ctx.breakpoint({ question: 'Approve plan?' });
if (!review.approved) return { success: false, feedback: review.feedback };
const result = await ctx.task(implementTask, { plan });
return result;
}
Note: ctx.breakpoint() returns a BreakpointResult with { approved, response?, feedback?, option? }. Existing code that ignores the return value continues to work.
See: Process Definitions
Can I edit a process definition for a running workflow?
Not recommended. Process definitions are associated with runs at creation time. Modifying them during execution may cause unexpected behavior.
For changes, start a new run with the updated process.
What task types are available?
| Type | Use Case | Example |
|---|---|---|
| Agent | LLM-powered tasks | Planning, scoring |
| Skill | Claude Code skills | Code analysis |
| Node | JavaScript scripts | Build, test |
| Shell | Commands | git, npm |
| Breakpoint | Human approval | Review gates |
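As a hypothetical sketch of a non-agent task (mirroring the defineTask shape used in the agent example earlier; the command field and exact shell-task contract are assumptions, so check the Process Definitions docs for the real shape):
export const testTask = defineTask('run-tests', (args, taskCtx) => ({
  kind: 'shell',          // see the task-type table above
  title: 'Run the test suite',
  command: 'npm test',    // assumed field name for illustration
}));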
Can I run tasks in parallel?
Yes, using ctx.parallel.all():
const [coverage, lint, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, {}),
() => ctx.task(securityTask, {})
]);
This significantly speeds up workflows with independent tasks.
See: Parallel Execution
Performance and Optimization
How long do babysitter runs typically take?
| Workflow Type | Expected Duration |
|---|---|
| Simple build & test | 30s - 2m |
| TDD feature | 3m - 10m |
| Complex refactoring | 10m - 30m |
| Full application | 30m - 2h |
Duration depends on iteration count, task complexity, and API latency.
How can I speed up my workflows?
- Use parallel execution:
  await ctx.parallel.all([task1, task2, task3]);
- Set iteration limits:
  Use babysitter with max 3 iterations
- Reduce agent task scope:
  await ctx.task(analyzeTask, { files: ['specific/file.js'] });
- Lower the quality target for faster convergence
What affects execution time the most?
- LLM API latency - 2-5 seconds per agent call
- Iteration count - More iterations = longer runtime
- Task complexity - Large codebases take longer
- Parallel vs sequential - Parallel can be 4x faster
How do I monitor a running workflow?
Ask Claude for updates:
What's the current progress of my babysitter run?
Show me the recent events in my workflow
How many iterations have completed?
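If you prefer the CLI, the run:events command shown earlier also works for monitoring:
babysitter run:events <runId> --limit 10 --reverse --json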
Security and Compliance
Is my code sent to external services?
Agent tasks use Claude's API, which means:
- Code context is sent to the API for analysis
- No data is stored by the API beyond the session
- Review Anthropic's privacy policy for details
For sensitive code, consider:
- Using shell/node tasks instead of agent tasks
- Running analysis locally
- Reviewing what context is sent to agents
Are credentials safe in Babysitter workflows?
Best practices:
- Use environment variables:
  const apiKey = process.env.API_KEY;
- Never hardcode credentials
- Add .a5c/ to .gitignore
- Review the journal before sharing:
  grep -i "password\|secret\|key" .a5c/runs/*/journal/*.json
Does Babysitter provide audit trails?
Yes. The journal system records:
- Every task execution
- Every breakpoint approval/rejection
- Every state change
- Complete timestamps
Export audit trail:
jq '.' .a5c/runs/<runId>/journal/*.json > audit-report.json
See: Journal System
Should I commit the .a5c directory to Git?
Generally, no. Add to .gitignore:
.a5c/
Reasons:
- May contain sensitive data
- Can grow large
- State cache is derived, not source
However, you may choose to commit journals for audit purposes if they don't contain sensitive information.
Troubleshooting FAQ
Where can I find error logs?
Ask Claude to show you the relevant information:
- Journal events:
  Show me the events in my babysitter run
- Task outputs:
  What was the result of the last task in my workflow?
- Run state:
  What's the current state of my babysitter run?
Why does my run say "waiting" but nothing happens?
Likely cause: A breakpoint is pending approval.
Solution:
- Check breakpoints UI: http://localhost:3184
- Review and approve/reject the breakpoint
- Resume if needed
Verify:
Are there any pending breakpoints in my babysitter run?
Why is the breakpoints UI not accessible?
The breakpoints UI is integrated into the SDK and starts automatically when a workflow reaches a breakpoint.
Check if accessible:
curl http://localhost:3184/health
If not accessible:
- Ensure a workflow with breakpoints is running; the UI only starts once a breakpoint is reached
- Check if another process is using port 3184:
lsof -i :3184
If port is in use: Kill the conflicting process or configure a different port in your SDK settings.
How do I report a bug?
- Gather information:
  - OS and version
  - Node.js version
  - Claude Code version
  - Babysitter CLI version
  - Full error message
  - Relevant journal excerpts
- Search existing issues: GitHub Issues
- Create a new issue: include all gathered information and steps to reproduce.
Need More Help?
- Troubleshooting Guide - Detailed problem-solution reference
- Error Catalog - Common error messages explained
- GitHub Issues - Report bugs
- GitHub Discussions - Ask questions
Document Status: Complete Last Updated: 2026-01-25