Using the Babysitter GitHub Action

This guide explains how to use the official Babysitter GitHub Action (a5c-ai/babysitter@main) for automated, AI-orchestrated workflows in your CI/CD pipeline.

Overview

The Babysitter action is harness-agnostic: it supports multiple AI harnesses (pi, claude-code, codex, gemini-cli, and more) through a single action interface. On each run it builds the local babysitter-agent runtime from the action repository, resolves credentials for the selected provider, and runs babysitter-agent yolo, which executes deterministic, event-sourced orchestration with quality gates, iterative refinement, and multi-step process management.

Unlike harness-specific actions (anthropics/claude-code-action, openai/codex-action, google-github-actions/run-gemini-cli), this action lets you switch harnesses with a single input change while keeping the same workflow structure.

Quick Start

Basic Setup

name: Babysitter

on:
  issue_comment:
    types: [created]
  issues:
    types: [opened, assigned]

jobs:
  babysitter:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@babysitter')) ||
      (github.event_name == 'issues' && contains(github.event.issue.body, '@babysitter'))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Babysitter
        uses: a5c-ai/babysitter@main
        with:
          prompt: |
            ${{ github.event.comment.body || github.event.issue.body }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

With a Specific Harness and Model

- name: Run Babysitter
  uses: a5c-ai/babysitter@main
  with:
    prompt: 'Review this PR for code quality and security issues'
    harness: codex
    model: openai:gpt-4.1
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}

Using the Internal Harness (Default)

The internal harness uses a programmatic pi-coding-agent, so no external CLI is needed. It reads credentials from environment variables and resolves models via the provider:modelId pattern:

- name: Run Babysitter
  uses: a5c-ai/babysitter@main
  with:
    prompt: 'Implement the feature described in this issue'
    model: anthropic:claude-sonnet-4-20250514
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Authentication

Provider Credentials

Pass the API key matching your model's provider:

| Provider | Input | Environment Variable |
| --- | --- | --- |
| Anthropic | anthropic-api-key | ANTHROPIC_API_KEY |
| OpenAI | openai-api-key | OPENAI_API_KEY |
| Google Gemini | gemini-api-key | GEMINI_API_KEY |
| Azure OpenAI | azure-openai-api-key | AZURE_OPENAI_API_KEY |

Azure OpenAI Configuration

Azure requires additional configuration beyond the API key:

- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Your task'
    model: azure-openai-responses:gpt-4.1
    azure-openai-api-key: ${{ secrets.AZURE_OPENAI_API_KEY }}
    azure-openai-project-name: ${{ vars.AZURE_OPENAI_PROJECT_NAME }}
    azure-openai-deployment: ${{ vars.AZURE_OPENAI_DEPLOYMENT }}
    azure-openai-base-url: ${{ vars.AZURE_OPENAI_BASE_URL }}

Model Selection

Models use the provider:modelId format. The provider determines which API key is required:

# Anthropic
model: anthropic:claude-sonnet-4-20250514
model: anthropic:claude-opus-4-5

# OpenAI
model: openai:gpt-4.1
model: openai:o4-mini

# Azure OpenAI
model: azure-openai-responses:gpt-4.1

# Bare model ID (auto-detected from available credentials)
model: claude-sonnet-4-20250514

Configuration Options

Action Inputs

| Input | Required | Default | Description |
| --- | --- | --- | --- |
| prompt | No | | Task prompt. GitHub context vars available via shell expansion. |
| harness | No | internal | AI harness: internal, pi, claude-code, codex, gemini-cli |
| model | No | | Model in provider:modelId format |
| process-path | No | | Path to process definition (skips Phase 1) |
| workspace | No | $GITHUB_WORKSPACE | Working directory |
| max-iterations | No | 256 | Max orchestration iterations |
| runs-dir | No | ~/.a5c/runs | Run state directory |
| timeout-minutes | No | 30 | Step timeout |
| verbose | No | false | Enable debug output |
| anthropic-api-key | No | | Anthropic API key |
| openai-api-key | No | | OpenAI API key |
| gemini-api-key | No | | Google Gemini API key |
| azure-openai-api-key | No | | Azure OpenAI API key |
| azure-openai-project-name | No | | Azure resource name |
| azure-openai-deployment | No | | Azure deployment name |
| azure-openai-base-url | No | | Azure endpoint URL |
| github-token | No | ${{ github.token }} | GitHub API token |
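
To show how several of these inputs combine, the step below is a minimal sketch that points the action at a process definition and caps the iteration count. The path .a5c/processes/pr-review.js is a hypothetical placeholder, since the expected file type and location are not stated in this guide. Every input name is taken from the inputs table.

```yaml
- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Review this PR for code quality and security issues'
    # Hypothetical path: replace with a process definition committed to your repository
    process-path: .a5c/processes/pr-review.js
    # Input values are passed as strings, as in the other examples in this guide
    max-iterations: '50'
    verbose: 'true'
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```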

Action Outputs

| Output | Description |
| --- | --- |
| run-id | Babysitter run ID |
| run-dir | Path to the run directory |
| status | Final status: completed, failed, or unknown |
| iterations | Number of orchestration iterations |

Workflow Examples

Issue Comment Handler

name: Babysitter Issue Handler

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]

jobs:
  babysitter:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@babysitter')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@babysitter')) ||
      (github.event_name == 'issues' && contains(github.event.issue.body, '@babysitter'))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: a5c-ai/babysitter@main
        with:
          prompt: ${{ github.event.comment.body || github.event.issue.body }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

PR Review

name: Babysitter PR Review

on:
  pull_request:
    types: [opened, synchronize, ready_for_review, reopened]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Review PR #${{ github.event.pull_request.number }} in ${{ github.repository }}.
            Analyze for code quality, security vulnerabilities, performance, and test coverage.
          harness: internal
          model: anthropic:claude-sonnet-4-20250514
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Feature Development (TDD)

name: Babysitter TDD Feature

on:
  issues:
    types: [labeled]

jobs:
  develop:
    if: github.event.label.name == 'feature-request'
    runs-on: ubuntu-latest
    timeout-minutes: 60
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Implement the feature described in issue #${{ github.event.issue.number }}
            in ${{ github.repository }} using TDD methodology.
            Write failing tests first, implement, refactor, iterate until quality threshold is met.
            Create a PR when complete.
          max-iterations: '50'
          timeout-minutes: '60'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

GSD Quick Tasks

name: Babysitter GSD

on:
  issue_comment:
    types: [created]

jobs:
  gsd:
    if: contains(github.event.comment.body, '/gsd')
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: write
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Use GSD methodology for rapid implementation:
            ${{ github.event.comment.body }}
          max-iterations: '20'
          timeout-minutes: '15'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Spec-Driven Development

name: Babysitter Spec-Kit

on:
  workflow_dispatch:
    inputs:
      spec_file:
        description: 'Path to specification file'
        required: true
        type: string

jobs:
  implement:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    permissions:
      contents: write
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Implement the specification at ${{ inputs.spec_file }} using Spec-Kit methodology.
            Parse the spec, generate an implementation plan, implement with continuous validation,
            and generate a compliance report.
          timeout-minutes: '60'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Security Scanning

name: Babysitter Security

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * 1'

jobs:
  security:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: write
      pull-requests: write
      security-events: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Run a comprehensive security scan of ${{ github.repository }}.
            Perform SAST analysis, dependency vulnerability scanning, and secret detection.
            Report findings and create a PR with fixes where possible.
          timeout-minutes: '30'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Architecture Documentation

name: Babysitter Architecture Docs

on:
  workflow_dispatch:
  push:
    paths: ['src/**', 'packages/**']

jobs:
  docs:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: write
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: a5c-ai/babysitter@main
        with:
          prompt: |
            Generate architecture documentation for ${{ github.repository }}.
            Create C4 model diagrams, system overview, data flow docs, and API docs.
            Update the docs/ directory and create a PR.
          timeout-minutes: '30'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Multi-Harness Example (Codex)

- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Implement and test the feature'
    harness: codex
    model: openai:o4-mini
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}

Multi-Harness Example (Gemini CLI)

- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Analyze the codebase and generate documentation'
    harness: gemini-cli
    gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
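
Because the harness examples differ only in their harness, model, and key inputs, the choice can be deferred to whoever triggers the workflow. The following is a minimal sketch, not a recommended setup: the workflow name, job name, and the dispatch inputs harness and model are illustrative, and it assumes the action ignores API keys for providers the selected harness does not use, which this guide does not confirm. The permissions mirror the PR Review example.

```yaml
name: Babysitter (choose harness)

on:
  workflow_dispatch:
    inputs:
      harness:
        description: 'AI harness to run'
        type: choice
        default: internal
        options: [internal, codex, gemini-cli]
      model:
        description: 'Model in provider:modelId format (optional)'
        type: string
        required: false

jobs:
  run:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - uses: a5c-ai/babysitter@main
        with:
          prompt: 'Analyze the codebase and generate documentation'
          # The only values that change between harnesses
          harness: ${{ inputs.harness }}
          model: ${{ inputs.model }}
          # Assumption: keys for unused providers are ignored by the action
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          gemini-api-key: ${{ secrets.GEMINI_API_KEY }}
```

Leaving the model input blank passes an empty string, which GitHub Actions treats the same as omitting an input that has no default.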

Environment Variables

Configure babysitter behavior via environment variables on the workflow step:

- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Your task'
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
  env:
    BABYSITTER_MAX_ITERATIONS: 50
    BABYSITTER_QUALITY_THRESHOLD: 85
    BABYSITTER_LOG_LEVEL: debug
    BABYSITTER_TIMEOUT: 180000

| Variable | Default | Description |
| --- | --- | --- |
| BABYSITTER_MAX_ITERATIONS | 256 | Max orchestration iterations |
| BABYSITTER_QUALITY_THRESHOLD | 80 | Quality gate threshold (0-100) |
| BABYSITTER_LOG_LEVEL | info | Logging: info, debug, warn, error |
| BABYSITTER_TIMEOUT | 120000 | Operation timeout in ms |
| BABYSITTER_RUNS_DIR | ~/.a5c/runs | Run state directory override |
| BABYSITTER_RUNS_SCOPE | global | Set to repo to keep runs under <repo>/.a5c/runs |
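
As a sketch of the BABYSITTER_RUNS_SCOPE setting, the steps below keep run state inside the checkout and then print the directory the action reports through its run-dir output. The step id babysitter and the RUN_DIR variable are illustrative names. Whether the reported path falls under the checked-out repository depends on how the action resolves <repo>, which is assumed rather than confirmed here.

```yaml
- uses: a5c-ai/babysitter@main
  id: babysitter
  with:
    prompt: 'Your task'
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
  env:
    # Keep run state under <repo>/.a5c/runs instead of ~/.a5c/runs
    BABYSITTER_RUNS_SCOPE: repo

- name: Show where the run state was written
  if: always()
  env:
    RUN_DIR: ${{ steps.babysitter.outputs.run-dir }}
  run: |
    echo "Run directory: ${RUN_DIR:-not reported}"
```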

Artifacts and Outputs

Run Artifacts

The action automatically uploads babysitter run artifacts (journals, task results, blobs) as a GitHub Actions artifact named babysitter-runs. Configure retention:

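# Override in your workflow after the babysitter step
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: babysitter-runs
    path: ~/.a5c/runs/
    retention-days: 14
    # upload-artifact@v4 fails if an artifact with this name already exists in the run;
    # overwrite replaces the copy the action uploaded (assumes it uses the same name)
    overwrite: true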

Using Outputs

- uses: a5c-ai/babysitter@main
  id: babysitter
  with:
    prompt: 'Your task'
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Check results
  run: |
    echo "Run ID: ${{ steps.babysitter.outputs.run-id }}"
    echo "Status: ${{ steps.babysitter.outputs.status }}"
    echo "Iterations: ${{ steps.babysitter.outputs.iterations }}"
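
The status output can also gate the rest of the job. This guide does not say whether the action fails its own step when a run ends as failed or unknown, so the following sketch checks explicitly. It assumes the babysitter step uses id: babysitter as above, and STATUS is an illustrative variable name.

```yaml
- name: Fail the job unless the run completed
  # always() lets this check run even if an earlier step failed
  if: always() && steps.babysitter.outputs.status != 'completed'
  env:
    STATUS: ${{ steps.babysitter.outputs.status }}
  run: |
    echo "::error::Babysitter finished with status '${STATUS:-unknown}'"
    exit 1
```

Passing the output through env rather than interpolating it into the script keeps the shell command fixed regardless of the value.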

Prompt Templates

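The prompt input supports GitHub context variables via shell expansion. The ${{ ... }} forms listed here are GitHub Actions expressions, which the runner substitutes before the action starts. Common variables: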

| Variable | Description |
| --- | --- |
| ${{ github.repository }} | Owner/repo |
| ${{ github.event_name }} | Event type |
| ${{ github.event.issue.number }} | Issue number |
| ${{ github.event.pull_request.number }} | PR number |
| ${{ github.event.comment.body }} | Comment text |
| ${{ github.sha }} | Commit SHA |
| ${{ github.ref }} | Git ref |
| ${{ github.actor }} | Triggering user |

Troubleshooting

Babysitter Agent Runtime Build Failed

The action builds babysitter-agent from the action repository. If it fails, check Node.js availability:

- uses: actions/setup-node@v4
  with:
    node-version: '22'

Authentication Errors

Ensure the correct API key secret is set for your harness/model:

  • internal or claude-code with Anthropic models → ANTHROPIC_API_KEY
  • codex or OpenAI models → OPENAI_API_KEY
  • gemini-cli → GEMINI_API_KEY
  • Azure models → AZURE_OPENAI_API_KEY plus project/deployment/URL
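
A missing or misnamed secret reaches the action as an empty string, so a preflight step placed before the Babysitter step can surface the problem early. This is a minimal sketch for the Anthropic key; swap in the secret that matches your harness and model.

```yaml
- name: Check that the API key secret is available
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    # An unset secret expands to an empty string
    if [ -z "$ANTHROPIC_API_KEY" ]; then
      echo "::error::ANTHROPIC_API_KEY is empty or not available to this workflow"
      exit 1
    fi
```

Secrets are not passed to workflows triggered from forked repositories by default, which is a common reason for this check to fail on pull requests.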

Runs Timing Out

  1. Reduce max-iterations (e.g., 50)
  2. Increase timeout-minutes
  3. Use a simpler methodology (GSD vs TDD)
  4. Break the task into smaller pieces

No Output / Empty Results

Enable verbose mode for debugging:

- uses: a5c-ai/babysitter@main
  with:
    prompt: 'Your task'
    verbose: 'true'
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
