Quality Convergence: Iterative Improvement Until Targets Met

Version: 2.1 | Last Updated: 2026-01-26 | Category: Feature Guide

Quick Summary (Read This First)

Quality Convergence = "Keep trying until it's good enough"

Instead of:

AI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x

Babysitter does:

AI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!

What You'll Learn in This Document

Section	What It Covers	Read If You Want To...
Five Quality Gates	Types of checks (tests, lint, security, etc.)	Understand what gets checked
90-Score Pattern	How to reliably hit high quality	Build production-ready workflows
Process Examples	Real code from the library	See working implementations
Step-by-Step	How to build your own	Create custom quality loops

A Simple Example

Here's what quality convergence looks like in practice:

Iteration 1:
  - AI writes login feature
  - Tests run: 3/10 passing (30%)
  - AI sees: "Missing password validation, no error handling"

Iteration 2:
  - AI fixes based on feedback
  - Tests run: 7/10 passing (70%)
  - AI sees: "Edge case for empty email not handled"

Iteration 3:
  - AI fixes edge cases
  - Tests run: 10/10 passing (100%)
  - Quality target met! ✓

Output: Working login feature with all tests passing

Key insight: The AI doesn't just try once - it learns from each failure and improves.

Understanding Quality Scores

Quality scores are multi-dimensional, not a single number. Instead of a simple pass/fail, each iteration gets feedback across several dimensions, and that feedback shows the next iteration where to improve.

A typical quality score includes:

Dimension	What It Measures	Example
Tests	Pass rate and coverage	92% tests passing, 85% coverage
Code Quality	Lint errors, complexity	0 lint errors, complexity < 10
Security	Vulnerabilities, secrets	0 critical issues
Performance	Response time, bundle size	p95 < 500ms
Type Safety	Type errors, null safety	0 type errors

The Power of Custom Dimensions

You define what quality means for your project. The dimensions above are just examples - you can:

Define your own 5 dimensions that matter most for your domain
Ask Babysitter to suggest dimensions appropriate for your specific task
Weight dimensions differently based on project phase or criticality

For example, a data pipeline might use completely different dimensions:

Dimension	Weight	Threshold
Data Accuracy	30%	> 99.9%
Processing Speed	25%	< 5 min/GB
Schema Validation	20%	100% valid
Idempotency	15%	All operations idempotent
Error Recovery	10%	Auto-recovery < 30s

This flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.
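
As a rough sketch of how the data-pipeline dimensions above could collapse into one number, the snippet below treats each dimension as pass/fail against its threshold and sums the weights of the passing ones. The `pipelineDimensions` array, the metric names, and the `scorePipeline` helper are assumptions made for illustration; they are not part of the Babysitter API.

```javascript
// Hypothetical dimension definitions mirroring the data-pipeline table.
// Metric names (accuracyPct, minutesPerGb, ...) are assumptions for this sketch.
const pipelineDimensions = [
  { name: 'dataAccuracy',     weight: 0.30, passes: m => m.accuracyPct > 99.9 },
  { name: 'processingSpeed',  weight: 0.25, passes: m => m.minutesPerGb < 5 },
  { name: 'schemaValidation', weight: 0.20, passes: m => m.validSchemaPct === 100 },
  { name: 'idempotency',      weight: 0.15, passes: m => m.allOperationsIdempotent === true },
  { name: 'errorRecovery',    weight: 0.10, passes: m => m.recoverySeconds < 30 }
];

// Returns a 0-100 score plus the names of the dimensions that missed their threshold.
function scorePipeline(metrics, dimensions = pipelineDimensions) {
  let earned = 0;
  const failed = [];
  for (const dim of dimensions) {
    if (dim.passes(metrics)) {
      earned += dim.weight;
    } else {
      failed.push(dim.name);
    }
  }
  return { score: Math.round(earned * 100), failed };
}

const pipelineResult = scorePipeline({
  accuracyPct: 99.95,
  minutesPerGb: 6.2, // misses the < 5 min/GB threshold
  validSchemaPct: 100,
  allOperationsIdempotent: true,
  recoverySeconds: 12
});
console.log(pipelineResult.score);  // 75
console.log(pipelineResult.failed); // [ 'processingSpeed' ]
```

A pass/fail reading is the simplest choice; a graded score per dimension (for example, partial credit for 99.5% accuracy) would give the next iteration finer feedback at the cost of more scoring rules.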

For detailed scoring formulas and weight configurations, see Best Practices - Custom Scoring Strategies.

Overview

Quality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.

The Core Principle: Evidence-Driven Completion

From the Two-Loops Control Plane architecture, the fundamental principle is:

If you don't have evidence, you don't have completion.

If you do only one thing, make completion require evidence. This single principle transforms "it seems done" into "it is done."

Every phase must end with:

Artifact: The work product (patch, doc, config, report)
Evidence: Proof that it meets requirements (logs, test output, checks)
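
One way to picture the artifact-plus-evidence rule is a small guard that refuses to mark a phase complete unless both are present. The `completePhase` helper and its return shape are hypothetical and only illustrate the principle; the file paths are made up.

```javascript
// Hypothetical helper: a phase only counts as complete when it carries
// both an artifact and at least one piece of evidence.
function completePhase(name, { artifact, evidence = [] }) {
  if (!artifact) {
    return { phase: name, complete: false, reason: 'missing artifact' };
  }
  if (evidence.length === 0) {
    return { phase: name, complete: false, reason: 'missing evidence' };
  }
  return { phase: name, complete: true, artifact, evidence };
}

const withoutEvidence = completePhase('implement-login', {
  artifact: 'patches/login.diff'
});
const withEvidence = completePhase('implement-login', {
  artifact: 'patches/login.diff',
  evidence: ['logs/test-run.txt', 'logs/lint.txt']
});

console.log(withoutEvidence.complete, withoutEvidence.reason); // false missing evidence
console.log(withEvidence.complete);                            // true
```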

Why Use Quality Convergence

Consistent Quality: Hold outputs to minimum quality thresholds, and report clearly when a run stops short of them
Automated Refinement: Let the system iterate without manual intervention
Measurable Results: Track quality scores across iterations
Predictable Outcomes: Set clear targets and iteration limits
TDD Integration: Combine with test-driven development for robust code
Evidence-Based Completion: Every iteration produces verifiable proof of quality

The Five Quality Gate Categories

Quality gates are not a single check. They form a layered validation system that ensures completeness from multiple perspectives. For robust quality convergence, use 4-5 gate types simultaneously.

Gate Type 1: Functional Tests (Unit/Integration/System/Acceptance)

Verifies the code behaves correctly across all levels.

// From: methodologies/v-model.js (V-Model process)
const testResults = await ctx.task(executeTestsTask, {
  implementation,
  unitTestDesigns,      // Validates module design
  integrationTestDesign, // Validates architecture
  systemTestDesign,      // Validates system design
  acceptanceTestDesign   // Validates requirements
});

const allTestsPassed =
  testResults.unitTests.passed &&
  testResults.integrationTests.passed &&
  testResults.systemTests.passed &&
  testResults.acceptanceTests.passed;

Gate Criteria:

Test Level	What It Validates	Typical Pass Threshold
Unit Tests	Individual functions/classes	90-100% pass rate
Integration Tests	Module interactions	95-100% pass rate
System Tests	End-to-end behavior	90-100% pass rate
Acceptance Tests	User requirements	100% for critical
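
The thresholds in this table can also be checked numerically instead of with a single boolean per level. The sketch below assumes each test level reports `{ passed, total }` counts and uses the lower bound of each range as the minimum; both the result shape and the `evaluateTestGate` helper are assumptions, not the output format of `executeTestsTask`.

```javascript
// Minimum pass rates per test level (lower bounds of the ranges in the table).
// Acceptance tests use 100%, which the table reserves for critical requirements.
const MIN_PASS_RATE = {
  unitTests: 0.90,
  integrationTests: 0.95,
  systemTests: 0.90,
  acceptanceTests: 1.00
};

function evaluateTestGate(results, minimums = MIN_PASS_RATE) {
  const levels = Object.entries(minimums).map(([level, minimum]) => {
    const { passed = 0, total = 0 } = results[level] ?? {};
    // A level with no tests is treated as failing rather than vacuously passing.
    const passRate = total > 0 ? passed / total : 0;
    return { level, passRate, minimum, ok: passRate >= minimum };
  });
  return { passed: levels.every(l => l.ok), levels };
}

const gate = evaluateTestGate({
  unitTests: { passed: 46, total: 50 },        // 92%
  integrationTests: { passed: 18, total: 20 }, // 90%, below the 95% minimum
  systemTests: { passed: 10, total: 10 },
  acceptanceTests: { passed: 5, total: 5 }
});
console.log(gate.passed);                                      // false
console.log(gate.levels.filter(l => !l.ok).map(l => l.level)); // [ 'integrationTests' ]
```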

Gate Type 2: Code Quality (Lint/Format/Complexity)

Ensures code follows style guidelines and maintainability standards.

// Parallel code quality checks
const [lint, format, complexity] = await ctx.parallel.all([
  () => ctx.task(lintTask, { files: impl.filesModified }),
  () => ctx.task(formatCheckTask, { files: impl.filesModified }),
  () => ctx.task(complexityTask, { files: impl.filesModified })
]);

const codeQualityGatePassed =
  lint.errorCount === 0 &&
  format.violations === 0 &&
  complexity.maxCyclomaticComplexity < 10;

Gate Criteria:

Check	Tool Examples	Typical Threshold
Lint Errors	ESLint, Pylint	0 errors
Formatting	Prettier, Black	0 violations
Cyclomatic Complexity	SonarQube, Radon	< 10 per function
Code Duplication	jscpd, CPD	< 3% duplication

Gate Type 3: Type Safety and Static Analysis

Catches bugs at compile/analysis time without running the code.

// From: gsd/iterative-convergence enhanced pattern
const [typeCheck, staticAnalysis] = await ctx.parallel.all([
  () => ctx.task(typeCheckTask, { files: impl.filesModified }),
  () => ctx.task(staticAnalysisTask, { files: impl.filesModified })
]);

const staticGatePassed =
  typeCheck.errors.length === 0 &&
  staticAnalysis.criticalIssues === 0 &&
  staticAnalysis.highIssues === 0;

Gate Criteria:

Check	What It Catches	Typical Threshold
Type Checking	Type mismatches, null errors	0 type errors
Static Analysis	Potential bugs, code smells	0 critical/high issues
Dead Code	Unreachable statements	0 dead code blocks
Null Safety	Potential null dereferences	0 null warnings

Gate Type 4: Security Scanning

Identifies vulnerabilities, secrets, and security anti-patterns.

// Security gate from methodologies/spec-driven-development.js
const security = await ctx.task(securityTask, {
  files: impl.filesModified,
  scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'
});

const securityGatePassed =
  security.criticalVulnerabilities === 0 &&
  security.highVulnerabilities === 0 &&
  security.secretsDetected === 0 &&
  security.dependencyVulnerabilities.critical === 0;

Gate Criteria:

Check	What It Scans	Typical Threshold
SAST (Static)	SQL injection, XSS, etc.	0 critical/high
Secrets Detection	API keys, passwords	0 secrets
Dependency Scan	Known CVEs in packages	0 critical CVEs
OWASP Top 10	Common web vulnerabilities	0 violations
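
The `scanLevel` values shown earlier ('standard', 'high', 'critical') suggest that stricter levels tolerate fewer findings. The sketch below illustrates one possible mapping from safety level to a per-severity budget. The budget numbers, the finding shape, and the `evaluateSecurityGate` helper are all assumptions for illustration; the source does not specify how `securityTask` interprets the level.

```javascript
// Hypothetical severity budgets per safety level: the maximum number of
// findings tolerated at each severity. The numbers are assumptions; critical
// and high findings are never tolerated, matching the gate criteria.
const SEVERITY_BUDGET = {
  standard: { critical: 0, high: 0, medium: 5, low: Infinity },
  high:     { critical: 0, high: 0, medium: 0, low: 10 },
  critical: { critical: 0, high: 0, medium: 0, low: 0 }
};

function evaluateSecurityGate(findings, safetyLevel = 'standard') {
  const budget = SEVERITY_BUDGET[safetyLevel];
  if (!budget) {
    throw new Error(`Unknown safety level: ${safetyLevel}`);
  }
  // Count findings per severity; unrecognized severities are ignored here.
  const counts = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const finding of findings) {
    if (Object.prototype.hasOwnProperty.call(counts, finding.severity)) {
      counts[finding.severity]++;
    }
  }
  const overBudget = Object.keys(counts).filter(sev => counts[sev] > budget[sev]);
  return { passed: overBudget.length === 0, counts, overBudget };
}

const sampleFindings = [
  { id: 'weak-hash', severity: 'medium' },
  { id: 'verbose-error', severity: 'low' }
];
console.log(evaluateSecurityGate(sampleFindings, 'standard').passed); // true
console.log(evaluateSecurityGate(sampleFindings, 'high').overBudget); // [ 'medium' ]
```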

Gate Type 5: Performance and Resource Thresholds

Ensures the implementation meets non-functional requirements.

// Performance gate for production readiness
const performance = await ctx.task(performanceCheckTask, {
  implementation: impl,
  thresholds: {
    loadTimeMs: 1500,      // First Contentful Paint
    bundleSizeKb: 200,     // Gzipped bundle
    apiResponseP95Ms: 500, // 95th percentile
    memoryUsageMb: 512     // Peak memory
  }
});

const performanceGatePassed =
  performance.fcp <= 1500 &&
  performance.bundleSize <= 200 &&
  performance.apiP95 <= 500 &&
  performance.peakMemory <= 512;

Gate Criteria:

Metric	Typical Target	Domain
FCP (First Contentful Paint)	< 1.5s	Frontend
Bundle Size	< 200KB gzipped	Frontend
API p95 Response	< 500ms	Backend
Memory Usage	< 512MB	Server
CPU Utilization	< 70% average	Server
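
The gate code above repeats each limit twice, once in `thresholds` and once in the pass check. A minimal sketch of a table-driven alternative follows, where the limits are declared once and the check iterates over them. The metric keys and the `evaluatePerformanceGate` helper are assumptions, not a fixed Babysitter schema; like the code above, a value equal to the limit passes.

```javascript
// Upper limits taken from the table above, declared in one place.
const PERFORMANCE_LIMITS = {
  fcpMs: 1500,
  bundleSizeKb: 200,
  apiP95Ms: 500,
  peakMemoryMb: 512,
  avgCpuPercent: 70
};

function evaluatePerformanceGate(measured, limits = PERFORMANCE_LIMITS) {
  const violations = [];
  for (const [metric, limit] of Object.entries(limits)) {
    const value = measured[metric];
    // A missing or non-numeric measurement counts as a violation: no evidence, no pass.
    if (!Number.isFinite(value) || value > limit) {
      violations.push({ metric, value, limit });
    }
  }
  return { passed: violations.length === 0, violations };
}

const perfGate = evaluatePerformanceGate({
  fcpMs: 1320,
  bundleSizeKb: 214, // over the 200 KB limit
  apiP95Ms: 430,
  peakMemoryMb: 380,
  avgCpuPercent: 55
});
console.log(perfGate.passed);                        // false
console.log(perfGate.violations.map(v => v.metric)); // [ 'bundleSizeKb' ]
```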

The 90-Score Quality Convergence Pattern

To reliably achieve scores of 90+, implement a multi-gate weighted scoring system with iterative feedback.

Step 1: Define Weighted Scoring Dimensions

// Recommended weights for high-quality convergence
const QUALITY_WEIGHTS = {
  // For production features
  production: {
    tests: 0.25,           // Test coverage and pass rate
    implementation: 0.25,   // Code correctness
    codeQuality: 0.15,      // Lint, complexity, formatting
    security: 0.20,         // Vulnerability scanning
    performance: 0.15       // Non-functional requirements
  },

  // For security-critical systems
  securityCritical: {
    tests: 0.20,
    implementation: 0.20,
    codeQuality: 0.10,
    security: 0.35,         // Higher weight for security
    performance: 0.15
  },

  // For performance-critical systems
  performanceCritical: {
    tests: 0.20,
    implementation: 0.20,
    codeQuality: 0.10,
    security: 0.15,
    performance: 0.35       // Higher weight for performance
  }
};
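
Because the overall score is a weighted sum, a profile whose weights do not add up to 1 silently shifts the 0-100 scale. The following sketch checks a profile before use; `validateWeights` is a hypothetical helper, and the `production` object repeats the values above so the snippet runs on its own.

```javascript
// Checks that a weight profile is usable: every weight is a non-negative
// number and the weights sum to 1 within a small floating-point tolerance.
function validateWeights(weights, tolerance = 1e-9) {
  const entries = Object.entries(weights);
  const invalid = entries
    .filter(([, w]) => !Number.isFinite(w) || w < 0)
    .map(([dimension]) => dimension);
  const sum = entries.reduce((total, [, w]) => total + w, 0);
  return {
    valid: invalid.length === 0 && Math.abs(sum - 1) <= tolerance,
    sum,
    invalid
  };
}

const production = { tests: 0.25, implementation: 0.25, codeQuality: 0.15, security: 0.20, performance: 0.15 };
const lopsided   = { tests: 0.50, implementation: 0.25, codeQuality: 0.15, security: 0.20, performance: 0.15 };

console.log(validateWeights(production).valid); // true
console.log(validateWeights(lopsided).valid);   // false (weights sum to 1.25)
```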

Step 2: Implement the Multi-Gate Convergence Loop

/**
 * Multi-gate quality convergence targeting 90+ scores
 * References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js
 */
export async function process(inputs, ctx) {
  const {
    feature,
    targetQuality = 90,      // Target score
    maxIterations = 10,      // Allow more iterations for high targets
    minImprovement = 2,      // Minimum improvement per iteration
    plateauThreshold = 3,    // Iterations without improvement
    weights = QUALITY_WEIGHTS.production
  } = inputs;

  let iteration = 0;
  let quality = 0;
  const iterationHistory = [];

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);

    // ===== ACT: Implement with feedback from previous iteration =====
    const previousFeedback = iteration > 1
      ? iterationHistory[iteration - 2].recommendations
      : null;

    const impl = await ctx.task(implementTask, {
      feature,
      iteration,
      previousFeedback,
      focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities
    });

    // ===== VALIDATE: Run all five quality gates in parallel =====
    const [tests, codeQuality, staticAnalysis, security, performance] =
      await ctx.parallel.all([
        () => ctx.task(testGateTask, { impl }),
        () => ctx.task(codeQualityGateTask, { impl }),
        () => ctx.task(staticAnalysisGateTask, { impl }),
        () => ctx.task(securityGateTask, { impl }),
        () => ctx.task(performanceGateTask, { impl })
      ]);

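    // ===== SCORE: Calculate weighted quality score =====
    // calculateImplementationScore is a helper that is not shown in this guide.
    // staticAnalysis has no entry in the weight profiles, so it is recorded in
    // `gates` below but does not contribute to the weighted score.
    const scores = {
      tests: tests.score,
      implementation: calculateImplementationScore(impl, tests),
      codeQuality: codeQuality.score,
      security: security.score,
      performance: performance.score
    };

    // A weight with no matching score counts as 0 instead of turning the total into NaN.
    quality = Object.entries(weights).reduce(
      (total, [dimension, weight]) => total + ((scores[dimension] ?? 0) * weight),
      0
    );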

    // ===== ANALYZE: Generate prioritized recommendations =====
    const recommendations = generateRecommendations(scores, weights, targetQuality);

    iterationHistory.push({
      iteration,
      quality,
      scores,
      recommendations,
      gates: { tests, codeQuality, staticAnalysis, security, performance }
    });

    ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +
            `Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +
            `Security: ${scores.security} | Perf: ${scores.performance}`);

    // ===== EARLY EXIT: Detect plateau =====
    if (quality < targetQuality && iteration >= plateauThreshold) {
      const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);
      const improvement = Math.max(...recent) - Math.min(...recent);
      if (improvement < minImprovement) {
        ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);
        break;
      }
    }

    // ===== BREAKPOINT: At key thresholds =====
    const converged = quality >= targetQuality;
    if (!converged && quality >= 80 && iteration > 1) {
      await ctx.breakpoint({
        question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,
        title: `Iteration ${iteration} Checkpoint`,
        context: {
          runId: ctx.runId,
          files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]
        }
      });
    }
  }

  // ===== FINAL VALIDATION =====
  const converged = quality >= targetQuality;

  return {
    success: converged,
    quality,
    targetQuality,
    iterations: iteration,
    iterationHistory,
    finalGates: iterationHistory[iterationHistory.length - 1]?.gates ?? null,
    metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }
  };
}

function generateRecommendations(scores, weights, target) {
  // Calculate gap for each dimension
  const gaps = Object.entries(scores).map(([dim, score]) => ({
    dimension: dim,
    score,
    weight: weights[dim],
    weightedGap: (100 - score) * weights[dim],
    priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority
  }));

  // Sort by priority (highest impact improvements first)
  return gaps
    .sort((a, b) => b.priority - a.priority)
    .map(g => `Improve ${g.dimension}: currently ${g.score}, ` +
              `contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);
}
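
To make the scoring arithmetic concrete, here is a small worked example using the production weights and a set of made-up dimension scores. It reproduces the weighted sum from the loop and the weighted-gap ordering from `generateRecommendations`; the scores themselves are hypothetical.

```javascript
const weights = { tests: 0.25, implementation: 0.25, codeQuality: 0.15, security: 0.20, performance: 0.15 };
// Hypothetical per-dimension scores (0-100) for one iteration.
const scores = { tests: 92, implementation: 88, codeQuality: 95, security: 70, performance: 90 };

// Weighted sum, as in the convergence loop:
// 92*0.25 + 88*0.25 + 95*0.15 + 70*0.20 + 90*0.15 = 23 + 22 + 14.25 + 14 + 13.5
const quality = Object.entries(weights).reduce(
  (total, [dimension, weight]) => total + scores[dimension] * weight,
  0
);

// Weighted gap per dimension: points of overall score still available there.
const gaps = Object.entries(scores)
  .map(([dimension, score]) => ({ dimension, weightedGap: (100 - score) * weights[dimension] }))
  .sort((a, b) => b.weightedGap - a.weightedGap);

console.log(quality.toFixed(2)); // 86.75
console.log(gaps[0].dimension);  // security (30 points short * 0.20 = 6 points available)
console.log(gaps[1].dimension);  // implementation (12 points short * 0.25 = 3 points available)
```

With these numbers the run sits 3.25 points below a target of 90, and fixing security alone could recover up to 6, which is why it is listed first.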

Step 3: Progressive Target Strategy

For challenging targets (90+), use progressive escalation:

// Progressive targets that increase as iterations proceed
const progressiveTargets = [
  { iteration: 1, target: 70 },   // First: basic functionality
  { iteration: 3, target: 80 },   // Mid: solid implementation
  { iteration: 5, target: 85 },   // Late: polish and edge cases
  { iteration: 7, target: 90 }    // Final: production ready
];

function getCurrentTarget(iteration, finalTarget) {
  const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
  const progressiveTarget = applicable[applicable.length - 1]?.target || 70;
  return Math.min(progressiveTarget, finalTarget);
}
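
The short run below shows which target `getCurrentTarget` returns at each iteration. The function and schedule are repeated from above so the snippet is self-contained; the loop that prints the schedule is only for illustration.

```javascript
const progressiveTargets = [
  { iteration: 1, target: 70 },
  { iteration: 3, target: 80 },
  { iteration: 5, target: 85 },
  { iteration: 7, target: 90 }
];

// Picks the latest stage whose start iteration has been reached,
// capped at the caller's final target.
function getCurrentTarget(iteration, finalTarget) {
  const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
  const progressiveTarget = applicable[applicable.length - 1]?.target || 70;
  return Math.min(progressiveTarget, finalTarget);
}

const schedule = [];
for (let iteration = 1; iteration <= 8; iteration++) {
  schedule.push(getCurrentTarget(iteration, 90));
}
console.log(schedule.join(', ')); // 70, 70, 80, 80, 85, 85, 90, 90

// A lower final target caps the schedule.
console.log(getCurrentTarget(8, 85)); // 85
```

How the intermediate target is wired into the loop is not shown in the source. One plausible use, stated here as an assumption, is to pass it to the implementation task as the goal for the current iteration while the loop condition still compares against the final target.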

Real-World Process Examples

Example 1: V-Model with Four Test Levels

The V-Model process (methodologies/v-model.js) implements comprehensive quality gates:

/babysitter:call use the V-Model methodology to build a user authentication system with high safety level

Or with more detail:

/babysitter:call implement user authentication using V-Model with traceability and thorough testing

Quality Gates in V-Model:

Requirements → Acceptance Tests (validates user needs)
System Design → System Tests (validates architecture)
Module Design → Integration Tests (validates interfaces)
Implementation → Unit Tests (validates code)
Traceability Matrix (validates coverage)

Example 2: Spec-Kit with Constitution Validation

The Spec-Kit process (methodologies/spec-driven-development.js) adds governance gates:

/babysitter:call use spec-driven development to build PCI-compliant payment processing

Or:

/babysitter:call build a payment flow using the spec-driven methodology with governance validation

Quality Gates in Spec-Kit:

Constitution Validation (governance principles)
Specification Review (requirements completeness)
Plan-Constitution Alignment (architecture compliance)
Task Consistency Analysis (cross-artifact validation)
Implementation Checklists ("unit tests for English")
User Story Validation (final acceptance)

Example 3: GSD Iterative Convergence

The GSD process (gsd/iterative-convergence.js) implements feedback-driven convergence:

/babysitter:call build a shopping cart checkout flow with 90% quality target

Or:

/babysitter:call implement checkout flow using iterative convergence with max 8 iterations

Quality Gates in GSD:

Implementation scoring
Test execution
Quality assessment with recommendations
Iterative feedback loop

Use Cases and Scenarios

Scenario 1: TDD Feature Development

Build a feature with test-driven development, iterating until test coverage and quality targets are met.

export async function process(inputs, ctx) {
  const { feature, targetQuality = 85, maxIterations = 5 } = inputs;

  let iteration = 0;
  let quality = 0;

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);

    // Write tests first
    const tests = await ctx.task(writeTestsTask, { feature, iteration });

    // Implement code to pass tests
    const impl = await ctx.task(implementTask, { tests, feature });

    // Run quality checks
    const [coverage, lint, security] = await ctx.parallel.all([
      () => ctx.task(coverageTask, {}),
      () => ctx.task(lintTask, {}),
      () => ctx.task(securityTask, {})
    ]);

    // Agent scores quality
    const score = await ctx.task(agentScoringTask, {
      tests, impl, coverage, lint, security
    });

    quality = score.overall;
    ctx.log(`Quality score: ${quality}/${targetQuality}`);
  }

  return { converged: quality >= targetQuality, iterations: iteration, quality };
}

Scenario 2: Code Quality Improvement

Iteratively improve existing code until it meets quality standards.

export async function process(inputs, ctx) {
  const { files, targetScore = 90, maxIterations = 10 } = inputs;

  let iteration = 0;
  let currentScore = 0;

  // Initial assessment
  currentScore = await ctx.task(assessQualityTask, { files });
  ctx.log(`Initial quality score: ${currentScore}`);

  while (iteration < maxIterations && currentScore < targetScore) {
    iteration++;

    // Identify improvements
    const improvements = await ctx.task(identifyImprovementsTask, {
      files,
      currentScore,
      targetScore
    });

    // Apply improvements
    await ctx.task(applyImprovementsTask, { improvements });

    // Re-assess
    currentScore = await ctx.task(assessQualityTask, { files });
    ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);
  }

  return { achieved: currentScore >= targetScore, finalScore: currentScore };
}

Scenario 3: Documentation Generation

Generate documentation and refine until it meets completeness standards.

export async function process(inputs, ctx) {
  const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;

  let iteration = 0;
  let completeness = 0;

  while (iteration < maxIterations && completeness < targetCompleteness) {
    iteration++;

    // Generate or improve documentation
    await ctx.task(generateDocsTask, { codebase, iteration });

    // Assess completeness
    const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });
    completeness = assessment.completenessScore;

    ctx.log(`Documentation completeness: ${completeness}%`);
  }

  return { complete: completeness >= targetCompleteness, completeness };
}

Step-by-Step Instructions

Step 1: Define Quality Targets

Determine what quality means for your use case.

Common quality metrics:

Test coverage percentage (e.g., 85%)
Lint error count (e.g., 0 errors)
Security vulnerability count (e.g., 0 critical)
Overall quality score (e.g., 90/100)

Step 2: Set Iteration Limits

Prevent infinite loops by setting a maximum number of iterations.

const { targetQuality = 85, maxIterations = 5 } = inputs;

Recommendations:

Simple improvements: 3-5 iterations
Complex refactoring: 5-10 iterations
Large features: 10-15 iterations

Step 3: Implement the Convergence Loop

Create a loop that continues until the target is met or iterations are exhausted.

let iteration = 0;
let quality = 0;

while (iteration < maxIterations && quality < targetQuality) {
  iteration++;

  // Perform work
  // ...

  // Measure quality
  quality = await measureQuality();

  ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);
}

Step 4: Implement Quality Scoring

Create a task that evaluates quality based on your criteria.

export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
  kind: 'agent',
  title: 'Score implementation quality',
  agent: {
    name: 'quality-assessor',
    prompt: {
      role: 'senior quality assurance engineer',
      task: 'Analyze implementation quality and provide a score from 0-100',
      context: {
        tests: args.tests,
        implementation: args.implementation,
        coverage: args.coverage,
        lint: args.lint,
        security: args.security
      },
      instructions: [
        'Review test quality (weight: 25%)',
        'Review implementation quality (weight: 30%)',
        'Review code metrics (weight: 20%)',
        'Review security (weight: 15%)',
        'Review alignment with requirements (weight: 10%)',
        'Provide recommendations for improvement'
      ]
    }
  }
}));

Step 5: Add Feedback to Subsequent Iterations

Pass quality feedback to the next iteration to guide improvements.

const iterationResults = [];

while (iteration < maxIterations && quality < targetQuality) {
  iteration++;

  const previousFeedback = iteration > 1
    ? iterationResults[iteration - 2].recommendations
    : null;

  const impl = await ctx.task(implementTask, {
    feature,
    previousFeedback  // Guide improvements based on previous scoring
  });

  const score = await ctx.task(agentScoringTask, { impl });

  iterationResults.push({
    iteration,
    quality: score.overall,
    recommendations: score.recommendations
  });

  quality = score.overall;
}

Configuration Options

Quality Target Configuration

Parameter	Type	Default	Description
`targetQuality`	number	85	Target quality score (0-100)
`maxIterations`	number	5	Maximum number of iterations before stopping

Scoring Weights Configuration

Customize how different aspects contribute to the overall score.

const scoringWeights = {
  tests: 0.25,          // 25% weight for test quality
  implementation: 0.30,  // 30% weight for implementation quality
  codeQuality: 0.20,     // 20% weight for code metrics
  security: 0.15,        // 15% weight for security
  alignment: 0.10        // 10% weight for requirements alignment
};

Early Exit Conditions

Configure conditions that stop iteration early.

// Stop if quality plateaus (no improvement in last N iterations)
if (qualityHistory.length >= 3) {
  const lastThree = qualityHistory.slice(-3);
  const improvement = lastThree[2] - lastThree[0];
  if (improvement < 1) {
    ctx.log('Quality plateaued, stopping early');
    break;
  }
}
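
The check above compares only the first and last of the three most recent scores. An alternative, sketched below as a standalone function, measures the spread (max minus min) over a configurable window. The `hasPlateaued` name and its parameters are assumptions for this sketch.

```javascript
// Returns true when the spread (max - min) of the last `window` scores
// is below `minImprovement`.
function hasPlateaued(history, window = 3, minImprovement = 2) {
  if (history.length < window) return false; // not enough data yet
  const recent = history.slice(-window);
  return Math.max(...recent) - Math.min(...recent) < minImprovement;
}

console.log(hasPlateaued([60, 75, 85]));               // false: still improving
console.log(hasPlateaued([65, 66, 65]));               // true: spread of 1 over three iterations
console.log(hasPlateaued([65, 66, 65, 67, 66]));       // false: the last three span 65 to 67
console.log(hasPlateaued([65, 66, 65, 67, 66], 5, 3)); // true: a wider window catches the oscillation
```

Scores that oscillate by a point or two can slip past a tight window, as the third call shows. Widening the window or raising the threshold trades a slightly later exit for fewer missed plateaus.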

Code Examples and Best Practices

Example 1: Full TDD Quality Convergence Process

Complete process definition demonstrating all quality convergence patterns.

export async function process(inputs, ctx) {
  const {
    feature = 'User authentication',
    targetQuality = 85,
    maxIterations = 5
  } = inputs;

  // Phase 1: Planning
  const plan = await ctx.task(agentPlanningTask, { feature });

  await ctx.breakpoint({
    question: `Review the plan for "${feature}". Approve to proceed?`,
    title: 'Plan Review',
    context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }
  });

  // Phase 2: Quality Convergence Loop
  let iteration = 0;
  let quality = 0;
  const iterationResults = [];

  while (iteration < maxIterations && quality < targetQuality) {
    iteration++;
    ctx.log(`[Iteration ${iteration}/${maxIterations}]`);

    // TDD: Write tests first
    const tests = await ctx.task(writeTestsTask, {
      feature,
      plan,
      iteration,
      previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
    });

    // Run tests (expect failures on first iteration)
    await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });

    // Implement to pass tests
    const impl = await ctx.task(implementTask, {
      feature,
      tests,
      iteration,
      previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
    });

    // Run tests again
    const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });

    // Parallel quality checks
    const [coverage, lint, typeCheck, security] = await ctx.parallel.all([
      () => ctx.task(coverageTask, {}),
      () => ctx.task(lintTask, { files: impl.filesModified }),
      () => ctx.task(typeCheckTask, { files: impl.filesModified }),
      () => ctx.task(securityTask, { files: impl.filesModified })
    ]);

    // Agent quality scoring
    const score = await ctx.task(agentQualityScoringTask, {
      tests,
      testResults,
      implementation: impl,
      qualityChecks: { coverage, lint, typeCheck, security },
      iteration,
      targetQuality
    });

    quality = score.overallScore;
    iterationResults.push({
      iteration,
      quality,
      feedback: score.recommendations
    });

    ctx.log(`Quality: ${quality}/${targetQuality}`);

    if (quality >= targetQuality) {
      ctx.log('Target quality achieved!');
    }
  }

  // Final approval
  await ctx.breakpoint({
    question: `Quality: ${quality}/${targetQuality}. Approve for merge?`,
    title: 'Final Review',
    context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }
  });

  return {
    success: quality >= targetQuality,
    iterations: iteration,
    finalQuality: quality,
    iterationResults
  };
}

Example 2: Quality Scoring Task Definition

export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
  kind: 'agent',
  title: `Score quality (iteration ${args.iteration})`,
  description: 'Comprehensive quality assessment with agent',

  agent: {
    name: 'quality-assessor',
    prompt: {
      role: 'senior quality assurance engineer and code reviewer',
      task: 'Analyze implementation quality across multiple dimensions',
      context: {
        feature: args.feature,
        tests: args.tests,
        testResults: args.testResults,
        implementation: args.implementation,
        qualityChecks: args.qualityChecks,
        iteration: args.iteration,
        targetQuality: args.targetQuality
      },
      instructions: [
        'Review test quality: coverage, edge cases, assertions (weight: 25%)',
        'Review implementation quality: correctness, readability (weight: 30%)',
        'Review code metrics: lint, types, complexity (weight: 20%)',
        'Review security: vulnerabilities, input validation (weight: 15%)',
        'Review requirements alignment (weight: 10%)',
        'Calculate weighted overall score (0-100)',
        'Provide prioritized recommendations for improvement'
      ],
      outputFormat: 'JSON with overallScore, scores by dimension, recommendations'
    },
    outputSchema: {
      type: 'object',
      required: ['overallScore', 'scores', 'recommendations'],
      properties: {
        overallScore: { type: 'number', minimum: 0, maximum: 100 },
        scores: {
          type: 'object',
          properties: {
            tests: { type: 'number' },
            implementation: { type: 'number' },
            codeQuality: { type: 'number' },
            security: { type: 'number' },
            alignment: { type: 'number' }
          }
        },
        recommendations: { type: 'array', items: { type: 'string' } }
      }
    }
  },

  io: {
    inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
    outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
  }
}));
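
Since the overall score comes from an agent, it can be worth cross-checking the result before the loop trusts it. The sketch below verifies the required fields and recomputes the weighted score from the per-dimension scores using the weights stated in the instructions. `checkScoreResult`, the `maxDrift` tolerance, and the choice to treat a missing dimension score as a problem are assumptions; the schema above does not require every dimension.

```javascript
// Weights as stated in the scoring instructions above.
const SCORING_WEIGHTS = { tests: 0.25, implementation: 0.30, codeQuality: 0.20, security: 0.15, alignment: 0.10 };

// Hypothetical post-check: validates the result shape and flags an
// overallScore that drifts from the weighted dimension scores.
function checkScoreResult(result, weights = SCORING_WEIGHTS, maxDrift = 2) {
  const problems = [];
  const overall = result.overallScore;
  const overallValid = typeof overall === 'number' && overall >= 0 && overall <= 100;
  if (!overallValid) {
    problems.push('overallScore must be a number between 0 and 100');
  }
  if (!Array.isArray(result.recommendations)) {
    problems.push('recommendations must be an array');
  }
  const dimensionScores = result.scores ?? {};
  const missing = Object.keys(weights).filter(d => typeof dimensionScores[d] !== 'number');
  let recomputed = null;
  if (missing.length > 0) {
    problems.push(`missing dimension scores: ${missing.join(', ')}`);
  } else {
    recomputed = Object.entries(weights).reduce((sum, [d, w]) => sum + dimensionScores[d] * w, 0);
    if (overallValid && Math.abs(recomputed - overall) > maxDrift) {
      problems.push(`overallScore ${overall} differs from weighted score ${recomputed.toFixed(1)}`);
    }
  }
  return { ok: problems.length === 0, recomputed, problems };
}

const checked = checkScoreResult({
  overallScore: 92,
  scores: { tests: 90, implementation: 80, codeQuality: 85, security: 70, alignment: 90 },
  recommendations: ['Add input validation to the login handler']
});
console.log(checked.ok);       // false
console.log(checked.problems); // one entry: reported 92 vs. a weighted score of 83.0
```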

Best Practices

Set Realistic Targets: Aim for achievable quality scores (80-90 out of 100 is often reasonable)
Limit Iterations: Prevent runaway loops with sensible limits (5-10 iterations typically)
Use Parallel Checks: Run independent quality checks concurrently for efficiency
Provide Feedback: Pass recommendations from scoring to subsequent iterations
Log Progress: Track quality scores across iterations for visibility
Include Breakpoints: Add approval gates at key milestones

Common Pitfalls and Troubleshooting

Pitfall 1: Quality Score Not Improving

Symptom:

Iteration 1: Quality 65/100
Iteration 2: Quality 66/100
Iteration 3: Quality 65/100
Iteration 4: Quality 67/100
Iteration 5: Quality 66/100
Target not met: 85/100

Causes:

Quality target is unrealistic for the codebase
Scoring criteria are too strict
Fundamental issues blocking improvement

Solutions:

Review iteration feedback to identify blocking issues:
```
What recommendations came from my quality scoring?
```

Adjust quality target:

const { targetQuality = 75 } = inputs;  // Lower target

Increase iteration limit:

const { maxIterations = 10 } = inputs;  // More iterations

Review scoring weights for balance

Pitfall 2: Too Many Iterations

Symptom: Process runs for many iterations before converging.

Cause: Target is too high or improvements are too granular.

Solutions:

Implement early exit on plateau:

const recentScores = iterationResults.slice(-3).map(r => r.quality);
if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {
  ctx.log('Quality plateaued, stopping early');
  break;
}

Increase improvement scope per iteration
Lower quality target to realistic level

Pitfall 3: Inconsistent Quality Scores

Symptom: Quality scores vary significantly between iterations without clear reason.

Cause: Non-deterministic scoring or external factors.

Solution:

Use deterministic scoring criteria
Ensure ctx.now() is used instead of Date.now() for timestamps
Review agent scoring prompts for consistency
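
As an illustration of deterministic scoring criteria, the sketch below derives dimension scores from raw gate metrics with fixed formulas, so the same metrics always yield the same score. The penalty values and the `scoreFromMetrics` helper are assumptions chosen for illustration, not formulas from the Babysitter library.

```javascript
// Keeps a value inside the 0-100 range.
function clamp(value, min = 0, max = 100) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical deterministic scorer: fixed formulas, no agent judgment.
function scoreFromMetrics({ testsPassed, testsTotal, lintErrors, criticalVulns, highVulns }) {
  const tests = testsTotal > 0 ? (testsPassed * 100) / testsTotal : 0;
  const codeQuality = clamp(100 - lintErrors * 5);                   // 5 points per lint error
  const security = clamp(100 - criticalVulns * 50 - highVulns * 20); // heavy penalty per finding
  return { tests, codeQuality, security };
}

const metrics = { testsPassed: 19, testsTotal: 20, lintErrors: 3, criticalVulns: 0, highVulns: 1 };
const first = scoreFromMetrics(metrics);
const second = scoreFromMetrics(metrics);

console.log(first); // { tests: 95, codeQuality: 85, security: 80 }
console.log(JSON.stringify(first) === JSON.stringify(second)); // true: same input, same score
```

A hybrid is also possible: compute the measurable dimensions this way and ask the agent only for the judgment-based ones, which narrows the source of run-to-run variation.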

Pitfall 4: Iteration Takes Too Long

Symptom: Each iteration takes several minutes.

Cause: Sequential execution of independent tasks.

Solution: Use parallel execution:

// Slow: Sequential
const coverage = await ctx.task(coverageTask, {});
const lint = await ctx.task(lintTask, {});
const security = await ctx.task(securityTask, {});

// Fast: Parallel
const [coverage, lint, security] = await ctx.parallel.all([
  () => ctx.task(coverageTask, {}),
  () => ctx.task(lintTask, {}),
  () => ctx.task(securityTask, {})
]);

Related Documentation

Process Definitions - Learn to create quality convergence processes
Parallel Execution - Optimize quality checks with parallelism
Breakpoints - Add approval gates to quality convergence workflows
Best Practices - Patterns for setting targets, custom scoring strategies, and balancing speed vs thoroughness
Process Library - Browse the SDK-managed library and current process counts
Two-Loops Architecture - Deep dive into the evidence-driven completion model

Try Different Methodologies and Processes

Babysitter offers two levels of reusable workflows:

Methodologies (38 directories in this repo snapshot) - The "How"

Quality convergence works with ANY of Babysitter's methodology families - not just TDD. In this repository snapshot there are 38 methodology directories under library/methodologies/.

Methodology	Best For	Quality Focus
TDD Quality Convergence	Test-first development	Test coverage, regression prevention
GSD (Get Stuff Done)	Rapid prototyping	Working software, iteration speed
Spec-Kit	Enterprise/governance	Specification compliance, audit trails
BDD/Specification by Example	Team collaboration	Acceptance criteria, living documentation
Domain-Driven Design	Complex business domains	Domain model integrity, bounded contexts

Domain Processes - The "What"

Beyond methodologies, Babysitter includes the following generated specialization snapshot from the live repository tree:

Domain	Processes	Examples
Development and technical specializations	837	Web APIs, mobile apps, DevOps pipelines, AI, security, and related technical workflows
Business domains	490	Legal contracts, HR workflows, marketing campaigns, finance, logistics, and related domains
Science & engineering domains	551	Quantum algorithms, aerospace systems, biomedical devices, mathematics, and related domains
Social sciences & humanities	160	Education, healthcare, arts, philosophy, and social-science research

Browse processes:

Process Library - Full catalog with descriptions
Specializations folder

What To Do Next

Your Goal	Next Step
Run a quality convergence workflow	Try `/babysitter:call build a feature with 85% quality target`
Build your own convergence loop	Copy the TDD example above and customize the scoring
Add more quality gates	See the Five Quality Gate Categories section
Debug a stuck convergence	Check Best Practices - Debugging
Understand the architecture	Read Two-Loops Architecture

Summary

Quality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.

Key Takeaways:

Set realistic targets - Start with 80-85, work up to 90+
Use multiple gate types - Tests + lint + security + performance
Pass feedback between iterations - AI learns from each failure
Detect plateaus early - Stop iterating once scores stop improving
Parallelize independent checks - Faster iterations mean faster convergence
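The takeaways above can be tied together in a small sketch. This is an illustration under stated assumptions, not Babysitter's implementation: `weightedScore`, `hasPlateaued`, and `converge` are hypothetical names, the weights and thresholds are arbitrary, and the loop is synchronous for brevity, whereas in a real process each iteration would be an awaited task.

```javascript
// Combines per-dimension scores (0-100) into one number.
// Weights are illustrative and are normalized by their sum.
function weightedScore(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(weights)) {
    total += (scores[dimension] ?? 0) * weight; // a missing dimension counts as 0
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// True when the score gained less than `minGain` over the last `window` iterations.
function hasPlateaued(history, window = 2, minGain = 1) {
  if (history.length <= window) return false;
  const latest = history[history.length - 1];
  const earlier = history[history.length - 1 - window];
  return latest - earlier < minGain;
}

// Repeats runIteration(iteration, feedback) until the target is met,
// the score plateaus, or the iteration limit is reached.
// `weights` is required; `target` and `maxIterations` have assumed defaults.
function converge(runIteration, { target = 85, maxIterations = 5, weights }) {
  const history = [];
  let feedback = null; // the first iteration has no prior feedback
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const { scores, feedback: nextFeedback } = runIteration(iteration, feedback);
    const score = weightedScore(scores, weights);
    history.push(score);
    if (score >= target) {
      return { status: "converged", iterations: iteration, history };
    }
    if (hasPlateaued(history)) {
      return { status: "plateau", iterations: iteration, history };
    }
    feedback = nextFeedback; // pass this iteration's feedback to the next attempt
  }
  return { status: "max-iterations", iterations: maxIterations, history };
}
```

A caller supplies `runIteration`, which would wrap the implement-and-score steps and return `{ scores, feedback }`. Returning a distinct `plateau` status lets the caller escalate (for example, to a breakpoint) instead of spending the remaining iterations.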
