Process Lifecycle, Safety, and Cross-Platform Support
Specification v1.0 | @a5c-ai/agent-mux
SCOPE EXTENSION: hermes-agent (@NousResearch/hermes-agent) is included as a 10th supported agent per explicit project requirements from the project owner. It extends the original scope document's 9 built-in agents. All hermes-specific content in this spec is marked with this same scope extension note.
1. Overview
This specification is the authoritative reference for subprocess management, process safety guarantees, and cross-platform support in @a5c-ai/agent-mux. It consolidates and deepens the process-lifecycle material introduced in 03-run-handle-and-interaction.md (sections 6–12), adds the full per-agent cross-platform compatibility matrix from scope §23, and specifies platform-specific path resolution, shell invocation, PTY backend selection, and resource cleanup in detail.
All ten built-in agents (claude, codex, gemini, copilot, cursor, opencode, pi, omp, openclaw, hermes) share the same process lifecycle contract. Differences in platform support, PTY requirements, and shell invocation are documented per-agent in the tables below.
1.1 Cross-References
| Type / Concept | Spec | Section |
|---|---|---|
| RunHandle, subprocess management | 03-run-handle-and-interaction.md | 6 |
| ProcessTracker, zombie prevention | 03-run-handle-and-interaction.md | 6.4 |
| PlatformAdapter interface (base) | 03-run-handle-and-interaction.md | 8.3 |
| PTY support, node-pty dependency | 03-run-handle-and-interaction.md | 7 |
| Run isolation, temp directories | 03-run-handle-and-interaction.md | 9 |
| Backpressure and buffer management | 03-run-handle-and-interaction.md | 10 |
| Concurrency safety | 03-run-handle-and-interaction.md | 11 |
| RunOptions.gracePeriodMs | 03-run-handle-and-interaction.md | 6.2 (within signal handling prose) |
| SpawnArgs type | 05-adapter-system.md | 3.1 |
| AgentAdapter.buildSpawnArgs() | 05-adapter-system.md | 2 |
| AgentCapabilities.supportedPlatforms | 06-capabilities-and-models.md | 2 |
| AgentCapabilities.requiresPty | 06-capabilities-and-models.md | 2 |
| ConfigManager file locking | 08-config-and-auth.md | 13 |
| Native config file locations | 08-config-and-auth.md | 7 |
| ErrorCode union | 01-core-types-and-client.md | 3.1 |
| AgentMuxError | 01-core-types-and-client.md | 3.1 |
| CLI signal handling | 10-cli-reference.md | 20 |
| RunOptions | 02-run-options-and-profiles.md | 2 |
2. Subprocess Spawn Sequence
When mux.run() is called, the stream engine executes the following spawn sequence. Each step is numbered for reference in error-handling sections. This sequence is a simplified summary; the authoritative step-by-step sequence is in 03-run-handle-and-interaction.md §6.1. The ordering below groups steps by concern for readability. The critical constraint is that Step 5 (ProcessTracker registration) must happen synchronously after spawn and before any await.

    Step 1  Validate RunOptions against agent capabilities
            → CapabilityError on unsupported options
    Step 2  Create per-run temp directory
            → os.tmpdir()/agent-mux-<runId>/
            → Mode 0o700 (owner read/write/execute only)
    Step 3  Call adapter.buildSpawnArgs(resolvedOptions)
            → Produces SpawnArgs { command, args, env, cwd, shell, usePty }
    Step 4  Determine spawn mode (pipe vs. PTY)
            → If usePty && !nodePtyAvailable → throw PTY_NOT_AVAILABLE
            → If usePty → pty.spawn()
            → Else → child_process.spawn()
    Step 5  Register subprocess with ProcessTracker
            → Must happen synchronously after spawn, before any await
    Step 6  Wire stdio pipes / PTY streams to line parser
            → Line parser feeds adapter.parseEvent()
            → Parsed events enter the event buffer
    Step 7  Start timeout / inactivity timers
            → Per RunOptions.timeout and RunOptions.inactivityTimeout
    Step 8  Emit 'session_start' or 'session_resume' event
            → Run is now in 'running' state

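The Step 4 decision can be sketched as a small pure function. This is a minimal illustration, not the library's implementation: `chooseSpawnMode`, `SpawnMode`, and `PtyNotAvailableError` are hypothetical names, and only the `PTY_NOT_AVAILABLE` code comes from the sequence above.

```typescript
// Hypothetical helper names; only the PTY_NOT_AVAILABLE code comes from the spec text.
type SpawnMode = 'pipe' | 'pty';

class PtyNotAvailableError extends Error {
  readonly code = 'PTY_NOT_AVAILABLE';

  constructor() {
    super('Adapter requested a PTY, but node-pty could not be loaded');
    this.name = 'PtyNotAvailableError';
  }
}

/** Step 4: pick the spawn mode, failing fast when a required PTY backend is missing. */
function chooseSpawnMode(usePty: boolean, nodePtyAvailable: boolean): SpawnMode {
  if (usePty && !nodePtyAvailable) {
    throw new PtyNotAvailableError();
  }
  return usePty ? 'pty' : 'pipe';
}
```

Keeping the decision separate from the actual spawn call makes the failure path testable without node-pty installed.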
2.1 Spawn Options by Mode
Pipe Mode (default)

    import { spawn } from 'child_process';

    const child = spawn(spawnArgs.command, spawnArgs.args, {
      cwd: spawnArgs.cwd,
      env: { ...process.env, ...spawnArgs.env },
      stdio: ['pipe', 'pipe', 'pipe'],
      detached: process.platform !== 'win32', // Unix: new process group
      shell: spawnArgs.shell,
      windowsHide: true,
    });

Unix: detached: true creates a new process group. The process group ID equals the child PID. Signals sent to -pid reach the entire group.
Windows: detached: false (the child shares the parent's console). The child is assigned to a Job Object for lifecycle management (see Section 3.3).
PTY Mode

    import * as pty from 'node-pty';

    const child = pty.spawn(spawnArgs.command, spawnArgs.args, {
      name: 'xterm-256color',
      cols: 120,
      rows: 40,
      cwd: spawnArgs.cwd,
      env: { ...process.env, ...spawnArgs.env },
    });

PTY mode is used only when spawnArgs.usePty is true (see Section 6 for which agents require it).
3. Process Tracking and Zombie Prevention
3.1 ProcessTracker Singleton
The ProcessTracker is a module-level singleton that maintains the set of all active subprocesses across all RunHandle instances. Its interface is defined in 03-run-handle-and-interaction.md §6.4; this section specifies platform-specific implementation details.

    interface ProcessTracker {
      /**
       * Register a spawned process for tracking.
       *
       * @param pid - Process ID of the spawned child.
       * @param groupId - Process group ID (Unix) or Job Object handle ID (Windows).
       * @param runId - The run ID that owns this process.
       * @param gracePeriodMs - Grace period for this process's two-phase shutdown.
       *   Stored per-registration so killAll() uses the correct grace period for
       *   each tracked process. Defaults to 5000ms if not provided.
       */
      register(pid: number, groupId: number, runId: string, gracePeriodMs?: number): void;

      unregister(pid: number): void;

      /**
       * Kill all tracked processes using the two-phase shutdown sequence.
       * Each process uses the gracePeriodMs stored at registration time.
       * See behavioral contract below.
       */
      killAll(): void;

      readonly activeCount: number;
    }

Note on interface divergence: The ProcessTracker interface in 03-run-handle-and-interaction.md §6.4 defines register(pid, groupId, runId) with 3 parameters. This spec extends it with an optional 4th parameter, gracePeriodMs. Implementors must provide the 4-parameter signature. The authoritative complete interface is in §19 (Complete Type Reference) of this spec.
killAll() behavioral contract (implements scope §22: "On SIGTERM: SIGINT first, SIGKILL after grace period"):
The grace period for each tracked process is stored at register() time, sourced from the run's resolved RunOptions.gracePeriodMs (see 03-run-handle-and-interaction.md §6.2). This allows killAll() to use per-run grace periods without accepting parameters — important because killAll() is called from process.on('exit') and signal handlers where argument passing is impractical.
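As an illustration of storing the grace period at registration time, here is a minimal in-memory sketch. `InMemoryRegistry`, `TrackedProcess`, and `gracePeriodFor` are hypothetical names used only for this example; the 5000ms default is the one stated above.

```typescript
// Hypothetical in-memory registry; not the library's ProcessTracker implementation.
interface TrackedProcess {
  pid: number;
  groupId: number;
  runId: string;
  gracePeriodMs: number;
}

const DEFAULT_GRACE_PERIOD_MS = 5000;

class InMemoryRegistry {
  private readonly tracked = new Map<number, TrackedProcess>();

  /** Stores the grace period alongside the process so killAll() needs no arguments. */
  register(pid: number, groupId: number, runId: string, gracePeriodMs: number = DEFAULT_GRACE_PERIOD_MS): void {
    this.tracked.set(pid, { pid, groupId, runId, gracePeriodMs });
  }

  unregister(pid: number): void {
    this.tracked.delete(pid);
  }

  get activeCount(): number {
    return this.tracked.size;
  }

  /** Illustrative accessor: the grace period killAll() would use for this pid. */
  gracePeriodFor(pid: number): number | undefined {
    return this.tracked.get(pid)?.gracePeriodMs;
  }
}
```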
When called from an async-capable context (e.g., process.on('SIGTERM'), process.on('SIGINT')):
- Send SIGINT (Unix) or CTRL_C_EVENT (Windows) to each tracked process group.
- Wait up to each process's registered grace period (default: 5000ms).
- Send SIGKILL (Unix) or TerminateProcess (Windows) to any process groups that have not exited.
- On Windows, additionally close each Job Object handle (defense-in-depth).
When called from a synchronous-only context (process.on('exit')):
- Send SIGKILL (Unix) or close Job Object handles (Windows) immediately — the grace period cannot be honored because the event loop is shutting down.
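The two contexts can be sketched as follows, covering the Unix signals only (the Windows CTRL_C_EVENT and Job Object paths are omitted). This is a sketch under stated assumptions: `KillDeps`, `killAllAsync`, `killAllSync`, and `trySend` are hypothetical names, and signal delivery, liveness checks, and timers are injected so the control flow can be exercised without real processes.

```typescript
// Hypothetical sketch of the killAll() contract; all names are illustrative.
type GroupSignal = 'SIGINT' | 'SIGKILL';

interface Tracked {
  groupId: number;
  gracePeriodMs: number;
}

interface KillDeps {
  sendToGroup(groupId: number, signal: GroupSignal): void; // may throw (e.g. group already gone)
  isAlive(groupId: number): boolean;
  schedule(callback: () => void, delayMs: number): void; // e.g. a setTimeout wrapper
}

/** Sends a signal and swallows any error, so the caller never throws. */
function trySend(deps: Pick<KillDeps, 'sendToGroup'>, groupId: number, signal: GroupSignal): void {
  try {
    deps.sendToGroup(groupId, signal);
  } catch {
    // Ignored by design: killAll() must continue to the next process.
  }
}

/** Async-capable context: SIGINT first, SIGKILL after each process's own grace period. */
function killAllAsync(tracked: Tracked[], deps: KillDeps): void {
  for (const proc of tracked) {
    trySend(deps, proc.groupId, 'SIGINT');
    deps.schedule(() => {
      if (deps.isAlive(proc.groupId)) {
        trySend(deps, proc.groupId, 'SIGKILL');
      }
    }, proc.gracePeriodMs);
  }
}

/** Synchronous-only context (process exit): no grace period, SIGKILL immediately. */
function killAllSync(tracked: Tracked[], deps: Pick<KillDeps, 'sendToGroup'>): void {
  for (const proc of tracked) {
    trySend(deps, proc.groupId, 'SIGKILL');
  }
}
```

Injecting the scheduler is a design choice for this sketch only; it keeps the escalation logic deterministic under test.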
3.2 Unix Process Group Management
On Unix (macOS and Linux), each subprocess is spawned with detached: true, creating a new process group:
- Process group ID equals the child PID (standard POSIX behavior for setpgid(0, 0)).
- Signal delivery uses process.kill(-pid, signal): the negated PID targets the entire process group, including any child-of-child processes (language servers, build tools, shell scripts).
- Zombie reaping is handled by Node.js's internal libuv loop, which calls waitpid() for each child. The 'exit' event on the ChildProcess triggers ProcessTracker.unregister().
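Group signal delivery can be sketched as below. `signalProcessGroup` and `KillFn` are hypothetical names, and the guard against group IDs of 1 or less is an added safety assumption rather than something the spec states: a negated PID of 1 would become kill(-1), which addresses every process the caller may signal.

```typescript
// Hypothetical helper; in real use the kill argument would be Node's process.kill.
type KillFn = (pid: number, signal: string) => void;

/**
 * Sends a signal to a whole process group by negating the group ID.
 * Returns false instead of throwing when the group cannot be signalled.
 */
function signalProcessGroup(pgid: number, signal: string, kill: KillFn): boolean {
  // Assumption for this sketch: refuse IDs that would target init or "all processes".
  if (!Number.isInteger(pgid) || pgid <= 1) {
    return false;
  }
  try {
    kill(-pgid, signal);
    return true;
  } catch {
    // Typically the group has already exited or permission was denied.
    return false;
  }
}

// Usage sketch (not executed here):
//   signalProcessGroup(child.pid, 'SIGTERM', (pid, sig) => process.kill(pid, sig));
```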
3.3 Windows Job Object Management
On Windows, each subprocess is assigned to a Job Object immediately after spawn:
- Created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE: when the job handle is closed (including on abrupt Node.js exit), the OS terminates all processes in the job.
- This provides defense-in-depth: even if process.on('exit') handlers do not execute (e.g., TerminateProcess is called on the Node.js process itself), orphaned agent subprocesses are still cleaned up.
- The Job Object handle is stored in the ProcessTracker alongside the PID and run ID.
- killAll() on Windows closes all stored job handles, triggering OS-level cleanup.
3.4 Node.js Exit Handlers
The ProcessTracker installs handlers on the following Node.js events (installed once, on first register() call):
| Event | Action |
|---|---|
| process.on('exit') | Synchronous killAll(). Cannot start async work. |
| process.on('SIGTERM') | killAll(), then process.exit(1). |
| process.on('SIGINT') | killAll(), then process.exit(1). |
| process.on('uncaughtException') | killAll(), then rethrow. |
| process.on('unhandledRejection') | killAll(), then rethrow. |
Invariant: killAll() must be unconditionally safe (never throws). If an individual process kill fails (e.g., process already exited, permission denied), the error is silently ignored and the tracker continues to the next process.
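The install-once behavior and the per-event follow-up actions from the table can be sketched as below. `createHandlerInstaller`, `ProcessLike`, and `LIFECYCLE_ACTIONS` are hypothetical names; the handlers are attached to an injected process-shaped object rather than the real `process`, which is an assumption made so the wiring can be tested in isolation.

```typescript
// Hypothetical wiring sketch; the real tracker would pass Node's process object.
type LifecycleEvent = 'exit' | 'SIGTERM' | 'SIGINT' | 'uncaughtException' | 'unhandledRejection';
type FollowUp = 'none' | 'exit' | 'rethrow';

interface ProcessLike {
  on(event: string, handler: (...args: unknown[]) => void): unknown;
  exit(code: number): void;
}

// Mirrors the table above: event name and what happens after killAll().
const LIFECYCLE_ACTIONS: Array<[LifecycleEvent, FollowUp]> = [
  ['exit', 'none'],
  ['SIGTERM', 'exit'],
  ['SIGINT', 'exit'],
  ['uncaughtException', 'rethrow'],
  ['unhandledRejection', 'rethrow'],
];

/** Returns a function that installs the handlers on its first call and is a no-op afterwards. */
function createHandlerInstaller(proc: ProcessLike, killAll: () => void): () => void {
  let installed = false;
  return function ensureInstalled(): void {
    if (installed) return;
    installed = true;
    for (const [event, followUp] of LIFECYCLE_ACTIONS) {
      proc.on(event, (...args: unknown[]) => {
        try {
          killAll();
        } catch {
          // killAll() should never throw; this guard keeps the follow-up action reachable.
        }
        if (followUp === 'exit') proc.exit(1);
        if (followUp === 'rethrow') throw args[0];
      });
    }
  };
}
```

Calling the returned function from every register() keeps installation idempotent without a separate initialization step.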
3.5 Orphan Scenarios
| Scenario | Unix | Windows |
|---|---|---|
| Normal Node.js exit | process.on('exit') → killAll() | Job Object auto-kill |
| SIGTERM to Node.js | Handler runs killAll() | Node.js emulates SIGTERM on Windows; handler runs killAll() |
| SIGKILL to Node.js | Orphans survive. Re-parented to PID 1. Cleanup: kill -9 -<pgid> | Job Object auto-kill (OS-level) |
| Node.js crash (segfault) | Depends on signal handler; likely orphans | Job Object auto-kill |
| process.exit(0) from code | process.on('exit') runs | process.on('exit') runs + Job Object |
4. Signal Handling
4.1 Two-Phase Shutdown (abort)
When RunHandle.abort() is called:

    t=0ms       Send graceful signal
                ├── Unix:    SIGTERM to process group (kill(-pid, SIGTERM))
                └── Windows: GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT)
                Start grace period timer
    t=0..G      Monitor for process exit
                If process exits → cleanup, resolve RunResult
    t=G ms      Grace period expired, process still alive
                ├── Unix:    SIGKILL to process group (kill(-pid, SIGKILL))
                └── Windows: TerminateProcess(handle, 1)
    t=G+100ms   Final check: process guaranteed dead
                Cleanup temp dir, resolve RunResult

Default grace period: 5000ms (scope §22).
Per-run override: RunOptions.gracePeriodMs (spec-level extension defined in 03-run-handle-and-interaction.md §6.2). Also configurable at the global config level via gracePeriodMs.
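A minimal sketch of grace period resolution and the resulting abort schedule follows. `resolveGracePeriodMs`, `abortTimeline`, and `AbortStep` are hypothetical names. The precedence shown (per-run value, then global config, then the 5000ms default) is an assumption consistent with the term "override"; the authoritative rule is in the referenced spec.

```typescript
// Hypothetical helpers; precedence order is an assumption, see lead-in.
const DEFAULT_GRACE_PERIOD_MS = 5000;

/** Per-run value wins, then global config, then the default. An explicit 0 is kept. */
function resolveGracePeriodMs(perRun?: number, globalConfig?: number): number {
  return perRun ?? globalConfig ?? DEFAULT_GRACE_PERIOD_MS;
}

interface AbortStep {
  atMs: number;
  unix: string;
  windows: string;
}

/** The abort() timeline expressed as data, for a grace period G in milliseconds. */
function abortTimeline(gracePeriodMs: number): AbortStep[] {
  return [
    { atMs: 0, unix: 'SIGTERM to process group', windows: 'CTRL_BREAK_EVENT' },
    { atMs: gracePeriodMs, unix: 'SIGKILL to process group', windows: 'TerminateProcess' },
    { atMs: gracePeriodMs + 100, unix: 'final check', windows: 'final check' },
  ];
}
```

Using `??` rather than `||` means a caller who sets the grace period to 0 gets an immediate force kill instead of silently falling back to the default.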
Signal choice rationale (abort vs. killAll): abort() sends SIGTERM, a graceful termination request, because the consumer is explicitly ending a single run and the agent should have a chance to clean up. killAll() sends SIGINT because it implements scope §22's requirement ("On SIGTERM: SIGINT first, SIGKILL after grace period"): when the Node.js process itself receives SIGTERM or SIGINT, or encounters a fatal error, it forwards SIGINT to child processes as the first phase of shutdown. Forwarding SIGINT rather than SIGTERM is intentional. It lets agents that trap both signals distinguish "the mux process is shutting down" (SIGINT) from "this specific run is being aborted" (SIGTERM).
4.2 Interrupt (SIGINT)
RunHandle.interrupt() sends a soft interrupt, allowing the agent to finish its current tool call:
| Platform | Pipe mode | PTY mode |
|---|---|---|
| Unix | process.kill(-pid, 'SIGINT') | Write \x03 (Ctrl+C) to PTY input |
| Windows | GenerateConsoleCtrlEvent(CTRL_C_EVENT, pid) | Write \x03 to PTY input |
Windows caveat: GenerateConsoleCtrlEvent requires the subprocess to share a console with the parent. For console-detached processes, the signal delivery may silently fail. All 10 built-in agents are spawned with detached: false on Windows, so they share the parent's console (windowsHide: true only hides the console window), and this is not an issue for built-in agents.
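The interrupt table can be read as a dispatch on platform and spawn mode. The sketch below returns a description of the action rather than performing it; `planInterrupt` and `InterruptAction` are hypothetical names, and how the console control event is actually raised on Windows is outside this example.

```typescript
// Hypothetical planner mirroring the interrupt table; it does not deliver anything itself.
type InterruptAction =
  | { kind: 'signal-group'; target: number; signal: 'SIGINT' }
  | { kind: 'console-ctrl'; event: 'CTRL_C_EVENT'; pid: number }
  | { kind: 'pty-write'; data: string };

function planInterrupt(platform: string, usePty: boolean, pid: number): InterruptAction {
  // PTY mode is the same on every platform: write Ctrl+C to the PTY input.
  if (usePty) {
    return { kind: 'pty-write', data: '\x03' };
  }
  if (platform === 'win32') {
    return { kind: 'console-ctrl', event: 'CTRL_C_EVENT', pid };
  }
  // Unix pipe mode: negated PID targets the whole process group.
  return { kind: 'signal-group', target: -pid, signal: 'SIGINT' };
}
```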
4.3 Pause / Resume
| Platform | Pause | Resume |
|---|---|---|
| Unix | process.kill(pid, 'SIGTSTP') | process.kill(pid, 'SIGCONT') |
| Windows | SuspendThread() on all process threads | ResumeThread() on all process threads |
Windows caveat: Thread enumeration for pause/resume uses NtQuerySystemInformation or CreateToolhelp32Snapshot. Race conditions exist if the process creates new threads between enumeration and suspension. This is a known limitation; in practice, agent CLI processes rarely create threads during operation.
4.4 Signal Summary Table
| Operation | Unix Signal | Windows Equivalent | PTY Override |
|---|---|---|---|
| Interrupt | SIGINT | CTRL_C_EVENT | \x03 to PTY stdin |
| Graceful terminate | SIGTERM | CTRL_BREAK_EVENT | \x03 then close PTY |
| Force kill | SIGKILL | TerminateProcess | Close PTY handle |
| Pause | SIGTSTP | SuspendThread | \x1a to PTY stdin |
| Resume | SIGCONT | ResumeThread | (automatic on data write) |
5. Cross-Platform Support Matrix
5.1 Per-Agent Platform Support
From scope §23, extended with hermes-agent:
| Agent | macOS | Linux | Windows | Notes |
|---|---|---|---|---|
| claude | ✅ | ✅ | ✅ | |
| codex | ✅ | ✅ | ✅ | |
| gemini | ✅ | ✅ | ✅ | |
| copilot | ✅ | ✅ | ✅ | |
| cursor | ✅ | ✅ | ✅ | |
| opencode | ✅ | ✅ | ✅ | |
| pi |