Multi-OS Testing
Coming Soon — Multi-OS testing depends on fleet orchestration, which is under active development and not yet functional end-to-end. The APIs, CLI flags, and configuration formats described here reflect the planned design and will change before release.
Validate builds and tests across multiple operating systems in parallel using Agent Harbor’s leader/follower architecture.
Overview
Multi-OS testing enables agents to run tests on Linux, macOS, and Windows simultaneously:
- Leader: Primary executor (typically Linux) that owns filesystem snapshots and initiates sync
- Followers: Secondary executors that mirror the leader via high-speed file sync
- Sync Fence: Ensures all followers have identical file state before test execution
- run-everywhere: Execute commands across all platforms and aggregate results
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Agent works on Leader (Linux) │
│ - Edits files │
│ - Creates filesystem snapshots │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Sync Fence │
│ - Snapshot captured │
│ - Changes synced to all followers via Mutagen │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐
│ Linux Follower │ │ macOS Follower│ │ Windows Follower │
│ pytest -q │ │ pytest -q │ │ pytest -q │
└───────────────────┘ └───────────────┘ └───────────────────┘
│ │ │
└───────────────┼───────────────┘
▼
┌─────────────────────────────────────────────────────────────┐
│ Results aggregated and returned to agent │
└─────────────────────────────────────────────────────────────┘Fleet Configuration
Define your fleet of executors in the configuration:
# .agents/config.toml
[fleet]
name = "cross-platform"
[[fleet.members]]
name = "linux-leader"
role = "leader"
host = "linux.internal"
ssh_user = "agent"
[[fleet.members]]
name = "macos-dev"
role = "follower"
host = "mac.internal"
ssh_user = "agent"
tags = { os = "macos", arch = "arm64" }
[[fleet.members]]
name = "windows-build"
role = "follower"
host = "win.internal"
ssh_user = "agent"
tags = { os = "windows", arch = "x64" }
[[fleet.members]]
name = "linux-arm"
role = "follower"
host = "arm.internal"
ssh_user = "agent"
tags = { os = "linux", arch = "arm64" }Running Cross-Platform Tasks
Basic Usage
ah task create --agent claude --fleet cross-platform --prompt "Add feature X and ensure tests pass on all platforms"Manual Cross-Platform Commands
Run commands on specific platforms from the leader:
# Run on all followers
ah agent followers run -- npm test
# Run only on Windows
ah agent followers run --tag os=windows -- npm test
# Run only on ARM platforms
ah agent followers run --tag arch=arm64 -- cargo test
# Run on a specific host
ah agent followers run --host macos-dev -- swift testExecution Workflow
1. Agent Edits Files (Leader)
The agent makes changes on the leader workspace.
2. Sync Fence
Before running cross-platform tests, the agent calls:
ah agent followers sync-fenceThis:
- Creates a filesystem snapshot on the leader
- Syncs all changes to followers via Mutagen
- Waits until all followers confirm sync completion
3. Run Tests Everywhere
ah agent followers run -- pytest -qThis:
- Executes the command on all followers in parallel
- Adapts the command for each OS (shell, paths, quoting)
- Aggregates exit codes and output
- Returns combined results to the agent
4. Fetch Build Artifacts
Collect platform-specific build outputs:
ah agent followers slurp \
--fileset-data "name=binaries&include=dist/*&dest=./artifacts"Command Adaptation
Agent Harbor automatically adapts commands for each platform:
| Platform | Shell | Path Translation | Example |
|---|---|---|---|
| Linux | bash -lc | Native | cd /workspaces/proj && pytest |
| macOS | zsh -lc | FSKit mount | cd /Volumes/ah-overlays/proj && pytest |
| Windows | PowerShell | Drive mapping | Set-Location S:\; pytest |
Targeting and Selectors
By Tag
# Run on all Windows hosts
ah agent followers run --tag os=windows -- npm test
# Run on ARM64 macOS
ah agent followers run --tag os=macos --tag arch=arm64 -- swift testBy Host
ah agent followers run --host win-12 -- dotnet testRun Everywhere
ah agent followers run --all -- make testTest Sharding
Accelerate test execution by distributing tests across followers:
ah agent followers run \
--command "pytest tests/unit/" \
--command "pytest tests/integration/" \
--command "pytest tests/e2e/"The scheduler:
- Builds a job queue from commands
- Assigns tasks round-robin across executors
- Each command runs on exactly one executor per OS
- Results collected after all complete
Fetching Artifacts
Single File Set
ah agent followers slurp \
--fileset-data "name=binaries&include=dist/*&dest=./artifacts"Multiple File Sets
ah agent followers slurp \
--fileset-data "name=windows-exe&include=target/release/*.exe&required=true" \
--fileset-data "name=mac-app&include=target/release/*.app/**&required=true" \
--fileset-data "name=linux-bin&include=target/release/myapp&required=true" \
--dest ./release-artifactsFrom Configuration
# .agents/config.toml
[filesets.release]
include = ["target/release/*"]
exclude = ["*.d", "*.rlib"]
required = trueah agent followers slurp --fileset release --dest ./releaseConnectivity
Transport Layers
- Control Plane: QUIC connections between executors and access points
- Data Plane: SSH tunneled via HTTP CONNECT (or direct if reachable)
- Sync: Mutagen over SSH for file synchronization
Connection Flow
- Executors register with access point via
ah agent enroll - Leader establishes persistent SSH connections to followers
- Mutagen project started for file sync
- Commands executed over SSH
Firewall Considerations
If followers aren’t directly reachable:
- Use HTTP CONNECT via the access point
- For hybrid fleets, client relays between access points
Time-Travel Integration
Multi-OS testing integrates with Agent Harbor’s time-travel features:
sync-fencecreates a snapshot linked to a SessionMoment- Seeking to that moment restores the leader snapshot
- Followers are re-synced by issuing a fence before re-execution
This enables:
- Reproducing cross-platform failures
- Debugging platform-specific issues from any point in time
Failure Handling
Partial Host Failure
If some hosts fail:
- Results aggregated with per-host status
- Non-zero exit code if any host fails
- Detailed logs available per host
Sync Timeout
If sync fence times out:
- Lagging followers reported
- Suggestions to narrow selectors
- Option to proceed with available hosts
Connectivity Issues
If a follower becomes unreachable:
- Health probes detect degradation
- Automatic re-route attempts
- Host excluded if unrecoverable
Configuration
# .agents/config.toml
[fleet]
name = "production"
[fleet.sync]
# Patterns to ignore during sync
ignores = ["node_modules", ".venv", "target", "build"]
# Sync timeout
timeout = "60s"
[fleet.connectivity]
# SSH connection settings
ssh_connect_timeout = "5s"
ssh_keepalive = "30s"
# Health check interval
health_probe_interval = "30s"
# Consecutive failures before marking degraded
failure_threshold = 3
[fleet.execution]
# Maximum concurrent commands per host
concurrency = 1
# Command timeout
timeout = "10m"
# Retry on transient SSH errors
ssh_retry = true