Enrollment Troubleshooting Guide
Experimental Feature — Executor enrollment and fleet identity management are under active development. Configuration formats and identity provider interfaces may change between releases.
This guide covers common issues encountered during executor enrollment and their solutions.
Quick Diagnostics
Check Enrollment Status
# View executor enrollment state
ah agent status
# Check access point logs
journalctl -u ah-access-point -f
# Check executor logs
journalctl -u ah-executor -f
Verify Connectivity
# Test network connectivity to access point
nc -zv access-point.example.com 4433
# Test TLS handshake
openssl s_client -connect access-point.example.com:4433 -showcerts
Inspect Certificates
# View certificate details
openssl x509 -in cert.pem -noout -text
# Check certificate dates
openssl x509 -in cert.pem -noout -dates
# Verify certificate chain
openssl verify -CAfile ca.pem cert.pem
Connection Errors
Connection Refused
error: connection refused to access-point.example.com:4433
Possible Causes:
- Access point not running
- Firewall blocking port 4433
- Incorrect address or port
Solutions:
# Check if access point is running
systemctl status ah-access-point
# Check listening ports
ss -tlnp | grep 4433
# Check firewall rules for the port
sudo iptables -L -n | grep 4433
# Open the port (hosts using firewalld)
sudo firewall-cmd --add-port=4433/tcp --permanent
sudo firewall-cmd --reload
Connection Timeout
error: connection timed out after 30s
Possible Causes:
- Network routing issues
- Intermediate firewall dropping packets
- DNS resolution problems
Solutions:
# Test DNS resolution
dig access-point.example.com
# Test routing
traceroute access-point.example.com
# Try connecting by IP
ah agent enroll --remote-server https://192.168.1.100:4433 ...
TLS Handshake Failed
error: tls handshake failed: certificate verify failed
Possible Causes:
- CA certificate mismatch
- Certificate expired
- Certificate not yet valid (clock skew)
Solutions:
# Check system time
date
timedatectl status
# Sync time
sudo systemctl start systemd-timesyncd
# Verify CA matches server certificate
openssl verify -CAfile ca.pem server-cert.pem
# Check certificate dates
openssl x509 -in server-cert.pem -noout -dates
Certificate Errors
Certificate Expired
error: certificate has expired or is not yet valid
Diagnosis:
# Check certificate expiry
openssl x509 -in cert.pem -noout -enddate
# Compare with current time
date -u
Solutions:
- Files provider: Generate new certificates
  # See files provider guide for certificate generation
- SPIFFE provider: Check SPIRE agent
  spire-agent healthcheck -socketPath /run/spire/agent.sock
- Vault provider: Check Vault connectivity
  vault token lookup
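The error above covers two different conditions: a certificate that has expired and one that is not yet valid, which usually points to clock skew. A minimal sketch for telling them apart follows. `cert_validity_state` is a hypothetical helper name, not part of the `ah` CLI, and it takes all three times as Unix epoch seconds.

```shell
# cert_validity_state NOW NOT_BEFORE NOT_AFTER
# Classifies a validity window against a reference time.
# All arguments are Unix epoch seconds. Hypothetical helper, not an ah command.
cert_validity_state() {
    now=$1 not_before=$2 not_after=$3
    if [ "$now" -lt "$not_before" ]; then
        echo "not-yet-valid"   # often clock skew on the checking host
    elif [ "$now" -gt "$not_after" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

# Possible way to feed it from a certificate file (assumes GNU date,
# whose -d option can parse the date format openssl prints):
#   nb=$(date -d "$(openssl x509 -in cert.pem -noout -startdate | cut -d= -f2)" +%s)
#   na=$(date -d "$(openssl x509 -in cert.pem -noout -enddate | cut -d= -f2)" +%s)
#   cert_validity_state "$(date +%s)" "$nb" "$na"
```

If the result is `not-yet-valid`, checking the system clock is likely to be more useful than reissuing the certificate.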
Certificate Chain Incomplete
error: unable to get local issuer certificate
Diagnosis:
# View certificate chain
openssl s_client -connect access-point:4433 -showcerts
# Check CA file contents
openssl x509 -in ca.pem -noout -subject -issuer
Solutions:
# Concatenate intermediate and root CAs
cat intermediate-ca.pem root-ca.pem > ca-chain.pem
# Use complete chain
ah agent enroll --ca ca-chain.pem ...
Wrong Key for Certificate
error: private key does not match certificate
Diagnosis:
# Compare certificate and key modulus (RSA keys only)
openssl x509 -in cert.pem -noout -modulus | md5sum
openssl rsa -in key.pem -noout -modulus | md5sum
# The two hashes should match
Solution: Regenerate the certificate and key pair together.
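The modulus comparison uses `openssl rsa`, so it only applies to RSA keys. For ECDSA or Ed25519 keys, one option is to compare the public key in the certificate with the public key derived from the private key. The sketch below assumes an unencrypted PEM private key. `key_matches_cert` is a hypothetical helper name.

```shell
# key_matches_cert CERT_FILE KEY_FILE
# Returns 0 if the private key belongs to the certificate, 1 if not,
# and 2 if either file could not be read by openssl.
# Hypothetical helper; works for any key type openssl pkey understands.
key_matches_cert() {
    # Public key embedded in the certificate (PEM, SubjectPublicKeyInfo)
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    # Public key derived from the private key, in the same PEM format
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Example:
#   key_matches_cert cert.pem key.pem && echo "match" || echo "mismatch or unreadable"
```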
Certificate SAN Mismatch
error: certificate SAN does not match expected pattern
Diagnosis:
# View certificate SANs
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"
Solutions:
- Regenerate certificate with correct SANs
- Update the access point’s --executor-san-uri-prefix to match
- For SPIFFE, ensure the registration entry uses the correct SPIFFE ID
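To check a certificate before enrolling, the URI SANs can be pulled out of the `openssl` text output and compared with the configured prefix. This is a rough sketch: it assumes the access point does a plain string-prefix comparison, which this guide does not confirm, and `uri_sans` and `san_has_prefix` are hypothetical helper names.

```shell
# uri_sans: read `openssl x509 -noout -text` output on stdin and print
# each URI-type SAN on its own line. Hypothetical helper.
uri_sans() {
    grep -o 'URI:[^,[:space:]]*' | sed 's/^URI://'
}

# san_has_prefix SAN PREFIX
# Returns 0 if SAN starts with PREFIX (literal string comparison).
# Assumption: the access point matches prefixes the same way.
san_has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   openssl x509 -in cert.pem -noout -text | uri_sans | while read -r san; do
#       san_has_prefix "$san" "spiffe://example.org/ah/agent/" && echo "ok: $san"
#   done
```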
Identity Provider Errors
Files Provider
Permission Denied
error: permission denied reading /etc/ah/key.pem
Solution:
# Fix permissions
sudo chown agent-harbor:agent-harbor /etc/ah/*.pem
sudo chmod 600 /etc/ah/*key.pem
sudo chmod 644 /etc/ah/cert.pem /etc/ah/ca.pem
File Not Found
error: no such file: /etc/ah/cert.pem
Solution: Verify paths and file existence:
ls -la /etc/ah/
SPIFFE Provider
No SVID Issued
error: no identity issued
Diagnosis:
# Check agent health
spire-agent healthcheck -socketPath /run/spire/agent.sock
# List available SVIDs
spire-agent api fetch x509 -socketPath /run/spire/agent.sock -write /tmp/svid
# Check registration entries
spire-server entry show -socketPath /run/spire/server.sock
Solutions:
- Create registration entry:
  spire-server entry create \
    -socketPath /run/spire/server.sock \
    -parentID "spiffe://example.org/spire/agent/join_token/agent-1" \
    -spiffeID "spiffe://example.org/ah/agent/executor-1" \
    -selector "unix:user:executor"
- Fix selector mismatch:
  # Check process UID/GID
  id
  # Verify the selector matches (case-insensitive, since the output label is capitalized)
  spire-server entry show -socketPath /run/spire/server.sock | grep -i selector
SPIFFE Socket Not Found
error: failed to connect to Workload API: /run/spire/agent.sock: no such file
Solutions:
# Check SPIRE agent is running
systemctl status spire-agent
# Verify socket path
ls -la /run/spire/
# Check socket permissions
stat /run/spire/agent.sock
SPIFFE ID Mismatch
error: server SPIFFE ID mismatch: expected spiffe://example.org/ah/serve, got spiffe://other.org/ah/serve
Solutions:
- Verify --expected-server-id matches the access point’s SPIFFE ID
- Check trust domain configuration on both sides
- Verify registration entry for access point
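In the example error, the two IDs share the path `/ah/serve` and differ only in the trust domain (`example.org` versus `other.org`). A small sketch for extracting the trust domain makes that kind of difference easier to spot. `spiffe_trust_domain` is a hypothetical helper name.

```shell
# spiffe_trust_domain SPIFFE_ID
# Prints the trust domain (the authority part) of a SPIFFE ID.
# Returns 1 if the argument does not start with spiffe://.
# Hypothetical helper, not part of ah or SPIRE.
spiffe_trust_domain() {
    case "$1" in
        spiffe://*) ;;
        *) return 1 ;;
    esac
    rest=${1#spiffe://}     # drop the scheme
    echo "${rest%%/*}"      # keep everything before the first slash
}

# Example, using the IDs from the error message:
#   spiffe_trust_domain "spiffe://example.org/ah/serve"
#   spiffe_trust_domain "spiffe://other.org/ah/serve"
```

If the trust domains differ, the problem is probably in the trust domain configuration rather than in the registration entry path.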
Vault Provider
Authentication Failed
error: vault authentication failed: permission denied
Diagnosis:
# Test AppRole authentication through the login endpoint
vault write auth/approle/login \
  role_id="$VAULT_ROLE_ID" \
  secret_id="$VAULT_SECRET_ID"
Solutions:
- Verify role ID and secret ID are correct
- Check secret ID hasn’t expired
- Verify AppRole is enabled:
vault auth list
PKI Issue Failed
error: failed to issue certificate: 1 error occurred: * common name not allowed
Diagnosis:
# Check PKI role configuration
vault read pki_int/roles/executor
Solutions:
# Update allowed domains
vault write pki_int/roles/executor \
allowed_domains="executor.example.com,internal.example.com" \
allow_subdomains=true
Vault Sealed
error: vault is sealed
Solution: Unseal Vault:
vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>
mTLS Errors
Client Certificate Required
error: client certificate required
Diagnosis: Access point requires client certificate but executor isn’t providing one.
Solutions:
- Verify executor identity provider is configured
- Check certificate is being loaded:
ah agent enroll --identity files --cert cert.pem --key key.pem --ca ca.pem ...
Client Certificate Rejected
error: client certificate rejected: certificate signed by unknown authority
Solutions:
- Access point must trust executor’s CA:
  ah agent access-point --ca /path/to/executor-ca.pem ...
- Or use same CA for both access point and executors
Server Certificate Rejected
error: x509: certificate signed by unknown authority
Solutions:
- Executor must trust access point’s CA:
  ah agent enroll --ca /path/to/access-point-ca.pem ...
- For SPIFFE, the CA is provided by SPIRE automatically
Rotation Issues
Rotation Not Happening
Diagnosis:
# Check certificate expiry
ah agent status --show-cert
# Check for rotation logs
journalctl -u ah-executor | grep -i "rotat\|renew"
Solutions:
- Files provider: Ensure file watching is working
  # Trigger inotify event
  touch /etc/ah/cert.pem
- SPIFFE provider: Check SPIRE agent health
  spire-agent healthcheck -socketPath /run/spire/agent.sock
- Vault provider: Check Vault token is valid
  vault token lookup
Connection Drops During Rotation
Possible Causes:
- Rotation happens too late (near expiry)
- Server doesn’t accept new certificate
Solutions:
- Configure earlier rotation threshold:
  ah agent enroll --vault-renewal-threshold 0.5 ...  # Renew at 50% TTL
- Use longer certificate TTLs to allow more time for rotation
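To see what a given threshold means in clock time, the renewal point can be worked out from the certificate's issue and expiry times. The sketch below assumes the threshold is the fraction of the TTL that has elapsed when renewal starts, as the "Renew at 50% TTL" comment suggests. It takes the threshold as an integer percentage because POSIX shell arithmetic has no floating point. `renew_at` is a hypothetical helper name.

```shell
# renew_at ISSUED_EPOCH EXPIRY_EPOCH PERCENT
# Prints the epoch time at which renewal would start, assuming the
# threshold is the elapsed share of the TTL (50 corresponds to 0.5).
# Hypothetical helper; the real scheduling logic may differ.
renew_at() {
    ttl=$(( $2 - $1 ))
    echo $(( $1 + ttl * $3 / 100 ))
}

# Example: a 24-hour certificate issued at t=0 with a 50% threshold
# starts renewing 43200 seconds (12 hours) after issuance:
#   renew_at 0 86400 50
```

With a one-hour TTL the same threshold leaves only 30 minutes for retries, which is one reason very short TTLs make rotation failures more visible.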
Debugging Tools
Enable Debug Logging
# Access point
ah agent access-point --log-level debug ...
# Executor
ah agent enroll --log-level debug ...
# Or via environment
export RUST_LOG=ah_identity_provider=debug,ah_cli=debug
Capture TLS Traffic
# Capture with tcpdump
sudo tcpdump -i any -w enrollment.pcap port 4433
# Analyze with Wireshark
wireshark enrollment.pcap
Test Certificate Chain
# Full certificate validation
openssl s_client -connect access-point:4433 \
-cert executor.pem \
-key executor-key.pem \
-CAfile ca.pem \
-verify_return_error
SPIRE Debugging
# Agent debug info
spire-agent api fetch x509 \
-socketPath /run/spire/agent.sock \
-write /tmp/svid
# Detailed SVID info
openssl x509 -in /tmp/svid/svid.0.pem -noout -text
# Server-side registration check
spire-server entry show -socketPath /run/spire/server.sock
# Agent list
spire-server agent list -socketPath /run/spire/server.sock
Common Patterns
Development Setup Failing
For quick local development:
# Use dev identity (self-signed)
ah agent access-point --fleet-listen 127.0.0.1:4433
# Executor with dev identity
ah agent enroll --remote-server https://127.0.0.1:4433 --name test
Production Checklist
Before deploying to production, verify:
- Certificates are issued by trusted CA
- Certificate TTLs are appropriately short (hours, not days)
- Certificate rotation is working
- Access point validates executor certificate SANs
- Firewall allows port 4433
- Audit logging is enabled
- Monitoring alerts for certificate expiry
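For the expiry-monitoring item, one simple building block is the `-checkend` option of `openssl x509`, which reports whether a certificate will still be valid a given number of seconds from now. The sketch below wraps it so a cron job or monitoring probe can alert on it. `cert_expires_within` is a hypothetical helper name, and the path and threshold in the example are illustrative.

```shell
# cert_expires_within CERT_FILE SECONDS
# Returns 0 (alert) if the certificate expires within SECONDS, or if it
# cannot be read at all; returns 1 if it is still valid at that point.
# Hypothetical helper built on `openssl x509 -checkend`.
cert_expires_within() {
    # -checkend exits 0 when the certificate will NOT expire in the window
    if openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Example: warn when the executor certificate has under an hour left
#   if cert_expires_within /etc/ah/cert.pem 3600; then
#       echo "WARNING: /etc/ah/cert.pem expires within 1 hour" >&2
#   fi
```

Treating an unreadable file as an alert is a deliberate choice here: a missing certificate is at least as urgent as one that is about to expire.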
Getting Help
If you’re still experiencing issues:
- Collect debug logs from both access point and executor
- Capture certificate details with openssl x509 -noout -text
- Note the exact error message
- Open an issue at https://github.com/schelling-point-labs/agent-harbor/issues
Next Steps
- Files Provider Guide - Manual PKI setup
- SPIFFE Deployment Guide - SPIRE configuration
- Vault Integration Guide - Enterprise PKI