
Enrollment Troubleshooting Guide

Experimental Feature: Executor enrollment and fleet identity management are under active development. Configuration formats and identity provider interfaces may change between releases.

This guide covers common issues encountered during executor enrollment and their solutions.

Quick Diagnostics

Check Enrollment Status

# View executor enrollment state
ah agent status
# Check access point logs
journalctl -u ah-access-point -f
# Check executor logs
journalctl -u ah-executor -f

Verify Connectivity

# Test network connectivity to access point
nc -zv access-point.example.com 4433
# Test TLS handshake
openssl s_client -connect access-point.example.com:4433 -showcerts

Inspect Certificates

# View certificate details
openssl x509 -in cert.pem -noout -text
# Check certificate dates
openssl x509 -in cert.pem -noout -dates
# Verify certificate chain
openssl verify -CAfile ca.pem cert.pem

Connection Errors

Connection Refused

error: connection refused to access-point.example.com:4433

Possible Causes:

  1. Access point not running
  2. Firewall blocking port 4433
  3. Incorrect address or port

Solutions:

# Check if access point is running
systemctl status ah-access-point
# Check listening ports
ss -tlnp | grep 4433
# Test firewall
sudo iptables -L -n | grep 4433
# Open firewall (Linux)
sudo firewall-cmd --add-port=4433/tcp --permanent
sudo firewall-cmd --reload

Connection Timeout

error: connection timed out after 30s

Possible Causes:

  1. Network routing issues
  2. Intermediate firewall dropping packets
  3. DNS resolution problems

Solutions:

# Test DNS resolution
dig access-point.example.com
# Test routing
traceroute access-point.example.com
# Try connecting by IP
ah agent enroll --remote-server https://192.168.1.100:4433 ...

TLS Handshake Failed

error: tls handshake failed: certificate verify failed

Possible Causes:

  1. CA certificate mismatch
  2. Certificate expired
  3. Certificate not yet valid (clock skew)

Solutions:

# Check system time
date
timedatectl status
# Sync time
sudo systemctl start systemd-timesyncd
# Verify CA matches server certificate
openssl verify -CAfile ca.pem server-cert.pem
# Check certificate dates
openssl x509 -in server-cert.pem -noout -dates

Certificate Errors

Certificate Expired

error: certificate has expired or is not yet valid

Diagnosis:

# Check certificate expiry
openssl x509 -in cert.pem -noout -enddate
# Compare with current time
date -u
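
For scripted checks, the manual date comparison above can be replaced with openssl's -checkend option, which reports through its exit status whether a certificate is still valid a given number of seconds from now. The following is a minimal sketch; the helper name cert_expires_within and its return codes are illustrative assumptions, not part of ah.

```shell
# cert_expires_within FILE SECONDS   (hypothetical helper, not part of ah)
# Returns 0 if the certificate in FILE expires within SECONDS (or has already
# expired), 1 if it is still valid after that window, and 2 if FILE cannot be
# read or parsed as a certificate.
cert_expires_within() {
    # Reject missing or unparseable files first so they are not reported as "expiring".
    openssl x509 -in "$1" -noout >/dev/null 2>&1 || return 2
    # -checkend exits 0 when the certificate is still valid SECONDS from now.
    if openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within cert.pem 3600 && echo "expires within the hour"` prints a warning only when the certificate parses and is close to (or past) expiry.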

Solutions:

  1. Files provider: Generate new certificates

    # See files provider guide for certificate generation
  2. SPIFFE provider: Check SPIRE agent

    spire-agent healthcheck -socketPath /run/spire/agent.sock
  3. Vault provider: Check Vault connectivity

    vault token lookup

Certificate Chain Incomplete

error: unable to get local issuer certificate

Diagnosis:

# View certificate chain
openssl s_client -connect access-point:4433 -showcerts
# Check CA file contents
openssl x509 -in ca.pem -noout -subject -issuer

Solutions:

# Concatenate intermediate and root CAs
cat intermediate-ca.pem root-ca.pem > ca-chain.pem
# Use complete chain
ah agent enroll --ca ca-chain.pem ...
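
Note that openssl x509 prints only the first certificate in a file, so it cannot confirm that a concatenated bundle really contains both CAs. A quick sanity check is to count the PEM blocks. The helper below is a sketch; the name count_pem_certs is an assumption made for illustration.

```shell
# count_pem_certs FILE   (hypothetical helper)
# Prints the number of PEM certificate blocks in FILE.
# Returns 2 if FILE is not readable.
count_pem_certs() {
    [ -r "$1" ] || return 2
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is normalized with "|| true".
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}
```

A bundle built from one intermediate and one root should report 2. To list the subject and issuer of every certificate in the bundle, `openssl crl2pkcs7 -nocrl -certfile ca-chain.pem | openssl pkcs7 -print_certs -noout` can be used.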

Wrong Key for Certificate

error: private key does not match certificate

Diagnosis:

# Compare certificate and key modulus (RSA keys only)
openssl x509 -in cert.pem -noout -modulus | md5sum
openssl rsa -in key.pem -noout -modulus | md5sum
# These should match

Solution: Regenerate certificate and key pair together.
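
The modulus comparison applies only to RSA keys. For ECDSA or Ed25519 keys, one alternative is to compare the public key embedded in the certificate with the public key derived from the private key. The helper below is a sketch under that approach; the name key_matches_cert is an assumption, and an unencrypted private key is assumed (an encrypted key would prompt for its passphrase).

```shell
# key_matches_cert CERT KEY   (hypothetical helper)
# Returns 0 if the public key in CERT equals the public key derived from KEY,
# 1 if they differ, and 2 if either file cannot be parsed.
key_matches_cert() {
    # Public key as stored in the certificate (PEM, SubjectPublicKeyInfo).
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    # Public key derived from the private key, in the same PEM encoding.
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

For example, `key_matches_cert cert.pem key.pem || echo "key and certificate do not match"`.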

Certificate SAN Mismatch

error: certificate SAN does not match expected pattern

Diagnosis:

# View certificate SANs
openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"

Solutions:

  1. Regenerate certificate with correct SANs
  2. Update access point --executor-san-uri-prefix to match
  3. For SPIFFE, ensure registration entry uses correct SPIFFE ID
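
To check a certificate against a candidate prefix before changing the access point configuration, the SAN text printed by openssl can be filtered for URI entries. This guide does not specify how the access point evaluates --executor-san-uri-prefix; the sketch below assumes a plain string-prefix comparison, and the helper name san_has_uri_prefix is hypothetical.

```shell
# san_has_uri_prefix PREFIX   (hypothetical helper)
# Reads SAN text as printed by openssl on stdin.
# Returns 0 if at least one URI entry starts with PREFIX, 1 otherwise.
san_has_uri_prefix() {
    # openssl prints SAN entries comma-separated on one indented line:
    # put each entry on its own line, strip leading whitespace, then
    # look for an entry that begins with "URI:" followed by PREFIX.
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//' \
        | awk -v p="URI:$1" 'index($0, p) == 1 { found = 1 } END { exit !found }'
}
```

Usage, assuming OpenSSL 1.1.1 or later for the -ext option: `openssl x509 -in cert.pem -noout -ext subjectAltName | san_has_uri_prefix "spiffe://example.org/ah/agent/"`.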

Identity Provider Errors

Files Provider

Permission Denied

error: permission denied reading /etc/ah/key.pem

Solution:

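# Fix permissions
sudo chown agent-harbor:agent-harbor /etc/ah/*.pem
# Private keys: owner read/write only (matches key.pem and names ending in -key.pem)
sudo chmod 600 /etc/ah/*key.pem
# Certificates: world-readable
sudo chmod 644 /etc/ah/cert.pem /etc/ah/ca.pem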
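
After fixing permissions, it can be useful to confirm that the private key is not accessible to group or other. The sketch below assumes GNU stat (Linux), and the helper name key_mode_ok is hypothetical.

```shell
# key_mode_ok FILE   (hypothetical helper; assumes GNU stat on Linux)
# Returns 0 if the octal mode of FILE ends in 00 (for example 600 or 400),
# meaning group and other have no access; 1 for any other mode;
# 2 if FILE cannot be inspected.
key_mode_ok() {
    mode=$(stat -c '%a' "$1" 2>/dev/null) || return 2
    case "$mode" in
        *00) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, `key_mode_ok /etc/ah/key.pem || echo "key permissions are too open or the file is missing"`.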

File Not Found

error: no such file: /etc/ah/cert.pem

Solution: Verify paths and file existence:

ls -la /etc/ah/

SPIFFE Provider

No SVID Issued

error: no identity issued

Diagnosis:

# Check agent health
spire-agent healthcheck -socketPath /run/spire/agent.sock
# List available SVIDs
spire-agent api fetch x509 -socketPath /run/spire/agent.sock -write /tmp/svid
# Check registration entries
spire-server entry show -socketPath /run/spire/server.sock

Solutions:

  1. Create registration entry:

    spire-server entry create \
      -socketPath /run/spire/server.sock \
      -parentID "spiffe://example.org/spire/agent/join_token/agent-1" \
      -spiffeID "spiffe://example.org/ah/agent/executor-1" \
      -selector "unix:user:executor"
  2. Fix selector mismatch:

    # Check process UID/GID
    id
    # Verify selector matches
    spire-server entry show -socketPath /run/spire/server.sock | grep selector
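
To make the selector comparison concrete, the sketch below prints the user and group selectors that would describe the calling process. It assumes SPIRE's Unix workload attestor and its unix:uid, unix:user, unix:gid, and unix:group selector types; the helper name unix_selectors is hypothetical.

```shell
# unix_selectors   (hypothetical helper)
# Prints the unix:* selectors for the calling user and its primary group.
# Assumes SPIRE's Unix workload attestor selector naming.
unix_selectors() {
    echo "unix:uid:$(id -u)"
    echo "unix:user:$(id -un)"
    echo "unix:gid:$(id -g)"
    echo "unix:group:$(id -gn)"
}
```

Run it as the same user as the executor process; at least the selectors listed in the registration entry (for example unix:user:executor) should appear in its output.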

SPIFFE Socket Not Found

error: failed to connect to Workload API: /run/spire/agent.sock: no such file

Solutions:

# Check SPIRE agent is running
systemctl status spire-agent
# Verify socket path
ls -la /run/spire/
# Check socket permissions
stat /run/spire/agent.sock

SPIFFE ID Mismatch

error: server SPIFFE ID mismatch: expected spiffe://example.org/ah/serve, got spiffe://other.org/ah/serve

Solutions:

  1. Verify --expected-server-id matches access point’s SPIFFE ID
  2. Check trust domain configuration on both sides
  3. Verify registration entry for access point

Vault Provider

Authentication Failed

error: vault authentication failed: permission denied

Diagnosis:

# Test Vault AppRole authentication (assumes the default auth/approle mount path)
vault write auth/approle/login \
  role_id="$VAULT_ROLE_ID" \
  secret_id="$VAULT_SECRET_ID"

Solutions:

  1. Verify role ID and secret ID are correct
  2. Check secret ID hasn’t expired
  3. Verify AppRole is enabled: vault auth list

PKI Issue Failed

error: failed to issue certificate: 1 error occurred: * common name not allowed

Diagnosis:

# Check PKI role configuration
vault read pki_int/roles/executor

Solutions:

# Update allowed domains
vault write pki_int/roles/executor \
  allowed_domains="executor.example.com,internal.example.com" \
  allow_subdomains=true

Vault Sealed

error: vault is sealed

Solution: Unseal Vault:

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

mTLS Errors

Client Certificate Required

error: client certificate required

Diagnosis: The access point requires a client certificate, but the executor is not presenting one.

Solutions:

  1. Verify executor identity provider is configured
  2. Check certificate is being loaded:
    ah agent enroll --identity files --cert cert.pem --key key.pem --ca ca.pem ...

Client Certificate Rejected

error: client certificate rejected: certificate signed by unknown authority

Solutions:

  1. Access point must trust executor’s CA:

    ah agent access-point --ca /path/to/executor-ca.pem ...
  2. Or use same CA for both access point and executors

Server Certificate Rejected

error: x509: certificate signed by unknown authority

Solutions:

  1. Executor must trust access point’s CA:

    ah agent enroll --ca /path/to/access-point-ca.pem ...
  2. For SPIFFE, the CA is provided by SPIRE automatically

Rotation Issues

Rotation Not Happening

Diagnosis:

# Check certificate expiry
ah agent status --show-cert
# Check for rotation logs
journalctl -u ah-executor | grep -i "rotat\|renew"

Solutions:

  1. Files provider: Ensure file watching is working

    # Trigger inotify event
    touch /etc/ah/cert.pem
  2. SPIFFE provider: Check SPIRE agent health

    spire-agent healthcheck -socketPath /run/spire/agent.sock
  3. Vault provider: Check Vault token is valid

    vault token lookup

Connection Drops During Rotation

Possible Causes:

  1. Rotation happens too late (near expiry)
  2. Server doesn’t accept new certificate

Solutions:

  1. Configure earlier rotation threshold:

    ah agent enroll --vault-renewal-threshold 0.5 ... # Renew at 50% TTL
  2. Use longer certificate TTLs to allow more time for rotation
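
The interaction between the renewal threshold and the certificate TTL can be worked through with simple arithmetic. The sketch below assumes, based on the "Renew at 50% TTL" comment above, that the threshold is the fraction of the certificate lifetime that elapses before renewal is attempted. The helper names are hypothetical, and the threshold is expressed as a whole-number percentage because POSIX shell arithmetic is integer-only.

```shell
# renewal_epoch NOT_BEFORE NOT_AFTER PERCENT   (hypothetical helper)
# Prints the epoch second at which renewal would start, assuming renewal is
# attempted once PERCENT of the certificate lifetime has elapsed.
renewal_epoch() {
    echo $(( $1 + ($2 - $1) * $3 / 100 ))
}

# retry_window NOT_BEFORE NOT_AFTER PERCENT   (hypothetical helper)
# Prints how many seconds remain between the start of renewal and expiry.
retry_window() {
    echo $(( $2 - $(renewal_epoch "$1" "$2" "$3") ))
}
```

Under this assumption, `retry_window 0 3600 50` shows that a one-hour certificate renewed at 50% leaves 1800 seconds for retries, while `retry_window 0 86400 50` shows that a 24-hour certificate leaves 43200 seconds. This is why a lower threshold or a longer TTL reduces the chance of a connection drop during rotation.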

Debugging Tools

Enable Debug Logging

# Access point
ah agent access-point --log-level debug ...
# Executor
ah agent enroll --log-level debug ...
# Or via environment
export RUST_LOG=ah_identity_provider=debug,ah_cli=debug

Capture TLS Traffic

# Capture with tcpdump
sudo tcpdump -i any -w enrollment.pcap port 4433
# Analyze with Wireshark
wireshark enrollment.pcap

Test Certificate Chain

# Full certificate validation
openssl s_client -connect access-point:4433 \
  -cert executor.pem \
  -key executor-key.pem \
  -CAfile ca.pem \
  -verify_return_error

SPIRE Debugging

# Agent debug info
spire-agent api fetch x509 \
  -socketPath /run/spire/agent.sock \
  -write /tmp/svid
# Detailed SVID info
openssl x509 -in /tmp/svid.0.pem -noout -text
# Server-side registration check
spire-server entry show -socketPath /run/spire/server.sock
# Agent list
spire-server agent list -socketPath /run/spire/server.sock

Common Patterns

Development Setup Failing

For quick local development:

# Use dev identity (self-signed)
ah agent access-point --fleet-listen 127.0.0.1:4433
# Executor with dev identity
ah agent enroll --remote-server https://127.0.0.1:4433 --name test

Production Checklist

Before deploying to production, verify:

  • Certificates are issued by trusted CA
  • Certificate TTLs are appropriately short (hours, not days)
  • Certificate rotation is working
  • Access point validates executor certificate SANs
  • Firewall allows port 4433
  • Audit logging is enabled
  • Monitoring alerts for certificate expiry

Getting Help

If you’re still experiencing issues:

  1. Collect debug logs from both access point and executor
  2. Capture certificate details with openssl x509 -noout -text
  3. Note the exact error message
  4. Open an issue at https://github.com/schelling-point-labs/agent-harbor/issues 
