Health Checks

Monitoring the Keeper Gateway using health checks

Overview

This document describes the health check functionality of the KeeperPAM Gateway. Health checks require Keeper Gateway 1.5.5 or newer and address the common operational challenges described below.

Docker Container Lifecycle Management

Problem: When a gateway goes offline or loses connection, Docker containers continue running and report as "healthy" even though the gateway inside is not functioning properly. This creates misleading container status and prevents proper automated recovery.

Solution: With health checks enabled, Docker can properly monitor the gateway's actual operational status and automatically restart or terminate containers when the gateway is unhealthy.

Load Balancer Integration

Problem: Load balancers need to know which gateway instances are actively handling connections to route traffic appropriately.

Solution: Health check endpoints allow load balancers to automatically remove unhealthy instances from rotation and add them back when they recover.

Monitoring and Alerting

Problem: Operations teams need real-time visibility into gateway status across multiple deployments and environments.

Solution: Health checks integrate with monitoring systems (Prometheus, Nagios, Datadog, etc.) to provide automated alerting and dashboards showing gateway health across your infrastructure.

Service Orchestration

Problem: In Kubernetes or other orchestration platforms, services need to know when gateways are ready to accept connections and when they should be restarted.

Solution: Health checks provide the necessary endpoints for readiness probes, liveness probes, and automated restart policies.

Automated Recovery

Problem: Manual intervention is required to detect and restart failed gateway instances.

Solution: Health checks enable automated monitoring scripts and orchestration tools to detect failures and trigger recovery procedures without human intervention.


Quick Start / Common Issues

Step 1: Start the gateway with the health check enabled:

gateway start --health-check

Step 2: Once the gateway is running with the health check enabled, check its health:

gateway health-check

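If you get an error like "Could not connect to health check server", the most likely cause is that Step 1 has not been completed. The Connection Errors section lists other possible causes, such as a wrong host or port or an SSL/non-SSL mismatch.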

If you see "Exception No such command 'keeper-gateway.exe'", you're using the wrong command syntax. Always use gateway as the command name.
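In scripts, Step 2 can be retried until the gateway reports healthy. The following is a minimal sketch, assuming a POSIX shell and that the gateway command is on the PATH. The name wait_for_gateway and its retry defaults are illustrative and not part of the gateway CLI.

```shell
# Poll `gateway health-check` until it succeeds or the attempts run out.
# wait_for_gateway is a hypothetical helper name, not a gateway command.
wait_for_gateway() {
    attempts="${1:-10}"   # how many times to try
    delay="${2:-3}"       # seconds to sleep between tries
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Exit code 0 means the gateway reports healthy.
        if gateway health-check >/dev/null 2>&1; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        # Sleep only if another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "gateway did not become healthy after $attempts attempt(s)" >&2
    return 1
}
```

For example, wait_for_gateway 20 5 polls up to 20 times, 5 seconds apart.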


Complete Configuration Examples

Here are comprehensive examples showing how to start the gateway and check its health in different configurations:

| Configuration | Start Command | CLI Health Check | Curl Health Check |
| --- | --- | --- | --- |
| Basic HTTP | gateway start --health-check | gateway health-check | curl http://127.0.0.1:8099/health |
| HTTP with Auth | gateway start --health-check --health-check-auth-token mytoken | gateway health-check --token mytoken | curl -H "Authorization: Bearer mytoken" http://127.0.0.1:8099/health |
| HTTPS (SSL) | gateway start --health-check --health-check-ssl --health-check-ssl-cert /path/cert.pem --health-check-ssl-key /path/key.pem | gateway health-check --ssl | curl -k https://127.0.0.1:8099/health |
| HTTPS with Auth | gateway start --health-check --health-check-ssl --health-check-ssl-cert /path/cert.pem --health-check-ssl-key /path/key.pem --health-check-auth-token mytoken | gateway health-check --ssl --token mytoken | curl -k -H "Authorization: Bearer mytoken" https://127.0.0.1:8099/health |
| Custom Port | gateway start --health-check --health-check-port 8443 | gateway health-check --port 8443 | curl http://127.0.0.1:8443/health |
| Custom Host | gateway start --health-check --health-check-host 0.0.0.0 | gateway health-check --host 0.0.0.0 | curl http://0.0.0.0:8099/health |
| Production Setup | gateway start --health-check --health-check-host 0.0.0.0 --health-check-port 8443 --health-check-ssl --health-check-ssl-cert /etc/ssl/cert.pem --health-check-ssl-key /etc/ssl/key.pem --health-check-auth-token $(cat /etc/secrets/token) | gateway health-check --host 0.0.0.0 --port 8443 --ssl --token $(cat /etc/secrets/token) | curl -k -H "Authorization: Bearer $(cat /etc/secrets/token)" https://0.0.0.0:8443/health |

Output Format Examples

| Output Format | CLI Command | Description |
| --- | --- | --- |
| Simple Status | gateway health-check --ssl --token mytoken | Returns "OK: Gateway is running and connected" (exit code 0) or "CRITICAL: ..." (exit code 1) |
| Detailed Info | gateway health-check --ssl --token mytoken --info | Key=value pairs suitable for monitoring scripts |
| JSON Format | gateway health-check --ssl --token mytoken --json | Full JSON response matching HTTP endpoint |

Troubleshooting Commands

| Issue | Test Command | Expected Result |
| --- | --- | --- |
| Check if server is running | curl http://127.0.0.1:8099/health | Connection success or "Connection refused" |
| Test SSL connectivity | curl -k https://127.0.0.1:8099/health | SSL handshake success or SSL error |
| Test authentication | curl -k -H "Authorization: Bearer wrongtoken" https://127.0.0.1:8099/health | {"error": "Invalid authentication token"} |
| Check server binding | curl http://0.0.0.0:8099/health | Success if bound to 0.0.0.0, failure if bound to 127.0.0.1 |

Error Messages and Troubleshooting

The CLI health check provides detailed error messages to help diagnose issues:

Authentication Errors (HTTP 401)

CRITICAL: Authentication failed when connecting to https://127.0.0.1:8099/health
ERROR: Invalid or missing authentication token.

Possible fixes:
1. Check if auth token is required:
   curl -k https://127.0.0.1:8099/health
2. Provide the correct auth token:
   gateway health-check --ssl --token YOUR_TOKEN
3. Check gateway startup logs for the configured token

Connection Errors

CRITICAL: Could not connect to health check server at http://127.0.0.1:8099/health
ERROR: Connection failed.

Possible causes:
1. Health check server is not running
2. Wrong host/port combination
3. Network connectivity issues
4. SSL/non-SSL mismatch

Troubleshooting steps:
1. Verify gateway is running with health check enabled:
   gateway start --health-check
2. Check if server is using SSL:
   gateway health-check --ssl
3. Verify host and port:
   Current: 127.0.0.1:8099
4. Test with curl:
   curl http://127.0.0.1:8099/health

SSL Certificate Errors

CRITICAL: SSL error connecting to health check server at https://127.0.0.1:8099/health
ERROR: SSL certificate validation failed.

Possible causes:
- Self-signed certificate (try curl with -k flag)
- Invalid certificate path
- Certificate expired

Implementation

The Gateway health check is implemented using Bottle, a lightweight WSGI micro web-framework for Python. Bottle was chosen for the following advantages:

  • Minimal dependency (single file, ~60KB in size)

  • Enhanced security over the built-in Python HTTP server

  • Proper request routing and handling

  • Better error management

  • Thread safety

  • Production-ready with minimal overhead

CLI Health Check

You can check the gateway's health from the command line:

gateway health-check

This command returns:

  • Exit code 0 if the gateway is healthy

  • Exit code 1 if the gateway is not running or not healthy

  • Text output indicating the status (OK/CRITICAL/WARNING)

For detailed output in a machine-parsable format (one key=value pair per line):

gateway health-check -i
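A script can pick a single value out of the key=value output. The sketch below assumes one pair per line, as described above. The helper name health_field is hypothetical, and the key names shown in the usage comment are assumptions; check the actual output of your gateway version for the real keys.

```shell
# Print the value for one key from key=value lines read on stdin.
# health_field is a hypothetical helper; key names passed to it are assumptions.
health_field() {
    key="$1"
    # Split on the first "=" only, so values that contain "=" stay intact.
    awk -v k="$key" '
        { pos = index($0, "=") }
        pos > 0 && substr($0, 1, pos - 1) == k { print substr($0, pos + 1); exit }
    '
}
# Usage (assumes the gateway is running with the health check enabled):
#   gateway health-check -i | health_field connection_status
```

The function prints nothing when the key is absent, so callers can test for an empty result.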

For JSON format output (matching the HTTP endpoint format):

gateway health-check -j

If your health check server is using SSL:

gateway health-check --ssl

If your health check server requires authentication:

gateway health-check --ssl --token your_auth_token

If your health check server is running on a non-default port:

gateway health-check --port 8123

If your health check server is running on a different host:

gateway health-check --host 10.0.0.5

The detailed output includes:

  • Gateway version

  • Connection status

  • WebSocket metrics (when available)

  • Process information (in background mode)

This makes it suitable for monitoring scripts and cron jobs.
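As an illustration of cron usage, the sketch below runs one check and appends a timestamped line to a log file. The function name log_health and the default log path are hypothetical; only the gateway health-check command and its exit codes come from this document.

```shell
# Run one health check and append a timestamped result line to a log file.
# log_health and the default log path are hypothetical; adjust for your system.
log_health() {
    logfile="${1:-/var/log/gateway-health.log}"
    # Capture the status text; the exit code is 0 when healthy, 1 otherwise.
    if output=$(gateway health-check 2>&1); then
        code=0
    else
        code=$?
    fi
    printf '%s exit=%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$code" "$output" >> "$logfile"
    return "$code"
}
```

A crontab entry could then call a script that defines and invokes log_health every few minutes. The function returns the same exit code as the check, so the calling script can act on failures.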

Note: The CLI health check command requires the HTTP health check server to be running. If the server is not running, the command returns an error message suggesting that you enable it.


HTTP Health Check

The gateway includes a secure HTTP health check endpoint that can be enabled with environment variables or command line arguments.

Configuration

The health check server can be configured using environment variables or command line arguments:

Environment Variables

| Environment Variable | Purpose | Default |
| --- | --- | --- |
| KEEPER_GATEWAY_HEALTH_CHECK_ENABLED | Enable HTTP health check (1, true, yes) | Disabled |
| KEEPER_GATEWAY_HEALTH_CHECK_PORT | Port for HTTP server | 8099 |
| KEEPER_GATEWAY_HEALTH_CHECK_HOST | Host address to bind to | 127.0.0.1 |
| KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN | Authentication token for requests | None |
| KEEPER_GATEWAY_HEALTH_CHECK_USE_SSL | Enable SSL (1, true, yes) | Disabled |
| KEEPER_GATEWAY_HEALTH_CHECK_SSL_CERT | Path to SSL certificate | None |
| KEEPER_GATEWAY_HEALTH_CHECK_SSL_KEY | Path to SSL private key | None |

Command Line Arguments

When starting the gateway, you can also use these command line arguments:

--health-check               Enable the health check server
--health-check-port INT      Port for the health check server (default: 8099)
--health-check-host STRING   Host address to bind to (default: 127.0.0.1)
--health-check-auth-token    Auth token for the health check server
--health-check-ssl           Enable SSL for the health check server
--health-check-ssl-cert      Path to SSL certificate
--health-check-ssl-key       Path to SSL private key

Command line arguments take precedence over environment variables when both are specified.
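The precedence rule can be pictured with shell parameter expansion. This is only an illustration of the documented order (command line value, then environment variable, then default), not the gateway's own code. The helper name resolve_port is hypothetical.

```shell
# Illustrates the documented precedence: CLI value, then environment, then default.
# resolve_port is a hypothetical helper, not the gateway's implementation.
resolve_port() {
    cli_value="$1"   # value passed via --health-check-port, or empty if omitted
    echo "${cli_value:-${KEEPER_GATEWAY_HEALTH_CHECK_PORT:-8099}}"
}
```

With KEEPER_GATEWAY_HEALTH_CHECK_PORT=9000 set, resolve_port "" prints 9000 and resolve_port 8443 prints 8443.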

Example Commands

Basic health check with default settings:

gateway start --health-check

Custom port and authentication token:

gateway start --health-check --health-check-port 9000 --health-check-auth-token mysecrettoken

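Bind to all interfaces (only in secure environments). As noted under Security, SSL is enforced for non-localhost bindings, so combine this flag with the SSL options from the next example: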

gateway start --health-check --health-check-host 0.0.0.0

Enable SSL with certificate and key:

gateway start --health-check --health-check-ssl --health-check-ssl-cert /path/to/cert.pem --health-check-ssl-key /path/to/key.pem

Complete example with all options:

gateway start --health-check --health-check-port 8443 --health-check-host 10.0.0.5 --health-check-auth-token mysecrettoken --health-check-ssl --health-check-ssl-cert /path/to/cert.pem --health-check-ssl-key /path/to/key.pem

Usage

When enabled, the HTTP health check endpoint will be available at:

http://localhost:8099/health

Or with SSL:

https://localhost:8099/health

Response Format

The endpoint returns:

  • HTTP 200 if the gateway is healthy

  • HTTP 503 if the gateway is not healthy

  • JSON response with details:

{
  "status": "healthy",
  "message": "Gateway is running and connected",
  "details": {
    "timestamp": 1742849941,
    "version": 1,
    "connection_status": "connected",
    "websocket": {
      "uptime_seconds": 85,
      "uptime_human": "1m 25s",
      "last_ping_received_seconds_ago": 10,
      "latency_ms": 75,
      "last_ping_sent_timestamp": 1742850455,
      "last_pong_received_timestamp": 1742850455
    }
  }
}

The response includes:

  • status: Overall health status ("healthy" or "unhealthy")

  • message: Human-readable description of the status

  • details: Detailed information about the gateway

    • timestamp: Current server timestamp

    • version: API version

    • connection_status: Current connection status ("connected", "disconnected", etc.)

    • websocket: WebSocket connection metrics

      • uptime_seconds: WebSocket connection uptime in seconds

      • uptime_human: Human-readable uptime (e.g., "1m 25s")

      • last_ping_received_seconds_ago: Seconds since the last ping was received

      • latency_ms: Round-trip latency of the last ping-pong in milliseconds

      • last_ping_sent_timestamp: Unix timestamp when the last ping was sent

      • last_pong_received_timestamp: Unix timestamp when the last pong was received
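Because the endpoint signals health through the HTTP status code (200 or 503), a script can decide without parsing the JSON body. The sketch below assumes curl is installed. The helper name http_health and its exit code 2 for unexpected statuses are choices made for this example.

```shell
# Map the documented HTTP status codes to a shell exit code.
# http_health is a hypothetical helper; extra arguments are passed to curl.
http_health() {
    url="${1:-http://127.0.0.1:8099/health}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    # -s silences progress, -o discards the body, -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$@" "$url")
    case "$code" in
        200) echo "healthy"; return 0 ;;
        503) echo "unhealthy"; return 1 ;;
        *)   echo "unexpected HTTP status: $code" >&2; return 2 ;;
    esac
}
```

For an SSL endpoint with authentication, the call might look like http_health https://127.0.0.1:8099/health -k -H "Authorization: Bearer mytoken". A failed connection makes curl report status 000, which falls into the last branch.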

Example Responses

Healthy Gateway:

{
  "status": "healthy",
  "message": "Gateway is running and connected",
  "details": {
    "timestamp": 1742849941,
    "version": 1,
    "connection_status": "connected",
    "websocket": {
      "uptime_seconds": 85,
      "uptime_human": "1m 25s",
      "last_ping_received_seconds_ago": 10,
      "latency_ms": 75,
      "last_ping_sent_timestamp": 1742850455,
      "last_pong_received_timestamp": 1742850455
    }
  }
}

Unhealthy Gateway:

{
  "status": "unhealthy",
  "message": "Gateway is not properly connected (status: reconnecting)",
  "details": {
    "timestamp": 1742850874,
    "version": 1,
    "connection_status": "reconnecting",
    "websocket": {
      "uptime_seconds": 1018,
      "uptime_human": "16m 58s",
      "last_ping_received_seconds_ago": 324,
      "latency_ms": 77
    }
  }
}

Note that some metrics like latency_ms, last_ping_sent_timestamp, and last_pong_received_timestamp may not always be present in the response. The availability of these metrics depends on the current state of the WebSocket connection and the timing of ping/pong messages.
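Scripts that read these metrics should therefore tolerate missing fields. The following sketch extracts a numeric field from the JSON body and prints a fallback when the field is absent. The helper name json_number is hypothetical, and the sed pattern is a simplification that suits the flat numeric fields shown above; a real JSON parser such as jq is more robust where available.

```shell
# Extract a numeric field from the JSON body on stdin, or print a fallback
# when the field is absent. json_number is a hypothetical helper.
json_number() {
    field="$1"
    fallback="${2:-n/a}"
    # Match "field": <number> and keep only the number; take the first hit.
    value=$(sed -n 's/.*"'"$field"'"[[:space:]]*:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1)
    echo "${value:-$fallback}"
}
# Usage (assumes a local, non-SSL health check server on the default port):
#   curl -s http://127.0.0.1:8099/health | json_number latency_ms
```

The second argument sets the fallback text, for example json_number last_pong_received_timestamp 0.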

Status Update Delays

The health check reflects the current state of the WebSocket connection, but there may be a delay in status updates.

Delayed Status Updates

When connectivity is lost, it may take up to 2 minutes for the health check to report an "unhealthy" status, as the gateway attempts to reconnect. Similarly, when connectivity is restored, it may take up to 2 minutes for the health check to reflect a "healthy" status.

This delay is intentional: it lets the gateway attempt reconnection without immediately reporting failures for transient connectivity issues.
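Automated recovery scripts may want to account for this window, for example by not restarting a gateway again until the status has had time to settle. The sketch below shows one way to express such a grace period. The helper name in_grace_period is hypothetical, and the 120 second default simply mirrors the 2 minute figure above.

```shell
# Decide whether a status change is still too recent to act on.
# in_grace_period is a hypothetical helper; 120 s mirrors the documented
# worst-case delay of 2 minutes.
in_grace_period() {
    changed_at="$1"            # Unix time of the last restart or network change
    now="${2:-$(date +%s)}"    # current Unix time (can be overridden)
    grace="${3:-120}"          # seconds to wait before trusting the status
    [ $((now - changed_at)) -lt "$grace" ]
}
# Usage sketch (restart_gateway is a placeholder for your own recovery step):
#   if ! gateway health-check && ! in_grace_period "$restarted_at"; then
#       restart_gateway
#   fi
```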

Security

The HTTP health check includes the following security features:

  1. Authentication: When KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN is set, requests must include the token in the Authorization header:

    Authorization: Bearer <token>

  2. SSL/TLS: When SSL is enabled, all communication is encrypted. You must provide a valid certificate and private key.

  3. Localhost binding: The server binds to localhost only by default, not exposing the endpoint over the network.

  4. Security Headers: The health check server adds the following security headers to responses:

    • X-Content-Type-Options: nosniff

    • X-Frame-Options: DENY

    • Content-Security-Policy: default-src 'none'

  5. Rate Limiting: Automatic rate limiting is applied to non-localhost connections (60 requests per minute per IP).

  6. Information Protection: When the server is bound to a non-localhost address, sensitive information is automatically redacted from the response.

  7. Forced SSL: SSL is automatically enforced when binding to non-localhost interfaces.
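The security headers listed above can be verified from a client. The sketch below reads response headers on stdin and checks for the three documented values. The helper name has_security_headers is hypothetical; curl's -D - option, which writes the response headers to standard output, is used in the usage comment.

```shell
# Check that response headers on stdin contain the three documented headers.
# has_security_headers is a hypothetical helper name.
has_security_headers() {
    headers=$(tr -d '\r')    # strip CR from HTTP line endings
    for expected in \
        "X-Content-Type-Options: nosniff" \
        "X-Frame-Options: DENY" \
        "Content-Security-Policy: default-src 'none'"
    do
        # -i: header names are case-insensitive; -F: match the text literally.
        if ! printf '%s\n' "$headers" | grep -qiF "$expected"; then
            echo "missing header: $expected" >&2
            return 1
        fi
    done
    return 0
}
# Usage (assumes a local, non-SSL health check server on the default port):
#   curl -s -D - -o /dev/null http://127.0.0.1:8099/health | has_security_headers
```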

TLS Compatibility

The health check server is configured to support a wide range of clients by:

  • Using secure TLS defaults (TLS 1.2+ minimum) for maximum security

  • Supporting modern cipher suites for strong encryption

  • Automatically handling protocol negotiation for HTTP and HTTPS

For clients that support modern TLS versions, use standard curl commands:

curl -k -H "Authorization: Bearer your_token" https://localhost:8099/health

Docker-Specific Configuration Requirements

When running Keeper Gateway inside Docker, special configuration is required to expose the health check to the host or external systems:

Binding to 0.0.0.0

  • The health check server must bind to 0.0.0.0 to be reachable outside the container.

  • 127.0.0.1 restricts access to within the container only.

SSL Enforcement

  • When using 0.0.0.0, Keeper Gateway forces SSL to protect health check data.

  • You must provide a valid certificate and key or the server will not start.

Authentication Requirement

  • If binding to 0.0.0.0, you must also set KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN (or pass --health-check-auth-token) to secure the endpoint.

Docker Compose Example

services:
  keeper-gateway:
    image: keeper/gateway:latest
    ports:
      - "8099:8099"
    volumes:
      - ./certs:/certs:ro
    environment:
      # Boolean-like values are quoted so YAML passes them through as strings
      KEEPER_GATEWAY_HEALTH_CHECK_ENABLED: "true"
      KEEPER_GATEWAY_HEALTH_CHECK_HOST: "0.0.0.0"
      KEEPER_GATEWAY_HEALTH_CHECK_PORT: 8099
      KEEPER_GATEWAY_HEALTH_CHECK_USE_SSL: "true"
      KEEPER_GATEWAY_HEALTH_CHECK_SSL_CERT: /certs/healthcheck.crt
      KEEPER_GATEWAY_HEALTH_CHECK_SSL_KEY: /certs/healthcheck.key
      KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN: mysecrettoken

Generate a Self-Signed Certificate

mkdir -p certs
openssl req -x509 -nodes -days 365 \
  -newkey rsa:2048 \
  -keyout certs/healthcheck.key \
  -out certs/healthcheck.crt \
  -subj "/CN=localhost"

Test the Endpoint from the Host

curl -k -H "Authorization: Bearer mysecrettoken" https://localhost:8099/health
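The same check can be run from inside the container, for example as the command behind a Docker health check. The sketch below builds the gateway health-check call from the environment variables that configure the server. It assumes the image provides a POSIX shell and has the gateway command on the PATH, which this document does not state. The helper name container_health is hypothetical.

```shell
# Build the `gateway health-check` call from the same environment variables
# that configure the server. container_health is a hypothetical helper, and
# this sketch assumes `gateway` is on PATH inside the container.
container_health() {
    # Inside the container the server is reachable on the loopback address.
    set -- --host 127.0.0.1 --port "${KEEPER_GATEWAY_HEALTH_CHECK_PORT:-8099}"
    case "${KEEPER_GATEWAY_HEALTH_CHECK_USE_SSL:-}" in
        1|true|yes) set -- "$@" --ssl ;;
    esac
    if [ -n "${KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN:-}" ]; then
        set -- "$@" --token "$KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN"
    fi
    gateway health-check "$@"
}
```

The function exits with the status of gateway health-check, so exit code 0 marks the container healthy and any other code marks it unhealthy. Note that a token passed on the command line can be visible in the container's process list.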

Example Linux Configuration

# Enable HTTP health check
export KEEPER_GATEWAY_HEALTH_CHECK_ENABLED=true
export KEEPER_GATEWAY_HEALTH_CHECK_PORT=8099
export KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN=mysecrettoken

# Start the gateway
gateway start

Or using command line arguments:

gateway start --health-check --health-check-port 8099 --health-check-auth-token mysecrettoken

Self-Signed SSL Certificates

For testing or internal use, you can generate self-signed certificates to enable SSL/TLS encryption:

# Generate a private key
openssl genrsa -out healthcheck.key 2048

# Generate a certificate signing request (CSR)
openssl req -new -key healthcheck.key -out healthcheck.csr -subj "/CN=localhost"

# Generate a self-signed certificate (valid for 365 days)
openssl x509 -req -days 365 -in healthcheck.csr -signkey healthcheck.key -out healthcheck.crt

# Set the environment variables
export KEEPER_GATEWAY_HEALTH_CHECK_ENABLED=true
export KEEPER_GATEWAY_HEALTH_CHECK_USE_SSL=true
export KEEPER_GATEWAY_HEALTH_CHECK_SSL_CERT=/path/to/healthcheck.crt
export KEEPER_GATEWAY_HEALTH_CHECK_SSL_KEY=/path/to/healthcheck.key
export KEEPER_GATEWAY_HEALTH_CHECK_PORT=8443  # Common alternate HTTPS port
export KEEPER_GATEWAY_HEALTH_CHECK_AUTH_TOKEN=mysecrettoken

# Start the gateway
gateway start

Or using command line arguments:

gateway start --health-check --health-check-port 8443 --health-check-ssl --health-check-ssl-cert /path/to/healthcheck.crt --health-check-ssl-key /path/to/healthcheck.key --health-check-auth-token mysecrettoken

When using self-signed certificates, your HTTP client will need to be configured to trust the certificate or ignore SSL verification (not recommended for production).
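With curl, trusting one specific certificate is done with the --cacert option instead of -k. The sketch below wraps that call. The helper name health_with_ca is hypothetical, and the default URL assumes the port 8443 setup shown above. The host name in the URL must match the certificate's subject (localhost in the example certificate), and depending on the TLS library a subjectAltName entry in the certificate may also be required.

```shell
# Query the endpoint while trusting one specific certificate file instead of
# disabling verification with -k. health_with_ca is a hypothetical helper.
health_with_ca() {
    cert="$1"     # path to the self-signed certificate, e.g. healthcheck.crt
    token="$2"    # auth token configured on the server
    url="${3:-https://localhost:8443/health}"
    # -sS hides the progress meter but still prints errors.
    curl -sS --cacert "$cert" -H "Authorization: Bearer $token" "$url"
}
# Usage:
#   health_with_ca /path/to/healthcheck.crt mysecrettoken
```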

Monitoring Integration

This endpoint can be used with monitoring systems like:

  • Prometheus with blackbox exporter

  • Nagios/Icinga

  • Zabbix

  • Datadog

  • AWS CloudWatch

  • Any monitoring system that can perform HTTP checks
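Nagios-style plugins use exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, while the gateway CLI returns 1 for an unhealthy gateway. A thin wrapper can translate between the two. The sketch below keys off the documented OK/WARNING/CRITICAL text prefixes. The helper name nagios_gateway_check is hypothetical.

```shell
# Wrap `gateway health-check` for Nagios-style monitors, which expect
# 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. nagios_gateway_check is a
# hypothetical helper; arguments are passed through to the CLI.
nagios_gateway_check() {
    output=$(gateway health-check "$@" 2>&1)
    echo "$output"
    # Map the status prefix of the CLI output to a plugin exit code.
    case "$output" in
        OK*)       return 0 ;;
        WARNING*)  return 1 ;;
        CRITICAL*) return 2 ;;
        *)         return 3 ;;
    esac
}
# Usage:
#   nagios_gateway_check --ssl --token YOUR_TOKEN
```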
