Enable and Start the Watchdog Service

Example policy for configuring and starting the Watchdog service

This example shows how to use a single Update Settings (ConfigurationUpdate) policy to configure the Watchdog service settings and deploy a job that starts it as an OS service, all in one policy push.

Overview

The Watchdog is a standalone monitoring service that keeps KeeperPrivilegeManager healthy. It runs as a separate OS service and periodically probes the /health endpoint. When AutoRemediate is enabled (the default), it restarts the main service automatically if that service becomes unresponsive.
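The probe-and-restart behaviour can be pictured as a simple loop. The sketch below is a minimal Python model, not the Watchdog's actual code: probe and restart are hypothetical stand-ins (the real probe URL, port, timeout, and retry rules are not documented here), and max_checks exists only so the loop can terminate in a test. The keyword parameters mirror the CheckIntervalSec, StartupDelaySec, and AutoRemediate settings.

```python
import time
from typing import Callable, List, Optional


def watchdog_loop(
    probe: Callable[[], bool],          # hypothetical: True if GET /health succeeded
    restart: Callable[[], None],        # hypothetical: restarts KeeperPrivilegeManager
    check_interval_sec: int = 10,       # mirrors CheckIntervalSec
    startup_delay_sec: int = 90,        # mirrors StartupDelaySec
    auto_remediate: bool = True,        # mirrors AutoRemediate
    max_checks: Optional[int] = None,   # sketch-only: stop after N checks
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Probe the main service periodically and restart it on failure."""
    log: List[str] = []
    sleep(startup_delay_sec)            # let the main service finish starting
    checks = 0
    while max_checks is None or checks < max_checks:
        if probe():
            log.append("healthy")
        elif auto_remediate:
            log.append("unhealthy: restarting")
            restart()
        else:
            log.append("unhealthy: logged only")   # monitoring-only mode
        checks += 1
        sleep(check_interval_sec)
    return log
```

Injecting sleep (for example a list's append method) lets the loop run instantly under test instead of waiting 90 seconds.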

Activating the Watchdog normally requires two separate policies: a SettingsUpdate policy to write the Watchdog configuration into appsettings.json, and a JobUpdate policy to deploy the job that starts the OS service. The ConfigurationUpdate policy type combines both into a single policy. When both a settings payload and a job payload are present, the processor applies the settings first and then creates and fires the job, so the configuration is already on disk by the time the Watchdog service reads it.
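A minimal Python model of that ordering follows. It is an illustration under assumptions, not the agent's implementation: disk is a dictionary standing in for the agent's files, fire_job is a hypothetical hook, the job reload step is omitted, and settings are applied with a simple top-level update for brevity. The point is only that the job fires after the settings have been written.

```python
import json
from typing import Any, Callable, Dict, List


def process_configuration_update(
    extension: Dict[str, Any],
    disk: Dict[str, str],               # stand-in for the agent's files: path -> JSON text
    fire_job: Callable[[str], None],    # hypothetical hook that runs a loaded job
) -> List[str]:
    """Apply the settings payload, then write and fire the job (order matters)."""
    steps: List[str] = []

    settings = extension.get("SettingsJson")
    if settings is not None:
        target = extension["TargetFile"]
        current = json.loads(disk.get(target, "{}"))
        current.update(settings)        # top-level update only, for brevity
        disk[target] = json.dumps(current)
        steps.append(f"settings -> {target}")

    job = extension.get("JobJson")
    if job is not None:
        job_id = extension["JobId"]
        disk[f"Jobs/{job_id}.json"] = json.dumps(job)
        steps.append(f"job file -> Jobs/{job_id}.json")
        fire_job(job_id)                # configuration is already on disk here
        steps.append(f"fired {job_id}")

    return steps
```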

The Policy

{
  "PolicyId": "policy-enable-watchdog",
  "PolicyName": "Enable and Start Watchdog",
  "PolicyType": "ConfigurationUpdate",
  "Status": "enabled",
  "Operator": "And",
  "Actions": {
    "OnSuccess": { "Controls": [] },
    "OnFailure": { "Command": "deny" }
  },
  "Rules": [],
  "Extension": {
    "TargetFile": "appsettings.json",
    "Action": "Update",
    "SettingsJson": {
      "Watchdog": {
        "CheckIntervalSec": 10,
        "AutoRemediate": true,
        "StartupDelaySec": 90
      }
    },
    "JobId": "start-watchdog-service",
    "JobJson": {
      "id": "start-watchdog-service",
      "name": "Start Watchdog Service",
      "description": "Starts the KeeperWatchdog OS service on the endpoint.",
      "enabled": true,
      "asUser": false,
      "priority": 9,
      "events": [
        {
          "eventType": "Custom",
          "customEvent": "PolicyPreprocessingCompleted"
        }
      ],
      "parameters": [],
      "tasks": [
        {
          "id": "start-watchdog-windows",
          "name": "Start Watchdog on Windows",
          "command": "sc",
          "arguments": "start KeeperWatchdog",
          "expectedExitCode": 0,
          "timeoutSeconds": 30,
          "executionType": "Service",
          "onFailure": "start-watchdog-unix"
        },
        {
          "id": "start-watchdog-unix",
          "name": "Start Watchdog on Linux/macOS",
          "command": "systemctl",
          "arguments": "start keeper-watchdog",
          "expectedExitCode": 0,
          "timeoutSeconds": 30,
          "executionType": "Service"
        }
      ],
      "osFilter": {
        "windows": true,
        "linux": true,
        "macOS": true
      }
    }
  }
}

What Each Part Does

Settings portion: the TargetFile and SettingsJson fields target appsettings.json and merge a Watchdog configuration section into it. Because Action is set to Update, all existing keys in the file are preserved; only the keys present in SettingsJson are added or overwritten.
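One way to read "preserves all existing keys" is as a recursive merge. The sketch below assumes nested objects such as the Watchdog section are merged key by key rather than replaced wholesale; the source does not say which behaviour applies to nested sections, so treat that as an assumption. merge_settings is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_settings(existing: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of existing with update merged in.

    Keys absent from update are kept; keys present in update are added or
    overwritten. ASSUMPTION: nested objects are merged recursively.
    """
    result = copy.deepcopy(existing)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_settings(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```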

The Watchdog section supports three optional keys. The example policy sets each of them to its default value:

Key               Type  Default  Description
CheckIntervalSec  int   10       Seconds between health checks. Clamped to 2–300.
AutoRemediate     bool  true     When true, the Watchdog restarts the service on failure; when false, it only monitors and logs.
StartupDelaySec   int   90       Seconds to wait after the Watchdog starts before the first health check. Minimum 30.
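The defaults and limits in the table can be expressed as a small loader. This is a hypothetical sketch: it assumes an out-of-range CheckIntervalSec is clamped silently and that a StartupDelaySec below 30 is raised to 30 rather than rejected; the real service may log or handle these cases differently. WatchdogSettings and load_watchdog_settings are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class WatchdogSettings:
    check_interval_sec: int = 10
    auto_remediate: bool = True
    startup_delay_sec: int = 90


def load_watchdog_settings(section: Dict[str, Any]) -> WatchdogSettings:
    """Apply the documented defaults and limits to a parsed Watchdog section."""
    interval = int(section.get("CheckIntervalSec", 10))
    delay = int(section.get("StartupDelaySec", 90))
    return WatchdogSettings(
        check_interval_sec=min(max(interval, 2), 300),  # clamp into 2..300
        auto_remediate=bool(section.get("AutoRemediate", True)),
        startup_delay_sec=max(delay, 30),               # ASSUMPTION: raised to the minimum
    )
```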

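Job portion: the JobId and JobJson fields write a job file to Jobs/start-watchdog-service.json and trigger a job reload. The job fires on the PolicyPreprocessingCompleted event, so it runs in the same sync cycle that delivers this policy. The job contains two tasks chained by onFailure: sc start KeeperWatchdog for Windows and, if that task fails (for example because sc is not present on the endpoint), systemctl start keeper-watchdog for Linux. Note that systemctl is a systemd tool and does not exist on a standard macOS installation, which manages services with launchd; as written, the fallback task is therefore expected to fail on macOS as well, even though osFilter enables the job there.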
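The onFailure chain can be modelled as follows. This sketch assumes that a task returning its expectedExitCode ends the job and that a failing task jumps to the task named in its onFailure field; the job engine's exact sequencing is not documented here. The runner is injected so no real sc or systemctl is invoked, and run_task_chain is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Optional

Task = Dict[str, Any]


def run_task_chain(
    tasks: List[Task],
    run: Callable[[str, str, int], int],   # hypothetical runner: (command, arguments, timeout) -> exit code
) -> List[str]:
    """Run the first task; on failure, follow onFailure links until one succeeds."""
    by_id = {task["id"]: task for task in tasks}
    trace: List[str] = []
    visited = set()                        # guard against onFailure cycles
    current: Optional[Task] = tasks[0] if tasks else None
    while current is not None and current["id"] not in visited:
        visited.add(current["id"])
        code = run(current["command"], current["arguments"], current["timeoutSeconds"])
        ok = code == current.get("expectedExitCode", 0)
        trace.append(f"{current['id']}: {'ok' if ok else 'failed'}")
        if ok:
            break                          # ASSUMPTION: success ends the chain
        next_id = current.get("onFailure")
        current = by_id.get(next_id) if next_id else None
    return trace
```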

What Happens on the Endpoint

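When the policy is received, the following sequence runs:

1. The processor applies the settings payload first, merging the Watchdog section from SettingsJson into appsettings.json and leaving all other keys in place.
2. It writes JobJson to Jobs/start-watchdog-service.json and triggers a job reload.
3. When policy preprocessing finishes, the PolicyPreprocessingCompleted event fires the job within the same sync cycle.
4. The first task runs sc start KeeperWatchdog. On Windows this starts the service; where that task fails, its onFailure link runs systemctl start keeper-watchdog instead.
5. The Watchdog service starts and reads its configuration from appsettings.json, which is already on disk. It waits StartupDelaySec seconds, then probes the /health endpoint every CheckIntervalSec seconds, restarting KeeperPrivilegeManager if it becomes unresponsive and AutoRemediate is true.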

Variations

Monitoring only (no auto-restart): set AutoRemediate to false. The Watchdog then detects and logs failures but does not restart the service.

Health-check frequency: increase CheckIntervalSec (up to 300) to reduce check frequency on constrained hardware, or decrease it (minimum 2) for faster failure detection.
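If you generate policies from a script, the two variations above are just overrides of the Watchdog section. This hypothetical helper copies the example policy and leaves every other field untouched. It rejects unknown keys and out-of-range values up front rather than relying on the service to clamp them, so a typo does not silently become a different interval.

```python
import copy
from typing import Any, Dict

ALLOWED_KEYS = {"CheckIntervalSec", "AutoRemediate", "StartupDelaySec"}


def watchdog_policy_variant(base: Dict[str, Any], **watchdog: Any) -> Dict[str, Any]:
    """Copy the example policy and override keys in its Watchdog section."""
    unknown = set(watchdog) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"unknown Watchdog keys: {sorted(unknown)}")
    interval = watchdog.get("CheckIntervalSec")
    if interval is not None and not 2 <= interval <= 300:
        raise ValueError("CheckIntervalSec would be clamped to 2..300")
    delay = watchdog.get("StartupDelaySec")
    if delay is not None and delay < 30:
        raise ValueError("StartupDelaySec has a minimum of 30")
    variant = copy.deepcopy(base)          # never mutate the base policy
    variant["Extension"]["SettingsJson"]["Watchdog"].update(watchdog)
    return variant
```

For example, with policy loaded from the example JSON via json.load, watchdog_policy_variant(policy, AutoRemediate=False) yields the monitoring-only policy.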

Start the Watchdog on every agent boot: replace the PolicyPreprocessingCompleted event trigger with Startup so the job starts the Watchdog service each time the agent boots, not only on policy sync.
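A sketch of the same swap in Python. The shape of the replacement trigger is an assumption: the example only shows a Custom event, so {"eventType": "Startup"} is a guess at how a Startup trigger is written and should be checked against the job schema before use. with_startup_trigger is a hypothetical name.

```python
import copy
from typing import Any, Dict


def with_startup_trigger(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the policy, replacing the PolicyPreprocessingCompleted trigger with Startup."""
    variant = copy.deepcopy(policy)
    job = variant["Extension"]["JobJson"]
    job["events"] = [
        # ASSUMPTION: a Startup trigger is written as {"eventType": "Startup"}
        {"eventType": "Startup"}
        if event.get("customEvent") == "PolicyPreprocessingCompleted"
        else event
        for event in job["events"]
    ]
    return variant
```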

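Disable the Watchdog later: push a second ConfigurationUpdate policy that sets AutoRemediate to false in SettingsJson and sets Action to Delete on the JobId, which removes the start job from the endpoint. This turns off automatic restarts and stops the job from starting the service again; it does not by itself stop a Watchdog service that is already running.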
