Cephalon Runtime Failure Policy
Cephalon now treats runtime failure behavior as a first-class engine contract.
Configuration
{
  "Engine": {
    "FailurePolicy": {
      "StartupFailureBehavior": "FailFast",
      "StopFailureBehavior": "BestEffortContinue",
      "AllowManualRestart": true,
      "MaxRestartAttempts": 3,
      "StartupReadinessDelay": "00:00:10",
      "ShutdownLivenessGracePeriod": "00:00:20",
      "ManualRestartBackoff": "00:00:30"
    }
  }
}
Startup behavior
FailFast
- default
- startup exceptions are rethrown to the host
- the runtime still records failure context before the exception leaves the lifecycle call
CaptureOnly
- startup exceptions are captured into runtime state
- the runtime moves to Failed
- hosts can stay alive and inspect /engine/status
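The difference between the two startup behaviors can be modeled in a few lines. This is a minimal sketch, not Cephalon's API: RuntimeModel, FailureContext, and the "Created" and "Running" status names are hypothetical, and the source does not say which status FailFast leaves behind, so the model leaves it unchanged in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailureContext:
    phase: str
    exception_type: str
    message: str


class RuntimeModel:
    """Hypothetical stand-in for the engine runtime (not Cephalon's API)."""

    def __init__(self, startup_failure_behavior: str = "FailFast") -> None:
        self.startup_failure_behavior = startup_failure_behavior
        self.status = "Created"  # assumed status name
        self.last_failure: Optional[FailureContext] = None

    def start(self, start_modules: Callable[[], None]) -> None:
        try:
            start_modules()
            self.status = "Running"  # assumed status name
        except Exception as exc:
            # Both behaviors record failure context before anything else.
            self.last_failure = FailureContext(
                "start", type(exc).__name__, str(exc)
            )
            if self.startup_failure_behavior == "FailFast":
                raise  # rethrown to the host
            # CaptureOnly: keep the exception in runtime state, host stays alive.
            self.status = "Failed"


def broken_start() -> None:
    raise RuntimeError("module failed to start")


captured = RuntimeModel("CaptureOnly")
captured.start(broken_start)  # does not raise
```

With CaptureOnly the host can keep serving /engine/status and read the captured failure from there; with FailFast the same context is recorded, but the caller also sees the exception.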
Startup readiness warmup
StartupReadinessDelay
- optional
- keeps /health/ready and the readiness report in /engine/diagnostics Unhealthy for a bounded warmup window after startup succeeds
- does not change liveness; the process is still considered live while warmup completes
- surfaces as activeWindow = "startup-warmup" with an end timestamp in health payloads
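The warmup rule reduces to one time comparison. The sketch below models it under stated assumptions: the function name, the "ready"/"live" grouping, and the windowEndsAt field name are hypothetical; only the Unhealthy readiness, the unchanged liveness, and the activeWindow value come from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def health_during_warmup(
    started_at: datetime, delay: Optional[timedelta], now: datetime
) -> dict:
    """Model readiness and liveness after a successful startup."""
    live = {"status": "Healthy"}  # warmup never changes liveness
    if delay is not None and now < started_at + delay:
        ready = {
            "status": "Unhealthy",
            "activeWindow": "startup-warmup",
            # Hypothetical field name for the end timestamp.
            "windowEndsAt": (started_at + delay).isoformat(),
        }
    else:
        ready = {"status": "Healthy", "activeWindow": None}
    return {"ready": ready, "live": live}


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
during = health_during_warmup(t0, timedelta(seconds=10), t0 + timedelta(seconds=3))
after = health_during_warmup(t0, timedelta(seconds=10), t0 + timedelta(seconds=11))
```

Here `during` reports readiness as Unhealthy with the warmup window attached, while `after` reports Healthy; liveness is Healthy in both.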
Stop behavior
FailFast
- first stop failure is rethrown
BestEffortContinue
- default
- Cephalon keeps stopping the remaining started modules
- the runtime still records failure context and ends in Failed
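A small model makes the two stop behaviors concrete. StopModel and its attribute names are hypothetical, the stop order is not specified by the source (this sketch uses the order given), and recording context before the FailFast rethrow mirrors the startup description rather than anything stated for stop.

```python
from typing import Callable, List, Tuple


class StopModel:
    """Hypothetical model of stopping started modules (not Cephalon's API)."""

    def __init__(self, stop_failure_behavior: str = "BestEffortContinue") -> None:
        self.stop_failure_behavior = stop_failure_behavior
        self.status = "Running"  # assumed status name
        self.stopped: List[str] = []
        self.failures: List[Tuple[str, str, str]] = []

    def stop(self, started: List[Tuple[str, Callable[[], None]]]) -> None:
        for module_id, stop_module in started:
            try:
                stop_module()
                self.stopped.append(module_id)
            except Exception as exc:
                # Record module id, exception type, and message.
                self.failures.append((module_id, type(exc).__name__, str(exc)))
                self.status = "Failed"
                if self.stop_failure_behavior == "FailFast":
                    raise  # first stop failure is rethrown
                # BestEffortContinue: move on to the remaining modules.
        if not self.failures:
            self.status = "Stopped"  # assumed status name


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("stop failed")


modules = [("a", ok), ("b", boom), ("c", ok)]
best_effort = StopModel("BestEffortContinue")
best_effort.stop(modules)  # does not raise; "c" is still stopped
```

The trade-off is visible in `best_effort.stopped`: module "c" gets its stop call even though "b" failed, and the runtime still ends in Failed.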
Shutdown liveness drain
ShutdownLivenessGracePeriod
- optional
- keeps /health/live Healthy while shutdown is in progress and the configured drain window has not expired
- once the drain window expires, liveness becomes Unhealthy until shutdown completes so hosts and orchestrators can see a stuck drain
- surfaces as activeWindow = "shutdown-drain" with the drain deadline in health payloads
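The drain window is the mirror image of the startup warmup, applied to liveness. The sketch below assumes shutdown is still in progress when it is called; the function name and the drainDeadline field name are hypothetical, and whether activeWindow is still reported after the deadline is not stated in the source.

```python
from datetime import datetime, timedelta, timezone


def liveness_during_shutdown(
    stop_started_at: datetime, grace: timedelta, now: datetime
) -> dict:
    """Model /health/live while a stop is in progress."""
    deadline = stop_started_at + grace
    if now < deadline:
        return {
            "status": "Healthy",
            "activeWindow": "shutdown-drain",
            "drainDeadline": deadline.isoformat(),  # hypothetical field name
        }
    # Drain window expired while shutdown is still running: a stuck drain.
    return {
        "status": "Unhealthy",
        "activeWindow": None,  # assumption: window no longer active
        "drainDeadline": deadline.isoformat(),
    }


stop_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
draining = liveness_during_shutdown(
    stop_at, timedelta(seconds=20), stop_at + timedelta(seconds=5)
)
stuck = liveness_during_shutdown(
    stop_at, timedelta(seconds=20), stop_at + timedelta(seconds=25)
)
```

An orchestrator polling liveness would leave the process alone while `draining` is reported and could act once it sees the `stuck` result.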
Restart expectations
- manual restart is controlled by AllowManualRestart
- MaxRestartAttempts limits explicit RestartAsync(...) calls
- ManualRestartBackoff delays explicit RestartAsync(...) calls after restartable start failures
- restart is intentionally conservative:
  - a failed start phase can be restarted
  - a failed initialize phase cannot be restarted safely and the runtime should be rebuilt
  - a failed stop phase must be resolved before retrying restart
- when backoff is configured, RestartAsync(...) throws until the backoff window expires
- restart backoff surfaces through /engine/status, /engine/runtime-story, /health/live, /health/ready, and /engine/diagnostics
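The restart rules above combine into a single gate. The following is a sketch of one plausible reading: restart_decision is a hypothetical helper, the order of the checks and the `>=` comparison against MaxRestartAttempts are assumptions, and it returns a reason instead of throwing because the source only confirms that RestartAsync(...) throws during backoff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def restart_decision(
    allow_manual_restart: bool,
    max_restart_attempts: int,
    manual_restart_backoff: Optional[timedelta],
    failed_phase: str,
    restart_count: int,
    failed_at: datetime,
    now: datetime,
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a manual restart request."""
    if not allow_manual_restart:
        return False, "manual restart is disabled"
    if failed_phase == "initialize":
        return False, "initialize failures require rebuilding the runtime"
    if failed_phase == "stop":
        return False, "resolve the stop failure before retrying restart"
    if restart_count >= max_restart_attempts:  # assumed comparison
        return False, "restart attempts exhausted"
    if manual_restart_backoff is not None:
        if now < failed_at + manual_restart_backoff:
            return False, "restart backoff active"
    return True, "restart allowed"


failed = datetime(2024, 1, 1, tzinfo=timezone.utc)
backoff = timedelta(seconds=30)
too_early = restart_decision(
    True, 3, backoff, "start", 0, failed, failed + timedelta(seconds=10)
)
later = restart_decision(
    True, 3, backoff, "start", 0, failed, failed + timedelta(seconds=31)
)
```

With the sample configuration values, a restart requested 10 seconds after a failed start is refused for backoff, and the same request 31 seconds after the failure is allowed.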
Runtime diagnostics
/engine/status now includes:
- current runtime status
- lifecycle timestamps
- shutdown drain timing when a stop is in progress
- restart count
- last failure context:
  - phase
  - module id/version when available
  - exception type and message
- whether restart is currently allowed
- when restart exits backoff, if a manual restart cooldown is active
/engine/failure-policy exposes the effective policy the runtime was built with.
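"Effective policy" here means the configured values combined with defaults. The sketch below shows one way such a merge could work, under stated assumptions: effective_policy and parse_hms are hypothetical helpers, only the two defaults named in this page (FailFast for startup, BestEffortContinue for stop) are filled in, and the real response shape of /engine/failure-policy is not described in the source. The duration parser handles only the hh:mm:ss form used in the sample configuration.

```python
import json
from datetime import timedelta
from typing import Any, Dict

# Only the defaults this page documents; other defaults are unknown.
DOCUMENTED_DEFAULTS = {
    "StartupFailureBehavior": "FailFast",
    "StopFailureBehavior": "BestEffortContinue",
}
DURATION_KEYS = (
    "StartupReadinessDelay",
    "ShutdownLivenessGracePeriod",
    "ManualRestartBackoff",
)


def parse_hms(value: str) -> timedelta:
    """Parse an hh:mm:ss duration string such as "00:00:10"."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def effective_policy(config_text: str) -> Dict[str, Any]:
    """Merge the Engine.FailurePolicy section over the documented defaults."""
    section = json.loads(config_text).get("Engine", {}).get("FailurePolicy", {})
    policy: Dict[str, Any] = {**DOCUMENTED_DEFAULTS, **section}
    for key in DURATION_KEYS:
        raw = policy.get(key)
        policy[key] = parse_hms(raw) if raw is not None else None
    return policy


sample = """
{ "Engine": { "FailurePolicy": {
    "StartupFailureBehavior": "CaptureOnly",
    "StartupReadinessDelay": "00:00:10"
} } }
"""
policy = effective_policy(sample)
```

In this example the startup behavior comes from the configuration, the stop behavior falls back to the documented default, and the unset optional windows stay empty.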