
Building a Dead Man's Switch API: Architecture Decisions

Abel Kuruvilla
13 min read

Building a dead man's switch requires solving a set of engineering problems that are unlike typical web application development. The system must be reliable over long time periods, handle sensitive cryptographic operations correctly, trigger actions based on the absence of activity rather than the presence of it, and maintain strict security guarantees around the most sensitive data its users will ever store.

This article explores the architecture decisions behind Burning Ash Protocol's Go API, explaining not just what was built but why each choice was made. If you are building a similar system or evaluating BAP's technical approach, this gives you the engineering context behind the codebase.

Why Go

The language choice for a dead man's switch API is consequential. The system runs continuously, handles background scheduling, performs cryptographic operations, and must be reliable for months or years without intervention.

Go was chosen for several reasons rooted in the nature of the problem:

Static typing and compilation catch errors before deployment. A dead man's switch that crashes due to a type error or nil pointer is a dead man's switch that fails to trigger. Go's compile-time checks eliminate entire categories of runtime errors.

Goroutines for concurrent scheduling. The scheduler that monitors liveness checks, processes transfers, and refreshes OAuth tokens runs alongside the HTTP server in the same process. Go's goroutines and channels make this concurrent architecture straightforward without the complexity of multi-threading.

Single binary deployment. Go compiles to a single static binary with no runtime dependencies. This simplifies Docker images, reduces the attack surface (no interpreter or runtime to exploit), and makes the deployment artifact trivially portable.

Standard library cryptography. Go's crypto package provides well-audited implementations of AES-GCM, Ed25519, and the randomness primitives needed for Shamir's Secret Sharing. Using standard library cryptography avoids the risk of third-party crypto library vulnerabilities.

Predictable performance. A dead man's switch does not need to be fast --- it needs to be predictably reliable. Go's garbage collector has low-latency characteristics, and the language's lack of runtime surprises (no just-in-time compilation, no dynamic dispatch overhead) makes performance predictable.

HTTP Layer: Chi Router

BAP uses the Chi router for HTTP routing. Chi was chosen over alternatives like Gin, Echo, or the standard library net/http mux for specific architectural reasons.

Middleware composability. Chi's middleware chaining model allows stacking security middleware (CORS, rate limiting, JWT authentication, security headers) in a declarative order. The middleware chain is defined in the router package and applies uniformly to all routes, reducing the risk of a handler accidentally bypassing security checks.

Standard net/http compatibility. Chi handlers are standard http.HandlerFunc types, not framework-specific handler signatures. This means every handler is testable with Go's standard httptest package without framework mocking. It also means any standard library middleware works with Chi without adaptation.

Route grouping for authorization boundaries. BAP's routes are grouped by authorization level: public routes (health check, registration, login), authenticated routes (will management, Survivor configuration), and admin routes (platform management, user administration). Chi's route groups map directly to these authorization boundaries, with the JWT middleware applied at the group level.

No magic. Chi does not use struct tags, code generation, or reflection for routing. Routes are explicitly registered functions, which makes the routing table auditable and debuggable.

Data Layer: GORM with Encryption at the Repository Level

BAP uses GORM as its ORM, with a critical architectural decision: encryption happens at the repository layer, not in the handlers or a separate service.

Why Encrypt at the Repository Level

When a handler calls ConnectorRepo.Create(connector), the repository encrypts sensitive fields (API keys, OAuth tokens, SMTP passwords) before writing to the database. When a handler calls ConnectorRepo.FindByID(id), the repository decrypts those fields before returning the model.

This design has two key benefits:

Handlers never see encrypted data. Business logic operates on plaintext values without awareness of the encryption layer. This separation prevents bugs where a handler accidentally stores plaintext or displays encrypted blobs.

Encryption cannot be bypassed. Because encryption is embedded in the data access layer, there is no code path that writes sensitive data without encrypting it or reads sensitive data without decrypting it. A new handler written by a contributor automatically gets encryption --- it is not opt-in.

The repository accepts a masterKey parameter at construction time. This key is provided once at application startup and used for all encryption/decryption operations within that repository instance.
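A minimal sketch of the pattern, assuming an in-memory "table" in place of GORM and an illustrative `Connector` model (the names are not BAP's actual schema). The point is the shape: the repository owns the master key, and callers only ever see plaintext.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// Connector mirrors the idea of a model with a sensitive field.
type Connector struct {
	ID     string
	APIKey string
}

// ConnectorRepo holds the master key it received at construction
// time and an in-memory "table" standing in for the database.
type ConnectorRepo struct {
	masterKey []byte
	rows      map[string][]byte // ID -> encrypted APIKey
}

func NewConnectorRepo(masterKey []byte) *ConnectorRepo {
	return &ConnectorRepo{masterKey: masterKey, rows: map[string][]byte{}}
}

func (r *ConnectorRepo) aead() (cipher.AEAD, error) {
	block, err := aes.NewCipher(r.masterKey)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Create encrypts the sensitive field before it ever reaches storage.
func (r *ConnectorRepo) Create(c Connector) error {
	gcm, err := r.aead()
	if err != nil {
		return err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	r.rows[c.ID] = gcm.Seal(nonce, nonce, []byte(c.APIKey), nil)
	return nil
}

// FindByID decrypts before returning, so callers only see plaintext.
func (r *ConnectorRepo) FindByID(id string) (Connector, error) {
	gcm, err := r.aead()
	if err != nil {
		return Connector{}, err
	}
	blob := r.rows[id]
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return Connector{}, err
	}
	return Connector{ID: id, APIKey: string(pt)}, nil
}

func main() {
	key := make([]byte, 32) // AES-256 master key (all-zero for demo only)
	repo := NewConnectorRepo(key)
	repo.Create(Connector{ID: "c1", APIKey: "sk-secret"})
	got, _ := repo.FindByID("c1")
	fmt.Println(got.APIKey) // prints sk-secret
}
```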

Database Abstraction

BAP supports SQLite for development and single-user deployments, and PostgreSQL for production multi-user deployments. GORM's dialect abstraction handles the differences, but migrations are maintained separately for each database type in api/migrations/sqlite/ and api/migrations/postgres/.

Separate migration files exist because SQLite and PostgreSQL have different SQL dialects for schema changes (SQLite's ALTER TABLE support is notably limited). Maintaining separate migrations ensures each database gets correct, tested schema changes rather than lowest-common-denominator SQL.

Migrations run automatically at API startup using golang-migrate. The API will not accept HTTP requests until all pending migrations have been applied successfully. This prevents the application from serving requests against an outdated schema.

The Handler Struct Pattern

BAP uses a single Handler struct that holds all dependencies: repositories, configuration, connectors, and the entitlements service. All HTTP handlers are methods on this struct.

type Handler struct {
    Config          *config.Config
    HostRepo        *repository.HostRepo
    WillRepo        *repository.WillRepo
    SurvivorRepo    *repository.SurvivorRepo
    ConnectorRepo   *repository.ConnectorRepo
    StorageRepo     *repository.StorageRepo
    Entitlements    *entitlements.Service
    // ... additional fields
}

This pattern is sometimes called a "god object," and it is a deliberate tradeoff.

In favor: All dependencies are explicitly declared in one place. Wiring happens in main.go and is visible at a glance. There are no global variables, no service locators, and no dependency injection frameworks. Testing a handler requires constructing a Handler with mock repositories --- no framework magic.

Against: The struct grows as the application grows. A handler that only needs WillRepo still receives the entire Handler with all its dependencies.

For BAP's size and complexity, the explicit wiring in a single struct was judged to be simpler and more debuggable than a dependency injection framework. The struct is defined in handler/stubs.go, serving as the canonical list of all application dependencies.

Scheduler Design

The scheduler is the heart of the dead man's switch. It runs background jobs that monitor liveness check deadlines, trigger will transfers, refresh OAuth tokens, and sync storage permissions.

Polling Architecture

BAP's scheduler uses a polling-based design with a 30-second interval. Every 30 seconds, the scheduler checks:

  1. Liveness check deadlines: Are any hosts past their response window (HCRT) without checking in?
  2. Missed check-in escalation: Has any host exceeded the consecutive miss threshold (HCRAC)?
  3. Transfer deadlines: Are any active will transfers past their deadline?
  4. OAuth token refresh: Are any OAuth tokens approaching expiration?

The polling approach was chosen over event-driven alternatives (message queues, database triggers, cron-based scheduling) for several reasons.

Simplicity. The scheduler is a goroutine with a time.Ticker and a database query. There are no external dependencies (no Redis, no RabbitMQ, no separate cron daemon). The fewer moving parts in a system that must be reliable for years, the better.

Crash recovery. If the application restarts, the scheduler resumes polling from the current state. There are no in-flight events to recover, no queue messages to replay, and no at-least-once delivery semantics to reason about. The database is the single source of truth, and the scheduler reads it fresh every cycle.

Idempotency. The polling queries check current state, not events. If a liveness check is overdue, the scheduler detects it regardless of whether it was overdue last cycle too. The handlers are idempotent --- processing an already-processed state is a no-op. This eliminates an entire class of duplicate-processing bugs.

The 30-second interval is a tradeoff. It introduces up to 30 seconds of latency between a deadline passing and the scheduler detecting it. For a dead man's switch where response windows are measured in hours or days, 30 seconds is negligible. Reducing the interval to 1 second would increase database load without meaningful benefit.

Liveness Check State Machine

A host's liveness state follows a simple state machine:

  1. Active: The host is within their check-in interval. No action needed.
  2. Pending: A liveness check has been sent and the response window (HCRT) is open. Waiting for the host to respond.
  3. Missed: The response window has closed without a response. The consecutive miss counter increments.
  4. Triggered: The consecutive miss counter has reached HCRAC. The Will Transfer Protocol begins.

The scheduler evaluates all hosts against this state machine every polling cycle. Transitions are recorded in the database with timestamps for auditability.
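A transition function for this state machine might look like the following sketch. The inputs (`responded`, `windowClosed`) would be derived from timestamps in the real system; the types and field names here are illustrative.

```go
package main

import "fmt"

type LivenessState int

const (
	Active LivenessState = iota
	Pending
	Missed
	Triggered
)

type Host struct {
	State           LivenessState
	ConsecutiveMiss int
}

// step evaluates one host against the state machine described above.
// hcrac is the consecutive-miss threshold that triggers the transfer.
func step(h Host, responded, windowClosed bool, hcrac int) Host {
	switch h.State {
	case Pending:
		if responded {
			h.State, h.ConsecutiveMiss = Active, 0 // check-in resets the counter
		} else if windowClosed {
			h.ConsecutiveMiss++
			if h.ConsecutiveMiss >= hcrac {
				h.State = Triggered // Will Transfer Protocol begins
			} else {
				h.State = Missed
			}
		}
	case Missed:
		// Sending the next liveness check re-opens the response window.
		h.State = Pending
	}
	return h
}

func main() {
	h := Host{State: Pending}
	for i := 0; i < 3; i++ {
		h = step(h, false, true, 3) // response window closes unanswered
		if h.State == Missed {
			h = step(h, false, false, 3) // new check sent
		}
	}
	fmt.Println(h.State == Triggered, h.ConsecutiveMiss) // prints true 3
}
```

Making transitions a pure function of the host's stored state is what keeps the polling cycle idempotent: evaluating the same state twice produces the same result.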

Running In-Process

The scheduler shares the same database connection and repository instances as the HTTP server. It runs as a goroutine started in main.go after the database and repositories are initialized but before the HTTP server starts accepting requests.

This in-process design means:

  • No separate deployment artifact for the scheduler
  • No inter-process communication protocol
  • No message serialization/deserialization
  • Shared connection pool to the database
  • Consistent encryption key access (same master key)

The tradeoff is that the scheduler and HTTP server share a failure domain. If the process crashes, both stop. For a single-instance deployment (which is BAP's self-hosted model), this is acceptable --- if the process is down, both the API and the scheduler need to recover anyway.

Encryption Layer Architecture

BAP's crypto package provides three cryptographic primitives: AES-256-GCM encryption, Shamir's Secret Sharing, and Ed25519 signing.

AES-256-GCM Implementation

The encryption functions use Go's crypto/aes and crypto/cipher packages. Each encryption operation:

  1. Generates a 96-bit random nonce using crypto/rand
  2. Creates an AES cipher block from the 256-bit key
  3. Wraps the block cipher in GCM mode
  4. Encrypts the plaintext with the nonce, producing ciphertext and an authentication tag
  5. Returns nonce || ciphertext || tag as a single byte slice

Decryption reverses the process: extract the nonce from the first 12 bytes, decrypt and authenticate the remainder.

The nonce is randomly generated rather than counter-based. Random nonces are safe for up to approximately 2^32 operations with the same key (the birthday bound for 96-bit nonces). Since each will has its own DEK and a single will is unlikely to be re-encrypted billions of times, random nonces are well within safe limits and simpler to implement correctly than counter management.
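The five steps above map almost directly onto Go's standard library, since `gcm.Seal` appends the 16-byte authentication tag to the ciphertext itself. A minimal sketch:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// encrypt: random 96-bit nonce, AES-256 block, GCM mode, then
// nonce || ciphertext || tag returned as a single byte slice.
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block) // 12-byte (96-bit) nonce by default
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the 12-byte nonce, then decrypts and
// authenticates the remainder in one call.
func decrypt(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, rest := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, rest, nil) // fails if the tag does not verify
}

func main() {
	key := make([]byte, 32)
	rand.Read(key)
	blob, _ := encrypt(key, []byte("will content"))
	pt, _ := decrypt(key, blob)
	fmt.Println(string(pt)) // prints will content
}
```

Because GCM is authenticated, any tampering with the ciphertext or tag causes `Open` to return an error rather than garbage plaintext.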

Shamir's Secret Sharing

The Shamir implementation operates over a finite field (GF(256) for byte-level splitting). Given a secret (the DEK), a total share count N, and a threshold K:

  1. Construct a random polynomial of degree K-1 with the secret as the constant term
  2. Evaluate the polynomial at N distinct points
  3. Each (point, value) pair is a share

Reconstruction uses Lagrange interpolation over any K shares to recover the polynomial's constant term (the secret).

The implementation is constant-time for the field arithmetic operations to prevent timing side-channel attacks on the share reconstruction process.
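The split-and-reconstruct flow can be sketched as below. This is a clarity-first illustration over GF(256) with the AES reduction polynomial, not BAP's implementation: in particular, the field arithmetic here is branchy rather than constant-time, and the share format is made up for the example.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// gfMul multiplies in GF(256) over x^8 + x^4 + x^3 + x + 1 (0x11b).
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 == 1 {
			p ^= a
		}
		hi := a & 0x80
		a <<= 1
		if hi != 0 {
			a ^= 0x1b
		}
		b >>= 1
	}
	return p
}

// gfInv computes a^-1 as a^254 (Fermat's little theorem in GF(256)).
func gfInv(a byte) byte {
	r := byte(1)
	for i := 0; i < 254; i++ {
		r = gfMul(r, a)
	}
	return r
}

type Share struct {
	X byte   // evaluation point (1..N, never 0)
	Y []byte // polynomial value for each secret byte
}

// Split builds, per secret byte, a random degree K-1 polynomial with
// that byte as the constant term, evaluated at N distinct points.
func Split(secret []byte, n, k int) ([]Share, error) {
	shares := make([]Share, n)
	for i := range shares {
		shares[i] = Share{X: byte(i + 1), Y: make([]byte, len(secret))}
	}
	for pos, s := range secret {
		coeffs := make([]byte, k)
		coeffs[0] = s
		if _, err := rand.Read(coeffs[1:]); err != nil {
			return nil, err
		}
		for i := range shares {
			var y byte // Horner evaluation at x = shares[i].X
			for j := k - 1; j >= 0; j-- {
				y = gfMul(y, shares[i].X) ^ coeffs[j]
			}
			shares[i].Y[pos] = y
		}
	}
	return shares, nil
}

// Combine recovers the constant term via Lagrange interpolation at
// x=0; in characteristic 2, addition and subtraction are both XOR.
func Combine(shares []Share) []byte {
	secret := make([]byte, len(shares[0].Y))
	for pos := range secret {
		var acc byte
		for i, si := range shares {
			num, den := byte(1), byte(1)
			for j, sj := range shares {
				if i != j {
					num = gfMul(num, sj.X)
					den = gfMul(den, sj.X^si.X)
				}
			}
			acc ^= gfMul(si.Y[pos], gfMul(num, gfInv(den)))
		}
		secret[pos] = acc
	}
	return secret
}

func main() {
	dek := []byte("32-byte data encryption key demo")
	shares, _ := Split(dek, 5, 3)
	got := Combine(shares[1:4]) // any 3 of the 5 shares suffice
	fmt.Println(string(got) == string(dek)) // prints true
}
```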

Key Hierarchy

The encryption layer implements a two-level key hierarchy:

  • Master Key (environment variable) encrypts DEKs
  • DEKs (per-will, stored encrypted in the database) encrypt will content

This hierarchy enables key rotation at the master level without re-encrypting all will content. Rotating the master key requires decrypting all DEKs with the old master key and re-encrypting them with the new one --- a database migration that does not touch the will content itself.

Deploy Mode Branching

BAP supports two deployment modes: selfhosted (single-tenant, no billing) and saas (multi-tenant, Stripe billing, platform connectors). The deploymode package exposes IsSaaS() and IsSelfHosted() functions that are checked throughout the codebase.

This is a global state pattern, which is generally avoided in Go applications. The decision to use it was pragmatic:

The deploy mode is set once at startup and never changes. It is effectively a compile-time constant that happens to be set at runtime via environment variable. There is no concurrency concern because the value is written once before any goroutines read it.

The alternative is dependency injection of mode-aware services throughout the application. This would require every handler and service to accept a mode parameter or a mode-aware interface, adding complexity proportional to the number of mode-dependent features. The global function is simpler and equally testable (tests set the mode before running).

The entitlements service centralizes mode-dependent business logic. Rather than scattering if IsSaaS() checks throughout handlers, the entitlements.Service encapsulates plan and capability checks. Handlers call entitlements.CanUsePlatformConnectors(host) rather than directly checking the deploy mode. This limits the blast radius of mode-related changes.
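The shape of that arrangement, sketched with invented plan names and a simplified mode variable (BAP's actual entitlement rules and `deploymode` package will differ):

```go
package main

import "fmt"

// mode mimics the deploymode global: written once at startup,
// read-only afterwards, so there is no concurrency concern.
var mode = "selfhosted"

func IsSaaS() bool { return mode == "saas" }

type Host struct{ Plan string }

// Service centralizes mode- and plan-dependent checks so handlers
// never consult the deploy mode directly.
type Service struct{}

func (s *Service) CanUsePlatformConnectors(h Host) bool {
	if !IsSaaS() {
		return true // self-hosted: every capability is enabled
	}
	return h.Plan == "pro" // SaaS: gated by the host's plan (illustrative)
}

func main() {
	svc := &Service{}
	fmt.Println(svc.CanUsePlatformConnectors(Host{Plan: "free"})) // selfhosted: true

	mode = "saas" // tests set the mode before running
	fmt.Println(svc.CanUsePlatformConnectors(Host{Plan: "free"})) // false
	fmt.Println(svc.CanUsePlatformConnectors(Host{Plan: "pro"}))  // true
}
```

If the gating rules change, only the entitlements service changes; the handlers calling it are untouched.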

Configuration Management

BAP uses kelseyhightower/envconfig for configuration, loading all settings from environment variables into a singleton config struct at startup. The config is validated before the application starts serving requests.

Environment variables were chosen over configuration files for deployment simplicity. Docker Compose, Kubernetes, and systemd all have native support for environment variable injection. There is no configuration file to mount, template, or keep in sync with the deployment environment.

The config struct is loaded once and passed explicitly to the components that need it. There is no global config variable --- the Handler struct receives the config at construction time, and the scheduler receives it when started.

Error Handling Philosophy

BAP's error handling follows Go conventions with a specific bias: errors at the HTTP boundary are translated to user-safe messages, while errors at the internal boundary are logged with full context.

Handlers return structured JSON error responses with an error code and a human-readable message. The message never includes internal details (stack traces, database errors, file paths). Internal errors are logged using structured logging (zap) with request context (request ID, user ID, endpoint) for debugging.

This separation prevents information leakage through error messages --- a security concern for a system storing sensitive estate data.

What Would Be Done Differently

No architecture survives contact with production unchanged. A few areas where the current design has known limitations:

The Handler struct will eventually need splitting. As the application grows, grouping all handlers under one struct becomes unwieldy. A natural split point would be by domain: will handlers, Survivor handlers, connector handlers, admin handlers, each with their own dependency struct.

The scheduler could benefit from distributed locking. The current design assumes a single instance. If BAP ever supports horizontal scaling (multiple API instances), the scheduler would need a distributed lock (using the database or Redis) to prevent duplicate processing.

Migration management across two databases is maintenance overhead. Every schema change requires two migration files. A migration testing framework that validates both SQLite and PostgreSQL migrations against the same test suite would reduce the risk of divergence.

These are known tradeoffs, not oversights. The current architecture optimizes for simplicity and single-instance reliability, which aligns with BAP's primary deployment model: a single self-hosted instance managing one person's digital estate.

Conclusion

Building a dead man's switch API requires engineering decisions that prioritize long-term reliability over development velocity. Go's static typing, standard library cryptography, and goroutine model provide a foundation that minimizes runtime surprises. The Chi router and GORM ORM offer just enough abstraction without obscuring the underlying behavior. The polling-based scheduler trades sub-second responsiveness for crash recovery simplicity. And the encryption layer, embedded at the repository level, ensures that sensitive data is never accidentally stored in plaintext.

Every architecture decision was made in service of the system's core requirement: when the switch triggers, it must work. There are no second chances.
