The Production-Ready Playbook


Appendix A

Altitude 5 — Resilience

Every network call is a coin-flip that sometimes lands wrong. These patterns turn a dependency hiccup into a non-event. The reference library is Polly, but the concepts are not tied to it.

| Pattern | What it solves | The shape | Skip-if |
| --- | --- | --- | --- |
| Retry (backoff + jitter) | Ride out a transient failure | Re-issue with exponential backoff and jitter; pair with Idempotency | The op isn't safe to repeat and can't be made idempotent |
| Circuit Breaker | Stop hammering a failing dependency | Trip open after a failure threshold, fail fast, probe before closing | A single in-process call |
| Rate Limiting / Throttling | Cap request rate, in and out | A limiter that sheds or queues traffic above a ceiling | Trusted, low-volume internal traffic |
| Bulkhead | Stop one slow dependency draining everything | Isolated resource pools per dependency | One dependency, low concurrency |
| Timeout + Fallback | Never wait forever; degrade instead of erroring | A bounded wait with a sensible degraded answer on expiry | No sensible fallback: then fail fast and loud |
| Idempotency | Make retries and at-least-once delivery safe | The same request applied twice has one effect, keyed by an idempotency token | Naturally idempotent reads |
| Steady State | Stop a 3am failure from something filling up | Bound every growing resource: rotate logs, purge old data, cap caches and pools | Nothing in the process grows unbounded (rare: check first) |
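
The Retry row can be sketched in a few lines. This is a minimal, library-neutral illustration in Python, not Polly's API: `TransientError`, `backoff_delays` and `retry` are hypothetical names, and the "full jitter" variant (a uniform delay between zero and the exponential ceiling) is one common choice among several. It assumes the operation is already safe to repeat.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(count: int, base: float = 0.1, cap: float = 5.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(count)]


def retry(op: Callable[[], T], max_attempts: int = 4, base: float = 0.1,
          cap: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op until it succeeds or max_attempts calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts - 1, base, cap)
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(delays[attempt])  # wait a jittered, growing interval
    raise AssertionError("unreachable")
```

Only `TransientError` is retried here; anything else propagates immediately, which keeps the loop from repeating failures that will never succeed. The `sleep` parameter exists so the delays can be observed or skipped in tests.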

Honorable mentions: Graceful Degradation, Load Shedding, Failover/Redundancy.
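
The Circuit Breaker shape from the table (trip open after a failure threshold, fail fast, probe before closing) might look like the sketch below. It is a single-threaded illustration with hypothetical names (`CircuitBreaker`, `CircuitOpenError`, `reset_after`), not a reproduction of any library's implementation; a production breaker would also need locking and usually a rolling failure window.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected untried."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before probing
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, op: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: half-open, so this call goes through as a probe.
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Wrapping every call to one dependency in the same `CircuitBreaker` instance is what makes the failure count meaningful; one breaker per dependency is the usual arrangement.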

Altitude 6 — Observability & Diagnostics

You can't operate what you can't see, and seeing is worthless without something that wakes someone. Vendor-neutral: OpenTelemetry plus Serilog.

| Pattern | What it solves | The shape | Skip-if |
| --- | --- | --- | --- |
| Health Endpoint Monitoring | Let an orchestrator restart or drain bad instances | Liveness and readiness endpoints it can probe | Nothing's orchestrating it |
| Structured Logging | Make "what happened at 2am" answerable | Machine-parseable logs with context (Serilog), not string soup | Never |
| Metrics (Four Golden Signals) | See degradation before customers do | Latency, traffic, errors, saturation: start with four, not fifty | Never |
| Distributed Tracing + Correlation IDs | Root-cause across a call graph in minutes | One request followed across services via a propagated trace context (OpenTelemetry) | A single process, no fan-out |
| Externalised Configuration | Run the same image in every environment | Config read from the environment, not baked into the image | Never |
| Alerting & SLOs | Page a human before the customer notices | Thresholds tied to objectives, not raw noise | No on-call or SLA yet (but wire it before you have users) |
| Audit Logging | Keep a defensible who-did-what trail | An immutable log of actions, references not PII, separate from diagnostics | No regulatory or security need, no sensitive actions |
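
To make "machine-parseable logs with context, not string soup" concrete, here is a small sketch using only Python's standard `logging` module. It is an analogue of the idea, not Serilog's API. The logger name `orders`, the `fields` convention for extra properties and the `correlation_id` context variable are all assumptions made for the example.

```python
import io
import json
import logging
import uuid
from contextvars import ContextVar

# Hypothetical per-request context; a real service would set it in middleware.
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Values passed as extra={"fields": {...}} become top-level keys.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")  # illustrative name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()  # stands in for stdout so the output can be inspected
log = build_logger(buffer)
correlation_id.set(str(uuid.uuid4()))
log.info("payment captured", extra={"fields": {"order_id": 42, "amount_pence": 1999}})
```

Each line written to the stream is a self-contained JSON object, so an aggregator can filter on `order_id` or join on `correlation_id` without parsing free text.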

Honorable mentions: Log Aggregation, Synthetic Monitoring, the RED/USE methods.
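
Externalised Configuration, as described in the table, comes down to reading settings from the environment at startup and failing loudly when a required one is absent. The sketch below assumes three illustrative variable names (`DATABASE_URL`, `REQUEST_TIMEOUT_SECONDS`, `LOG_LEVEL`); none of them come from the playbook.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    request_timeout_seconds: float
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment; raise at startup if one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # required, no default
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the mapping in as a parameter keeps the loader testable, and the frozen dataclass means the same image behaves identically everywhere except for the values injected around it.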

Altitude 7 — Hosting (cloud-agnostic, container-first)

The container is the portability boundary; state lives outside it. The same image runs on GCP, AWS or Azure. 12-factor is the backbone.

| Pattern | What it solves | The shape | Skip-if |
| --- | --- | --- | --- |
| Container as the Unit (12-factor) | Run the same artifact everywhere | One OCI image as the deployable, portable boundary | A managed-runtime function where a container adds nothing |
| Stateless + Externalised State | Scale out and restart freely | No state in process memory; instances are cattle, not pets | A genuinely single-instance tool |
| Sidecar / Ambassador | Add cross-cutting infra without app changes | A helper container (proxy, agent) co-deployed beside the app | The helper's job is a library call away |
| Scale-to-Zero (+ graceful shutdown) | Pay for use, lose no in-flight work | Idle instances drop to zero; SIGTERM drains work on reclaim | Latency-critical, always-warm workloads |
| Orchestrator-Agnostic Deploy | Avoid cloud lock-in | The same OCI image to Cloud Run, ECS, Container Apps or k8s | You've deliberately committed to one platform's primitives |
| Infrastructure as Code | Reproduce environments without a platform team | Version-controlled, reviewable infra definitions | A single hand-clicked environment you'll never rebuild (usually a false economy) |
| Blue-Green / Canary Deploy | Release with zero downtime and instant rollback | Traffic shifted to a parallel or partial slice, rolled back fast on trouble | A low-traffic internal tool where a few seconds of downtime is fine |
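
The "SIGTERM drains work on reclaim" half of the Scale-to-Zero row can be illustrated with a worker loop that finishes the job in hand and then stops taking new ones. This is a sketch under simple assumptions: jobs sit in an in-memory `deque` purely for illustration (in a stateless service they would live in an external queue), and `request_stop`, `install_handlers` and `run_worker` are hypothetical names.

```python
import signal
import threading
from collections import deque
from typing import Any, Callable, Deque

stop_requested = threading.Event()


def request_stop(signum, frame) -> None:
    """Signal handler: only flip a flag; the work loop does the draining."""
    stop_requested.set()


def install_handlers() -> None:
    """Call once at startup, from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def run_worker(jobs: Deque[Any], handle: Callable[[Any], None]) -> int:
    """Process jobs until asked to stop; return how many were completed."""
    done = 0
    while jobs and not stop_requested.is_set():
        handle(jobs.popleft())  # a job that has started always runs to completion
        done += 1
    return done
```

The handler does as little as possible, which avoids doing real work inside a signal context. Whatever is left unprocessed stays in the queue for another instance, so the platform's termination grace period only has to cover one job.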

Honorable mentions: Feature Flags, Gateway/Backend-for-Frontend, Secrets Management, Service Discovery.

One vocabulary, seven rungs. The skip-if column is the half of the card most teams need most.

Next: Appendix B — The Skip List, for the patterns deliberately left off the ladder and the reason each is a tax a small team rarely needs.
