Every network call is a coin flip that sometimes lands wrong. The resilience patterns below turn a dependency hiccup into a non-event. The reference library is Polly, but none of the concepts are tied to it.
| Pattern | What it solves | The shape | Skip-if |
|---|---|---|---|
| Retry (backoff + jitter) | Ride out a transient failure | Re-issue with exponential backoff and jitter; pair with Idempotency | The op isn't safe to repeat and can't be made idempotent |
| Circuit Breaker | Stop hammering a failing dependency | Trip open after a failure threshold, fail fast, probe before closing | A single in-process call |
| Rate Limiting / Throttling | Cap request rate, in and out | A limiter that sheds or queues traffic above a ceiling | Trusted, low-volume internal traffic |
| Bulkhead | Stop one slow dependency draining everything | Isolated resource pools per dependency | One dependency, low concurrency |
| Timeout + Fallback | Never wait forever; degrade instead of erroring | A bounded wait with a sensible degraded answer on expiry | No sensible fallback: then fail fast and loud |
| Idempotency | Make retries and at-least-once delivery safe | The same request applied twice has one effect, keyed by an idempotency token | Naturally idempotent reads |
| Steady State | Stop a 3am failure from something filling up | Bound every growing resource: rotate logs, purge old data, cap caches and pools | Nothing in the process grows unbounded (rare: check first) |
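The Retry row can be made concrete with a short, library-neutral sketch. It is written in Python rather than with Polly, and it is not Polly's API: `TransientError`, `backoff_delays` and `retry` are hypothetical names chosen for illustration, and the "full jitter" formula (a uniform delay between zero and the capped exponential ceiling) is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one delay per retry: uniform in [0, min(cap, base * 2**n))."""
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def retry(op, attempts=4, base=0.1, cap=5.0, sleep=time.sleep, rng=random.random):
    """Call op(); on TransientError wait a jittered, growing delay and try again.

    `attempts` counts retries after the first call. Once they are used up,
    the last TransientError is re-raised to the caller.
    """
    delays = backoff_delays(attempts, base, cap, rng)
    while True:
        try:
            return op()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the failure
            sleep(delay)
```

Only `TransientError` is retried here, which mirrors the skip-if column: anything that is not known to be safe to repeat propagates on the first failure. Passing `sleep` and `rng` as parameters is a testing convenience, not part of the pattern.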
Honorable mentions: Graceful Degradation, Load Shedding, Failover/Redundancy.
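The Circuit Breaker row in the table above (trip open after a failure threshold, fail fast, probe before closing) can be sketched in a few lines. This is a minimal single-threaded illustration in Python, not Polly's implementation: the class name, `failure_threshold`, `reset_after` and the consecutive-failure counting rule are all assumptions, and a production breaker would also need locking and probably a rolling failure window.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected untried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, op):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: half-open, so let this one call through as a probe.
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker, or re-trip it after a failed probe.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically hold one breaker per dependency and wrap each outbound call in `breaker.call(...)`, treating `CircuitOpenError` as the cue to serve a fallback.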
You can't operate what you can't see, and seeing is worthless without something that wakes someone. Vendor-neutral: OpenTelemetry plus Serilog.
| Pattern | What it solves | The shape | Skip-if |
|---|---|---|---|
| Health Endpoint Monitoring | Let an orchestrator restart or drain bad instances | Liveness and readiness endpoints it can probe | Nothing's orchestrating it |
| Structured Logging | Make "what happened at 2am" answerable | Machine-parseable logs with context (Serilog), not string soup | Never |
| Metrics (Four Golden Signals) | See degradation before customers do | Latency, traffic, errors, saturation: start with four, not fifty | Never |
| Distributed Tracing + Correlation IDs | Root-cause across a call graph in minutes | One request followed across services via a propagated trace context (OpenTelemetry) | A single process, no fan-out |
| Externalised Configuration | Run the same image in every environment | Config read from the environment, not baked into the image | Never |
| Alerting & SLOs | Page a human before the customer notices | Thresholds tied to objectives, not raw noise | No on-call or SLA yet (but wire it before you have users) |
| Audit Logging | Keep a defensible who-did-what trail | An immutable log of actions, references not PII, separate from diagnostics | No regulatory or security need, no sensitive actions |
Honorable mentions: Log Aggregation, Synthetic Monitoring, the RED/USE methods.
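To illustrate the Structured Logging and Correlation ID rows, here is a small sketch using only Python's standard `logging` module as a stand-in for Serilog. The field names (`correlation_id`, `order_id`), the `orders` logger name and the helper functions are hypothetical; the point is only that every line is a parseable object carrying the same request identifier.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line instead of free text."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "order_id": getattr(record, "order_id", None),
        }
        return json.dumps(payload)


def build_logger(stream, name="orders"):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, order_id, correlation_id=None):
    """Reuse the caller's correlation ID if one arrived, otherwise mint one."""
    correlation_id = correlation_id or str(uuid.uuid4())
    ctx = {"correlation_id": correlation_id, "order_id": order_id}
    logger.info("order received", extra=ctx)
    logger.info("order stored", extra=ctx)
    return correlation_id
```

With logs in this shape, "what happened at 2am" becomes a filter on `correlation_id` rather than a text search. In a multi-service system the ID would normally come from the propagated trace context instead of being generated locally.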
The container is the portability boundary; state lives outside it. The same image runs on GCP, AWS or Azure. 12-factor is the backbone.
| Pattern | What it solves | The shape | Skip-if |
|---|---|---|---|
| Container as the Unit (12-factor) | Run the same artifact everywhere | One OCI image as the deployable, portable boundary | A managed-runtime function where a container adds nothing |
| Stateless + Externalised State | Scale out and restart freely | No state in process memory; instances are cattle, not pets | A genuinely single-instance tool |
| Sidecar / Ambassador | Add cross-cutting infra without app changes | A helper container (proxy, agent) co-deployed beside the app | The helper's job is a library call away |
| Scale-to-Zero (+ graceful shutdown) | Pay for use, lose no in-flight work | Idle instances drop to zero; on SIGTERM an instance drains in-flight work before it is reclaimed | Latency-critical, always-warm workloads |
| Orchestrator-Agnostic Deploy | Avoid cloud lock-in | The same OCI image to Cloud Run, ECS, Container Apps or k8s | You've deliberately committed to one platform's primitives |
| Infrastructure as Code | Reproduce environments without a platform team | Version-controlled, reviewable infra definitions | A single hand-clicked environment you'll never rebuild (usually a false economy) |
| Blue-Green / Canary Deploy | Release with zero downtime and instant rollback | Traffic shifted to a parallel or partial slice, rolled back fast on trouble | A low-traffic internal tool where a few seconds of downtime is fine |
Honorable mentions: Feature Flags, Gateway/Backend-for-Frontend, Secrets Management, Service Discovery.
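The graceful-shutdown half of the Scale-to-Zero row is the part most often skipped, so a sketch may help. It assumes a simple in-process job queue and a worker thread; `request_shutdown`, `install_handlers` and `worker` are hypothetical names, and the doubling step stands in for real work. How long a platform waits between SIGTERM and a hard kill varies, so the drain must fit inside that window.

```python
import queue
import signal
import threading

shutting_down = threading.Event()


def request_shutdown(signum=None, frame=None):
    """Signal handler: flag shutdown so new work stops and queued work finishes."""
    shutting_down.set()


def install_handlers():
    """Register the handler; call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, request_shutdown)


def worker(jobs, done):
    """Process jobs until shutdown is requested AND the queue is empty."""
    while True:
        try:
            job = jobs.get(timeout=0.05)
        except queue.Empty:
            if shutting_down.is_set():
                return  # nothing left in flight: safe to exit
            continue
        done.append(job * 2)  # stand-in for real work
        jobs.task_done()
```

The worker only checks the shutdown flag when the queue is empty, which is what guarantees already-accepted work is finished. The part not shown is the intake side, which should start rejecting new requests as soon as the flag is set.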
One vocabulary, seven rungs. The skip-if column is the half of the card most teams need most.
Next: Appendix B — The Skip List, for the patterns deliberately left off the ladder and the reason each is a tax a small team rarely needs.