Placing an order is not one database write. It is a distributed transaction across the payment provider, the restaurant's payout account, and the courier's. No single commit spans them, so coordination falls to a Saga (Garcia-Molina & Salem; Richardson). The Saga charges the customer, pays out the restaurant, pays out the courier, and on any failure runs the compensations in reverse: refund the customer, claw back a payout. A process manager tracks which step the order is on so a crash resumes rather than restarts.

    public async Task Run(Guid orderId, CancellationToken ct)
    {
        var charge = await _payments.Charge(orderId, ct);     // step 1
        var restaurantPaid = false;
        try
        {
            await _payouts.PayRestaurant(orderId, ct);        // step 2
            restaurantPaid = true;
            await _payouts.PayCourier(orderId, ct);           // step 3
        }
        catch
        {
            // Compensate in reverse order. CancellationToken.None, so a cancelled
            // request cannot also cancel its own clean-up.
            if (restaurantPaid) // compensate step 2 (ClawBackRestaurant is an assumed method name)
                await _payouts.ClawBackRestaurant(orderId, CancellationToken.None);
            await _payments.Refund(charge.Id, CancellationToken.None); // compensate step 1
            throw;
        }
    }

The charge call is the riskiest hop in the whole service, so it runs through a Polly resilience pipeline: a retry with exponential backoff and jitter (Brooker, AWS) around a circuit breaker (Nygard, Release It!) and a per-attempt timeout. The retry rides out a payment-gateway hiccup; the breaker stops hammering a provider that is genuinely down; the timeout keeps a slow provider from holding an order hostage.

    // Strategies added first are outermost: retry wraps the breaker, which wraps the timeout.
    _pipeline = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        })
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            BreakDuration = TimeSpan.FromSeconds(30)
        })
        .AddTimeout(TimeSpan.FromSeconds(10)) // applies to each attempt
        .Build();

Retries are only safe because the charge is idempotent, keyed on the order id. A retry that lands after the provider already took the money returns the original charge instead of taking it twice. Without that key, every backoff is a chance to double-charge a customer for one dinner.

    public async Task<Charge> Charge(Guid orderId, CancellationToken ct) =>
        await _pipeline.ExecuteAsync(
            // The pipeline callback returns a ValueTask, hence the async lambda.
            async t => await _provider.Charge(orderId, idempotencyKey: orderId.ToString(), t),
            ct);

A retry without an idempotency key is a coin-flip on whether the customer pays once or twice. Key the charge, or don't retry it.
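To make "returns the original charge" concrete, here is a minimal sketch of what an idempotency key buys on the provider side. `InMemoryProvider` and `ChargeRecord` are hypothetical names invented for this illustration, not the real payment provider's API, and a real provider would store the key durably rather than in memory.

```csharp
using System;
using System.Collections.Generic;

public sealed record ChargeRecord(Guid Id, Guid OrderId, decimal Amount);

// Hypothetical in-memory stand-in for a provider that honours idempotency keys.
public sealed class InMemoryProvider
{
    private readonly object _gate = new();
    private readonly Dictionary<string, ChargeRecord> _byKey = new();

    // How many times money was actually taken.
    public int MoneyMovements { get; private set; }

    public ChargeRecord Charge(Guid orderId, decimal amount, string idempotencyKey)
    {
        lock (_gate)
        {
            // A replay with a known key returns the original charge untouched.
            if (_byKey.TryGetValue(idempotencyKey, out var existing))
                return existing;

            MoneyMovements++;
            var charge = new ChargeRecord(Guid.NewGuid(), orderId, amount);
            _byKey[idempotencyKey] = charge;
            return charge;
        }
    }
}
```

Calling `Charge` twice with the same key yields the same `ChargeRecord.Id` and leaves `MoneyMovements` at 1, which is the property the retry strategy depends on.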
The customer does not wait on any of this synchronously. The app polls GET /orders/{id} and reads status from a projection: a live read model the workers keep current as they advance the order's state. The place-order path writes; the read path reads its own shape, the one the tracking screen wants. That separation is CQRS (Greg Young; Fowler's bliki), and the order-history and restaurant live-board views are further Materialized Views (Fowler; Azure Cloud Design Patterns) over the same event stream.

    public sealed class GetOrderStatusHandler(IOrderReadModel reads)
        : IRequestHandler<GetOrderStatus, OrderStatusDto?>
    {
        public Task<OrderStatusDto?> Handle(GetOrderStatus q, CancellationToken ct) =>
            reads.StatusById(q.OrderId, ct); // a thin projection query, tenant-scoped by RLS
    }

The same Row-Level Security policy that protected the write protects this read. A customer reads only their own order; a restaurant's live-board reads only its own tenant's orders, and no developer had to remember to add WHERE tenant_id = @id. The isolation is structural.
None of the above is operable unless you can see it. The observability altitude threads through every component at once. A health endpoint (Health Endpoint Monitoring, Azure) tells the orchestrator whether to send traffic and whether to restart. Liveness says the process is alive; readiness says it can reach its database and broker.

    builder.Services.AddHealthChecks()
        // Tag both dependency checks so the readiness predicate below selects them.
        .AddNpgSql(cfg.GetConnectionString("Orders")!, name: "db", tags: new[] { "ready" })
        .AddCheck<BrokerHealthCheck>("broker", tags: new[] { "ready" });
    // Liveness runs no checks: a response at all means the process is up.
    app.MapHealthChecks("/health/live", new() { Predicate = _ => false });
    app.MapHealthChecks("/health/ready", new() { Predicate = c => c.Tags.Contains("ready") });

Distributed tracing (OpenTelemetry) follows a single order from the HTTP place-order request, through the queue, into the kitchen and courier workers, and through the payment Saga, by carrying a correlation id across the broker boundary. Without it, a charge that failed in the Saga cannot be tied back to the order that triggered it. With it, one trace tells the whole story: placed, confirmed, charged, assigned, delivered. The metrics are the four golden signals (Google SRE): latency, traffic, errors, saturation. Queue depth is the saturation signal that matters most here, because a queue that only grows during a rush is a worker fleet that cannot keep up.

    builder.Services.AddOpenTelemetry()
        .WithTracing(t => t.AddAspNetCoreInstrumentation().AddNpgsql().AddOtlpExporter())
        .WithMetrics(m => m.AddMeter("orders.worker").AddOtlpExporter());

    // in the worker: the saturation signal that actually predicts a backed-up kitchen
    _queueDepth = meter.CreateObservableGauge("orders.queue.depth", () => _broker.ApproximateDepth());

Structured logs (Serilog) carry the same correlation id and the tenant_id on every line, keyed on order_id, so a support question about one customer's missing dinner is a filter, not an archaeology dig. The trail records references and decisions, not payloads, keeping it useful without turning the log into a copy of the orders table.
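As a rough picture of what "references, not payloads" means for a single log line, the sketch below builds one with plain System.Text.Json. It is a stand-in for illustration, not Serilog itself; `OrderLog` and the exact field names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class OrderLog
{
    // Builds one JSON log line that carries ids and a decision, never the order body.
    // Field names are illustrative assumptions.
    public static string Line(string message, Guid orderId, string tenantId, string correlationId)
    {
        var fields = new Dictionary<string, string>
        {
            ["message"] = message,
            ["order_id"] = orderId.ToString(),
            ["tenant_id"] = tenantId,
            ["correlation_id"] = correlationId
        };
        return JsonSerializer.Serialize(fields);
    }
}
```

With Serilog, the same fields would more likely be attached once per unit of work, for example through `LogContext.PushProperty` together with `Enrich.FromLogContext()`, so that every line written while handling the order inherits them.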
The whole thing ships as a container (the 12-Factor App). State lives outside the image, in the database and the broker, so any instance can serve any order and the orchestrator can kill and restart instances freely. Because the workers hold nothing local, the platform can scale them to zero between meal rushes and back up when the queue fills (Cloud Run, AWS App Runner, Azure Container Apps). The analytics worker, idle most of the afternoon, scales all the way down.

    # Runtime stage only; the "build" stage that publishes to /publish is not shown here.
    FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS base
    WORKDIR /app
    COPY --from=build /publish .
    ENV ASPNETCORE_URLS=http://+:8080
    EXPOSE 8080
    ENTRYPOINT ["dotnet", "OrderService.dll"]

Scale-to-zero only works if the container shuts down cleanly. On SIGTERM the worker stops pulling new messages, lets the in-flight order finish its current step, and exits, so the platform never kills work mid-charge. The host's graceful-shutdown hook does exactly that, and the IHostedService honouring the cancellation token is what makes it safe.

    public async Task StopAsync(CancellationToken ct)
    {
        _accepting = false;                    // stop pulling new orders
        await _inFlight.WhenAllOrTimeout(ct);  // let the current step finish, then exit
    }

The same image runs on any of the three clouds because nothing in it is tied to one. The broker, the database, and the secrets come from configuration, not from compile-time choices. That is the cloud-agnostic stance made concrete. You build one artifact and point it at whichever target you are deploying to, with no rewrite in between.
Step back and the whole stack is visible in one service. Dependency Injection (Fowler's Inversion of Control article; the "D" of Robert C. Martin's SOLID) and the Mediator wire the inside, while the OrderGateway and RLS hold the data. The Outbox, Pub/Sub, and Competing Consumers move the work, the payment Saga coordinates the money, and Polly keeps the charge safe when the provider wobbles. Health checks, traces, and golden-signal metrics make all of it observable, and a stateless container that scales to zero hosts the lot. A Strategy decides which courier gets the order. A State machine runs that order's life from placed to delivered, against the menu a Composite priced in the first place.
A whole vocabulary of patterns, and one slice of a food-delivery app uses maybe fifteen of them well. That ratio is the book.
Notice what is still missing. Full Event Sourcing never appears, because a projection over explicit state transitions is enough for live tracking; the event log stays a convenience here, not the system of record. Sharding is absent too, since one database holds these orders comfortably until the marketplace spans cities. You will also find no second tenant-isolation tier, no service mesh, and no Claim-Check for the small receipts these orders carry. The Saga earned its place because there genuinely is a distributed transaction, money moving across three parties. The heavier patterns did not, because the order does not yet need them. Every one of those absences was a deliberate skip, and the service is more maintainable for it.
The temptation now is to use all of it. Resist; the next chapter is about the tax you pay when you don't.