Chapter 6 • Part II

Outbox

Here is the hole the persistence chapter left open. You commit the Order row to the database, then publish OrderPlaced to the broker. Two systems, two network calls, no shared transaction. If the process dies between them, you've saved the order without telling the kitchen. Publish first instead, and a failed commit leaves you with an announced order that was never saved. Dual writes have no safe ordering.

The Transactional Outbox closes it (Chris Richardson, microservices.io). Write the OrderPlaced message into an outbox table in the same database transaction as the order itself. A separate relay reads unpublished rows and pushes them to the broker, marking each as sent. The DB commit is now the single source of truth: if the order row is there, OrderPlaced will go out; if the transaction rolled back, there's nothing to send.

-- place_order: both writes commit together, atomic by construction.
CREATE OR REPLACE PROCEDURE place_order(
    p_id uuid, p_tenant uuid, p_total bigint,
    p_msg_id uuid, p_payload text)
LANGUAGE sql AS $$
    INSERT INTO orders (id, tenant_id, total)
    VALUES (p_id, p_tenant, p_total);

    INSERT INTO outbox (id, topic, payload, created_utc)
    VALUES (p_msg_id, 'order.placed', p_payload::jsonb, now());
$$;

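// The gateway calls the proc through Dapper: one transaction, no ORM.
// Property names mirror the procedure's parameters so they bind by name.
await _db.ExecuteAsync("place_order",
    new { p_id = id, p_tenant = tenant, p_total = total, p_msg_id = msgId, p_payload = payload },
    commandType: CommandType.StoredProcedure);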

// Relay: drain unsent OrderPlaced rows, publish, mark sent. Runs on its own loop.
foreach (var row in await _outbox.ReadUnsentAsync(batch: 100))
{
    await _bus.PublishAsync(row.Topic, row.Payload);
    await _outbox.MarkSentAsync(row.Id);
}

What this buys you in production: at-least-once delivery of OrderPlaced you can actually trust, without a distributed transaction across the DB and the broker. The relay can crash and resume; it just re-sends anything it didn't get to mark. That re-send is exactly why the kitchen and courier consumers have to be idempotent. The outbox is the same stored-procedure write the persistence chapter (Ch 5) built, with one extra INSERT riding inside the transaction.
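To make the re-send concrete, here is a minimal in-memory sketch of the relay loop. OutboxRow, InMemoryRelay, and the crashAfterPublish flag are hypothetical stand-ins for the outbox table, the relay, and a process dying between publish and mark-sent; none of them come from a real library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the outbox table.
public sealed class OutboxRow
{
    public Guid Id { get; init; }
    public string Topic { get; init; } = "";
    public string Payload { get; init; } = "";
    public bool Sent { get; set; }
}

// Hypothetical relay over an in-memory list; Published plays the broker.
public sealed class InMemoryRelay
{
    private readonly List<OutboxRow> _outbox;
    public List<string> Published { get; } = new();

    public InMemoryRelay(List<OutboxRow> outbox) => _outbox = outbox;

    // One relay pass. crashAfterPublish simulates the process dying after the
    // publish succeeded but before the row was marked sent.
    public void RunOnce(bool crashAfterPublish = false)
    {
        foreach (var row in _outbox.Where(r => !r.Sent).ToList())
        {
            Published.Add(row.Payload);      // publish to the "broker"
            if (crashAfterPublish) return;   // "crash": Sent never flips
            row.Sent = true;                 // mark sent
        }
    }
}
```

Running one pass with the simulated crash and then a normal pass publishes the same payload twice and leaves the row marked sent. The message is never lost, but the consumer sees a duplicate.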

The skip-if: if you aren't publishing as a result of a database write, you don't have the dual-write problem, so you don't need an outbox. A "courier moved" GPS ping isn't tied to a transactional state change; publish it directly. The outbox earns its keep only when a state change and a message have to happen together or not at all, as OrderPlaced does.

Backpressure / Dead-Letter Queue

Left alone, an order whose payment keeps failing, or one the matcher can't assign because no courier is online, will be retried forever, blocking the queue behind it and burning CPU on work that cannot succeed right now. Two patterns keep a struggling system honest instead of letting it thrash or silently drop an order.

Backpressure is the upstream signal: when the assignment queue is deep or the workers are saturated, slow intake down rather than buffer orders without limit (Reactive Streams). A bounded queue that rejects or pauses is telling you the truth about capacity, which at the height of the dinner rush is exactly when you need the truth. An unbounded one just moves the moment you fall over to later, and makes it worse.
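A bounded queue that rejects when full can be sketched with System.Threading.Channels. AssignmentIntake and its two methods are hypothetical names for illustration, and the capacity is whatever your measurements suggest, not a recommendation.

```csharp
using System;
using System.Threading.Channels;

public static class AssignmentIntake
{
    // Creates a bounded queue of order IDs. In Wait mode a full channel
    // refuses TryWrite instead of dropping or overwriting items.
    public static Channel<Guid> Create(int capacity) =>
        Channel.CreateBounded<Guid>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

    // Returns false when the queue is full, so the caller can push back
    // (for example, ask the client to retry shortly) instead of buffering.
    public static bool TryAccept(Channel<Guid> queue, Guid orderId) =>
        queue.Writer.TryWrite(orderId);
}
```

A caller that prefers to pause rather than reject could await queue.Writer.WriteAsync instead, which completes only once a worker has made room.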

A Dead-Letter Queue is the downstream escape hatch (EIP Dead Letter Channel). After a ProcessPayment or AssignCourier message fails N times, the broker moves it to a side queue instead of redelivering it forever. The main queue keeps flowing for the orders that can be processed; the stuck order waits somewhere you can inspect it, alert on it, and replay it once the card is re-authorised or a courier comes online.

// Most brokers do this for you: set max-delivery-attempts + a DLQ target.
// In handler code, the contract is simple: ack on success, nack on failure.
try
{
    await _matcher.Assign(msg.Order);
    await msg.AckAsync();
}
catch (Exception ex)
{
    _log.Error(ex, "Assignment failed for {OrderId}", msg.Order.Id);
    await msg.NackAsync();   // broker counts the attempt, DLQs at the limit
}
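What the broker does on its side of that contract can be sketched in memory. TinyBroker and Delivery are hypothetical types written for this illustration; a real broker tracks attempts and dead-letters for you through configuration.

```csharp
using System;
using System.Collections.Generic;

// A queued message plus how many times its delivery has failed.
public sealed record Delivery(string OrderId, int Attempts);

public sealed class TinyBroker
{
    private readonly int _maxAttempts;
    public Queue<Delivery> Main { get; } = new();
    public List<Delivery> DeadLetters { get; } = new();

    public TinyBroker(int maxAttempts) => _maxAttempts = maxAttempts;

    public void Enqueue(string orderId) => Main.Enqueue(new Delivery(orderId, 0));

    // Delivers until the main queue is empty.
    // The handler returns true to ack and false to nack.
    public void Drain(Func<string, bool> handler)
    {
        while (Main.Count > 0)
        {
            var d = Main.Dequeue();
            if (handler(d.OrderId)) continue;              // ack: done

            var failed = d with { Attempts = d.Attempts + 1 };
            if (failed.Attempts >= _maxAttempts)
                DeadLetters.Add(failed);                   // park it for inspection
            else
                Main.Enqueue(failed);                      // redeliver behind the others
        }
    }
}
```

With a limit of three attempts, an order that always fails ends up in DeadLetters after its third nack while the orders behind it are processed normally.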

What this buys you in production: failure becomes visible and bounded. One un-assignable order doesn't wedge the whole rush, and you never silently lose an order; it lands in the DLQ with its history, waiting for a human or a replay. An alert on DLQ depth is one of the highest-signal alarms you can wire up, and "orders we couldn't deliver" is a number the business will want anyway.

The skip-if: none, really. If you run consumers, configure a dead-letter target. The skip is in the backpressure tuning, not the DLQ: don't hand-build elaborate flow-control before you've measured an actual saturation problem. Start with a bounded queue and a DLQ, then tune.

Never silently drop an order. One you can't process belongs in a queue you can see, not a log line nobody reads.

Saga / Process Manager

This is the heaviest pattern in the chapter, and the one most teams reach for too early. Fulfilling an order spans several services, each with its own database, and you need them to agree on the outcome (Garcia-Molina & Salem, Sagas, 1987; Chris Richardson, microservices.io). Charge the customer, assign a courier, confirm with the restaurant: three services, three databases, no distributed transaction to roll them all back. So a Saga sequences the steps and, when one fails, runs a compensating action to undo the ones that already succeeded.

A Process Manager is the orchestrated form: a single component that holds the state of the in-flight order, reacts to each step's outcome, and decides the next move or the rollback. It walks the order through payment, then courier assignment, then restaurant confirmation. If the restaurant won't accept the order at the end of that, the manager refunds the customer and releases the courier it already booked. Every forward step needs a defined way back.

// Order-fulfilment process manager: react to each event, advance or compensate.
public async Task On(PaymentTaken e)               // charged → now assign
{
    var assigned = await _matcher.TryAssign(e.OrderId);   // CourierMatchingStrategy
    if (assigned) await _bus.PublishAsync(new CourierAssigned(e.OrderId));
    else          await _bus.PublishAsync(new RefundRequested(e.OrderId)); // compensate
}
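The rule that every forward step needs a defined way back can be sketched as a list of steps, each paired with its compensation. SagaStep and OrderSaga are hypothetical names, and this version keeps its state in memory only; a real process manager would persist progress between steps and would also need a plan for a compensation that itself fails.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One forward action and the compensation that undoes it.
public sealed record SagaStep(string Name, Func<Task<bool>> Do, Func<Task> Undo);

public static class OrderSaga
{
    // Runs steps in order. On the first failure, runs Undo for every step
    // that already succeeded, newest first, and returns false.
    public static async Task<bool> RunAsync(IEnumerable<SagaStep> steps, List<string> log)
    {
        var done = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            if (await step.Do())
            {
                log.Add($"done:{step.Name}");
                done.Push(step);
                continue;
            }

            log.Add($"failed:{step.Name}");
            while (done.Count > 0)
            {
                var undo = done.Pop();
                await undo.Undo();
                log.Add($"undone:{undo.Name}");
            }
            return false;
        }
        return true;
    }
}
```

If charge and assign succeed and the restaurant confirmation fails, the log shows the courier released first and the charge refunded second, the reverse of the order in which they were done.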

What this buys you in production: eventual consistency across payment, courier-matching, and the restaurant without a two-phase commit you can't get anyway. The order reaches a defined end state (delivered, or fully refunded and released) even when a step fails halfway, and the saga's state tells you exactly where any stuck order is sitting.

The skip-if, and it's a big one: you need an actual multi-step distributed transaction before any of this earns its cost. If charge, assign, and confirm all lived in one database, a single local transaction would do the same job with none of the moving parts. Sagas bring orchestration code, a compensating action for every step (refund the charge, release the courier, cancel the restaurant ticket), and a whole new failure mode: the compensation itself failing. Most teams that build a saga were one well-placed BEGIN TRANSACTION away from not needing it. Reach for it when an order genuinely spans services; until then, the local transaction is simpler and you should keep it.

Claim-Check

Some order messages carry a large payload (a generated PDF receipt, a "proof of delivery" photo from the courier, a full itemised invoice), and big messages clog the broker, blow past message-size limits, and make every consumer pay to move bytes most of them ignore (EIP; Azure Cloud Design Patterns). Store the payload in object storage and put only a reference on the bus.

// Producer: stash the receipt PDF, send only the reference.
var uri = await _storage.UploadAsync(bucket: "receipts", receiptPdf);
await _bus.PublishAsync(new ReceiptReady(orderId, uri));   // small message

// Consumer: fetch only when it actually needs the bytes.
var bytes = await _storage.DownloadAsync(msg.Uri);

What this buys you in production: the bus stays fast and cheap, because it moves order IDs and a storage URI, not megabytes of PDF. You sidestep broker size limits without splitting messages by hand, and the analytics consumer that only needs the order total never downloads the receipt. Object storage (Google Cloud Storage, AWS S3, Azure Blob Storage) is already the right home for large, immutable payloads like a receipt or a delivery photo.

The skip-if: if your messages are small (an order ID, a status, a courier location), the indirection is pure overhead. A claim-check adds a storage write, a storage read, and a lifecycle question (when does the old receipt get cleaned up?). Reach for it when payload size is an actual problem, not by default. OrderPlaced is a handful of fields; it does not need a claim-check.

Honorable Mentions

Three patterns that just missed the cut, for when you need them.

Idempotent Consumer is arguably the most important pattern for everything in this chapter, yet it lives in the next one. At-least-once delivery means every consumer will eventually see a duplicate OrderPlaced, and a kitchen handler that cooks the same order twice corrupts state. The fix (dedup on the order_id, or design the operation to be naturally repeatable) is the Idempotency pattern in Resilience, where it belongs alongside the retries that make duplicates inevitable.
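Dedup on the order ID can be sketched in a few lines. KitchenHandler is a hypothetical name, and the HashSet stands in for a processed-orders table; in a real service the dedup record and the side effect would need to commit together, or a crash between them reopens the gap.

```csharp
using System;
using System.Collections.Generic;

public sealed class KitchenHandler
{
    // Stand-in for a durable "already processed" table keyed by order ID.
    private readonly HashSet<Guid> _seen = new();
    public int TicketsPrinted { get; private set; }

    // Returns true if the order was handled now, false if it was a duplicate.
    public bool Handle(Guid orderId)
    {
        if (!_seen.Add(orderId)) return false;   // seen before: skip the side effect
        TicketsPrinted++;
        return true;
    }
}
```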

Event-Carried State Transfer (Fowler) puts enough state inside the event that subscribers don't have to call back to the source to act on it. Fattening OrderPlaced with the line items and delivery address means the kitchen consumer never calls the order service back to render the ticket. It cuts coupling and read load, at the cost of fatter messages and data that can be stale by the time it's read. Useful when a chatty callback pattern is hurting you.

Priority Queue (Azure Cloud Design Patterns) lets urgent work jump ahead of routine work when one queue serves both. A "courier waiting at the counter" re-assignment should beat a fresh order in the assignment queue. Most teams approximate it with two queues and more workers on the fast one, which is simpler and usually enough.
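For the single-queue version, a minimal sketch using .NET's built-in PriorityQueue (available from .NET 6) might look like this. AssignmentQueue and the rank values are hypothetical; the sequence number is there because PriorityQueue does not promise first-in, first-out order among equal priorities.

```csharp
using System.Collections.Generic;

public sealed class AssignmentQueue
{
    // Lower tuples dequeue first: rank decides, sequence breaks ties.
    private readonly PriorityQueue<string, (int Rank, long Seq)> _queue = new();
    private long _seq;

    // Rank 0 = urgent re-assignment, rank 1 = fresh order (illustrative values).
    public void Enqueue(string work, bool urgent) =>
        _queue.Enqueue(work, (urgent ? 0 : 1, _seq++));

    // Returns false when there is no work left.
    public bool TryDequeue(out string? work) => _queue.TryDequeue(out work, out _);
}
```

This is an in-process illustration only. Across a broker, the two-queue approximation described above is usually the easier route.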

Moving an order across a network means the payment provider, the courier service, and the kitchen now fail independently of each other. Surviving that, without taking the rest of the marketplace down with the broken part, is the next altitude.
