Chapter 5 • Part II

Optimistic Concurrency

The problem: two kitchen staff open the same order, both edit it (one marks an item out of stock, the other bumps the prep time), and the second save silently erases the first. The shape: carry a version with the order row, and make the write fail loudly when the version it read no longer matches.

This is Fowler's Optimistic Offline Lock (PoEAA). You don't hold a lock across the staff member's think-time, which would tie up the database waiting for a human mid-rush. You bet that conflicts are rare, check that bet at write time, and reject the loser. Postgres already exposes a usable system column in xmin; a SQL Server rowversion, an explicit integer column, or an HTTP ETag works the same way.

UPDATE "order"
SET    status = @Status, version = version + 1
WHERE  id = @Id AND version = @ExpectedVersion;
-- rows affected = 0 → the other staff member won; reload and retry or surface the conflict
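
The same check can be sketched in memory to show what the two staff members experience. This is an illustration under assumptions, not the book's gateway: OrderRow, OrderStore and TryUpdateStatus are hypothetical names, and the dictionary stands in for the order table. A real gateway would run the UPDATE above and inspect the rows-affected count.

```csharp
using System.Collections.Generic;

// Hypothetical row shape: the version travels with the data the client read.
public sealed record OrderRow(int Id, string Status, int Version);

// In-memory stand-in for the order table (an assumption for illustration).
public sealed class OrderStore
{
    private readonly Dictionary<int, OrderRow> _rows = new();
    private readonly object _gate = new();

    public void Insert(OrderRow row) { lock (_gate) _rows[row.Id] = row; }

    public OrderRow Get(int id) { lock (_gate) return _rows[id]; }

    // Mirrors "UPDATE ... WHERE id = @Id AND version = @ExpectedVersion".
    // Returns the rows-affected count: 1 on success, 0 when the row is
    // missing or its version has moved on since the caller read it.
    public int TryUpdateStatus(int id, string status, int expectedVersion)
    {
        lock (_gate)
        {
            if (!_rows.TryGetValue(id, out var current)) return 0;
            if (current.Version != expectedVersion) return 0;
            _rows[id] = current with { Status = status, Version = current.Version + 1 };
            return 1;
        }
    }
}
```

If both staff members read the order at version 1, the first call to TryUpdateStatus returns 1 and bumps the row to version 2; the second, still holding version 1, returns 0 and has to reload before trying again.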

What it buys you in production: no lost updates when two staff edit one order, no database locks held across a slow client, and a clean conflict signal you can turn into a "this order changed since you opened it" message. It costs almost nothing to add and saves you from a class of bug that is invisible until a customer's order goes out with the wrong items.

Skip-if: an order is only ever written by one actor at a time, the way a single courier-assignment worker draining its own queue is. No concurrent writers, no lost update, no version column.

Evolutionary Schema Migrations

The problem: the menu and order schema has to change while the platform is live and taking orders, repeatably, across every environment, without a human running ad-hoc SQL against production. The shape: versioned, forward-only migration scripts, applied in order, tracked in a table the tool owns.

This is Fowler and Sadalage's evolutionary database design. Each change is a numbered script checked into the repo. A runner (DbUp-style, the same idea as Flyway) applies any script that hasn't run yet and records it. The schema's state is whatever the ordered scripts produce, and it's identical in every environment because the same scripts ran in the same order.

Migrations/
  0001_create_order.sql
  0002_add_tenant_id.sql
  0003_add_menu_modifier_groups.sql
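
The runner's core decision is small enough to sketch: compare the folder against the tracking table and apply what is missing, in order. The MigrationPlanner name and its Pending method are hypothetical, and this is not how DbUp or Flyway are implemented internally. A real runner also executes each script and records it in its own tracking table; the sketch covers only the selection step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MigrationPlanner
{
    // Returns the scripts that have not run yet, in the order they must be applied.
    // Ordinal ordering is enough because the names carry a zero-padded number prefix.
    public static IReadOnlyList<string> Pending(
        IEnumerable<string> scriptsOnDisk,
        IEnumerable<string> alreadyApplied)
    {
        var applied = new HashSet<string>(alreadyApplied, StringComparer.Ordinal);
        return scriptsOnDisk
            .Where(name => !applied.Contains(name))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Against an environment that has only run 0001_create_order.sql, the three scripts above yield 0002 then 0003. Against an empty database they yield all three, which is what makes the schema reproducible from scratch.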

Forward-only is the discipline that makes this safe. You don't write down-scripts you'll never test under load; you roll forward with a new migration that corrects the last one. To rename the order's status column without downtime you expand, migrate, then contract: add the new column, ship code that writes both, backfill the rows written before that deploy, move reads to the new column, then drop the old one in a later migration once nothing reads it. Dual writes go live before the backfill so that no row written in between is missed.

What it buys you in production: a schema you can reproduce from an empty database to current state by running the folder, a clear audit of every change, and deploys that don't depend on someone remembering to run a script. This is the SQL-first stance held all the way down. The migrations are plain SQL you can read in a code review, not artifacts a framework generates from your model and resolves by magic. EF-style migrations, inferred from a changed C# class and prone to drift the moment two branches touch the model, are exactly the over-engineering tax the component chapter warned about.

Skip-if: there isn't one. If your schema changes after it ships, and it will, you need versioned migrations from the first deploy. The cheapest moment to adopt this is migration 0001.

Sharding and Partitioning

The problem: the order table has grown past what a single node serves well. The shape: split the data by a key (a range, a hash or a tenant) so each piece lives on its own and a query hits one piece instead of all of them (Azure Cloud Design Patterns). For a delivery marketplace the natural key is the city: an order is served, tracked and delivered within one metro, so orders shard cleanly by city.

Partitioning splits a table within one database; sharding spreads it across several. Both buy you headroom past a single node's limits. Both also cost you: cross-city queries (a national restaurant brand's daily totals) get expensive or impossible, transactions stop spanning the boundary cleanly, and your shard key becomes a decision you can barely change once orders are distributed by it.
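
Routing by city can be sketched as an explicit lookup from city to shard. Everything here is an assumption for illustration: the CityShardMap class, its members, and the shard names are hypothetical, and a real deployment would map to connection strings held in configuration.

```csharp
using System;
using System.Collections.Generic;

public sealed class CityShardMap
{
    private readonly Dictionary<string, string> _shardByCity;

    public CityShardMap(IDictionary<string, string> shardByCity) =>
        _shardByCity = new Dictionary<string, string>(shardByCity, StringComparer.OrdinalIgnoreCase);

    // Every order query carries its city, so it resolves to exactly one shard.
    public string ShardFor(string city) =>
        _shardByCity.TryGetValue(city, out var shard)
            ? shard
            : throw new KeyNotFoundException($"No shard configured for city '{city}'.");

    // A cross-city query (a national brand's daily totals) has no single home:
    // it has to visit every shard and merge the results itself.
    public IReadOnlyCollection<string> AllShards => new HashSet<string>(_shardByCity.Values);
}
```

The single-city path is one dictionary lookup. The cross-city path is a fan-out over AllShards, which is where the cost described above shows up in code.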

What it buys you in production: scale past the point where one machine, well-indexed and with a read replica or two, runs out of room. That is a real ceiling, and the platforms that hit it are real. Sharding orders by city also keeps a busy metro's dinner rush off the same node as everyone else's.

Premature sharding is one of the classic taxes. Teams shard for a load they project rather than one they have, and inherit the operational weight of a distributed dataset to serve a table that would fit comfortably on one node for years. The order to exhaust first: index properly, then cache the hot reads, then add a read replica, then consider partitioning within one database. Sharding across nodes comes after all of that.

You are almost certainly not at the scale that needs this. The platforms that genuinely are know it from their metrics, not from a capacity-planning daydream.

Skip-if: a single well-tuned instance with proper indexes still has headroom. Measure the ceiling before you build for it. Most platforms never reach it.

Cache-Aside

The problem: the menu is read on every customer visit and changes a few times a day, yet you re-query it from the database thousands of times an hour. The shape: check the cache first; on a miss, load the menu from the database, populate the cache, and return (Azure Cloud Design Patterns).

This is the cheapest read-latency win at this altitude, and the menu is the textbook case for it: read-heavy, rarely written, the same Composite tree served to everyone browsing a restaurant. The application owns the logic, not a black box: look in the cache, fall through to the MenuGateway on a miss, write the result back with a time-to-live.

var cached = await _cache.GetAsync(menuKey);
if (cached is not null) return cached;

var menu = await _menuGateway.LoadAsync(restaurantId);   // miss: hit the source
await _cache.SetAsync(menuKey, menu, _ttl);
return menu;

What it buys you in production: a large cut in read latency and database load on the hottest read in the app, with very little code and no new persistence model. For a menu hammered at lunch and dinner it's often the biggest win you can ship in an afternoon.

The whole difficulty is invalidation. A cache is a second copy of the truth, and the moment a restaurant edits a price or marks an item sold out, your cached menu is stale until the TTL expires or you evict it. The pattern's quiet cost is reasoning about how wrong a read is allowed to be. A short TTL bounds staleness with minimal logic; explicit eviction when the restaurant saves the menu is tighter but more code and more ways to get it wrong. Pick the staleness window deliberately rather than discovering it through a customer charged the old price.
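
Both staleness controls fit in one small class. This is a sketch under assumptions, not the book's cache: MenuCache, GetOrLoadAsync and Evict are hypothetical names, the store is an in-process dictionary rather than a shared cache, and the clock is injectable only so that expiry can be exercised without waiting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class MenuCache<TMenu>
{
    private readonly ConcurrentDictionary<string, (TMenu Menu, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public MenuCache(TimeSpan ttl) : this(ttl, () => DateTimeOffset.UtcNow) { }

    public MenuCache(TimeSpan ttl, Func<DateTimeOffset> now)
    {
        _ttl = ttl;
        _now = now;
    }

    // Cache-aside read: serve a live entry, otherwise load from the source and store it.
    public async Task<TMenu> GetOrLoadAsync(string key, Func<Task<TMenu>> load)
    {
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > _now())
            return entry.Menu;

        var menu = await load();                 // miss or expired: hit the source
        _entries[key] = (menu, _now() + _ttl);   // the TTL bounds how stale a read can get
        return menu;
    }

    // Explicit eviction: call this when the restaurant saves its menu.
    public void Evict(string key) => _entries.TryRemove(key, out _);
}
```

The TTL is the safety net and Evict is the tighter option. Two concurrent misses for the same key will both call load here; that is usually acceptable for a menu, and it is the kind of detail to decide on purpose.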

Skip-if: your reads are already fast enough, or the data changes so often that the cache would miss as much as it hits. Caching live courier GPS positions, which move every few seconds, is invalidation complexity you're paying for nothing.

Soft Delete and Temporal

The problem: "delete" almost never means "destroy." A restaurant removes a menu item but you still need it on last month's orders, or you need to know what an item cost on the day a customer was charged. The shape: mark rows deleted instead of removing them, and keep a history of changes rather than overwriting in place.

Soft delete is a marker on the row, here a deleted_at timestamp, plus a default filter that hides marked rows, so a delisted dish disappears from the live menu but still resolves on the orders that bought it. Temporal goes further and keeps the full history of every version of a row, which is what SQL:2011 system-versioned temporal tables and Fowler's temporal patterns formalise. Menu prices are the case that earns it: when a customer disputes a charge, you need the price that was live at the moment they ordered, not today's.

-- Soft delete: delist a menu item, don't destroy it.
UPDATE menu_item SET deleted_at = now() WHERE id = @Id;

-- Reads exclude delisted items by default.
SELECT * FROM menu_item WHERE deleted_at IS NULL;

What it buys you in production: undo for an accidentally delisted item, a recovery path that doesn't involve a backup restore, and a price history that answers "what did this cost when the order was placed?" without a forensic dig. When the only event-sourcing payoff you want is menu-price history, this gets you most of it without the rest of the machinery.
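
The price-dispute question reduces to an "as of" lookup over validity periods. The sketch below assumes a history shape of its own: PriceVersion, PriceHistory and PriceAsOf are hypothetical names, and the half-open period convention (ValidFrom inclusive, ValidTo exclusive) is one common choice rather than something the schema above dictates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of price history, valid from ValidFrom (inclusive) to ValidTo (exclusive).
// A null ValidTo marks the price that is live now.
public sealed record PriceVersion(
    int MenuItemId, decimal Price, DateTimeOffset ValidFrom, DateTimeOffset? ValidTo);

public static class PriceHistory
{
    // Returns the price that was live at the given instant, or null if the item
    // had no price then. Overlapping periods are a data error and make
    // SingleOrDefault throw rather than silently pick one.
    public static decimal? PriceAsOf(
        IEnumerable<PriceVersion> history, int menuItemId, DateTimeOffset at) =>
        history
            .Where(v => v.MenuItemId == menuItemId
                        && v.ValidFrom <= at
                        && (v.ValidTo is null || at < v.ValidTo))
            .Select(v => (decimal?)v.Price)
            .SingleOrDefault();
}
```

Passing the order's placed-at timestamp as the instant returns the price the customer actually saw, even if the item has been repriced or delisted since.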

The cost is that nothing is ever really gone, so every query has to remember the filter (push it into the MenuGateway or a view so callers can't forget), tables grow without a purge policy, and "delete my account" requests under privacy law mean a real delete of customer data, not a flag. Soft delete is recoverability, not erasure. Keep the two distinct.

Skip-if: the data is genuinely disposable and nobody will ever ask for it back or audit it. A transient courier-location ping or a throwaway cache row doesn't need a tombstone.

Honorable Mentions

Three patterns missed the cut but are worth a line for the team that wants to go further.

Lightweight Unit of Work (Fowler, PoEAA) is a single transaction boundary spanning several gateway calls, so placing an order and decrementing its stock commit or roll back together. Use it as exactly that, a using scope around a transaction, and nothing more. The full ORM-style Unit of Work that tracks every changed object is the change-tracking magic the SQL-first stance rejects.

Read Replicas route read traffic to copies of the primary, scaling reads without the weight of sharding. They're the step you reach for before partitioning, and they pair naturally with the menu cache-aside. The catch is replication lag, so route reads that must be current (a customer's live order status) to the primary.

Connection Pooling reuses database connections instead of opening one per request, which is the difference between a service that holds steady under load and one that exhausts the database's connection limit. Your data library and driver give it to you; the honorable-mention work is tuning the pool size, not building it.

Persistence holds state still. Production also has to move it between services, reliably and without dropping it on the floor. That is the next altitude.
