Exceptions & Reliability

Chapter 10 • Part III

The chain of coin-flips

Here's a number that should worry you more than any leaked secret. Suppose each step your agent takes is 95% reliable: a model call that returns sensible JSON, an ATS read that comes back with the field you expected, a write that lands. Ninety-five per cent. You'd sign for that.

Now chain ten of them. The maths is unforgiving: 0.95 to the power of ten is about 60%. Twenty steps and you're at 36%. A multi-step agent isn't ninety-five per cent reliable. It's a chain of coin-flips, and the chain decides.

Reliability doesn't average across steps. It multiplies. Every step you add is a tax on the whole.
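
To put numbers on it, here is the arithmetic worked through. It assumes every step succeeds with the same probability p and that failures are independent. That is a simplification: real steps tend to fail together when an upstream service is having a bad day.

```latex
\[
P(\text{all } n \text{ steps succeed}) = p^{n}
\]
\[
0.95^{6} \approx 0.74, \qquad 0.95^{10} \approx 0.60, \qquad 0.95^{20} \approx 0.36
\]
\[
p^{n} < 0.5 \iff n > \frac{\ln 0.5}{\ln p}, \qquad \frac{\ln 0.5}{\ln 0.95} \approx 13.5
\]
```

So at 95% per step, a six-step agent finishes cleanly about three times in four, and from fourteen steps on the chain is more likely to fail than succeed.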

This is why the recruitment agents in this build, like the resume-screening loop, were built shallow: a handful of steps each, not a sprawling autonomous planner improvising its way through twenty tool calls. Shallow isn't a limitation we apologise for. It's the single most effective reliability decision you can make. Fewer steps, validate at each one, and fail closed when a step looks wrong rather than feeding garbage into the next link.

The discipline shows up in the loop itself: bound it, count it, check the output of every iteration before you trust it.

// Illustrative excerpt — not a copy-paste product.
// A bounded agent loop: max steps, per-step validation, fail closed.
const int MaxSteps = 6;
for (var step = 0; step < MaxSteps; step++)
{
    var result = await _gateway.SendAsync(request, ct);   // all model calls via the gateway
    if (!_validator.TryValidate(result, out var clean))   // schema + sanity check each step
        return AgentOutcome.NeedsHuman("step output failed validation");
    if (clean.IsComplete) return AgentOutcome.Done(clean);
    request = request.With(clean);                         // feed the *validated* result forward
}
return AgentOutcome.NeedsHuman("exceeded max steps");      // never loop forever

The MaxSteps guard does double duty: it caps cost (more on runaway loops and the bill) and it makes "the agent went off the rails" a bounded, observable event instead of a four-figure surprise.

Figure: Per-step accuracy compounding. 95% per step falls to about 60% at 10 steps and 36% at 20, with the shallow screening agents marked.

The LLM is a flaky dependency

You wouldn't build a payment system on a service that occasionally returns nonsense, times out, or refuses to answer. You're about to. The model is the most capable dependency you have and the least predictable. It returns 429s when you're rate-limited, times out under load, trips its own content filter on a CV that mentions something benign, and, the classic, hands back JSON with a trailing comma, a hallucinated field, or a score of "high" where you asked for a number.

The model will return garbage sometimes. Reliability is what you wrap around it so that "sometimes" doesn't reach the recruiter. The wrapping has an order to it: retry the transient failures first, validate what comes back, then repair the near-misses before you give up and call a human.

Transient failures (429s, timeouts, the occasional 503) are Polly's job: retry with exponential backoff and jitter so a thousand CVs don't all retry in lockstep and DDoS your own model endpoint.

// Illustrative excerpt. Polly v8 resilience pipeline around the gateway call.
// Needs: using Polly; using Polly.Retry; using Polly.Timeout;
var pipeline = new ResiliencePipelineBuilder<LlmResponse>()
    .AddRetry(new RetryStrategyOptions<LlmResponse>
    {
        ShouldHandle = new PredicateBuilder<LlmResponse>()
            .Handle<HttpRequestException>()
            .Handle<TimeoutRejectedException>()  // raised by the per-attempt timeout below
            .HandleResult(r => r.StatusCode == 429 || r.StatusCode >= 500),
        MaxRetryAttempts = 4,
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,                       // de-correlate the retry storm
        Delay = TimeSpan.FromSeconds(1)
    })
    .AddTimeout(TimeSpan.FromSeconds(30))       // per-attempt timeout; 30s is an assumed budget
    .Build();

var response = await pipeline.ExecuteAsync(
    async token => await _gateway.SendAsync(request, token), ct);
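
For a sense of what that configuration costs in latency, here is the nominal schedule. It assumes the exponential backoff doubles the one-second base delay on each retry and it ignores jitter, which randomises the individual waits; the exact jittered values depend on Polly's implementation.

```latex
\[
d_k = 1\,\text{s} \times 2^{k}, \quad k = 0, 1, 2, 3
\qquad\Rightarrow\qquad
1\,\text{s},\ 2\,\text{s},\ 4\,\text{s},\ 8\,\text{s}
\]
\[
\sum_{k=0}^{3} d_k = 15\,\text{s}
\]
```

Four retries means up to five attempts in total, so a call that never recovers spends roughly fifteen seconds waiting, plus the time the attempts themselves take, before it gives up. That is the figure to weigh against how long a recruiter's batch can afford to stall.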

A retry fixes a flaky connection. It does nothing for a confident-but-wrong answer. That's the second layer: never trust the model's output as prose. Bind it to a schema and validate, hard, before anything downstream touches it.

// Illustrative excerpt. Structured output + schema gate. Invalid = repair or escalate.
if (CandidateScore.TrySchemaParse(response.Content, out var score))
    return AgentOutcome.Done(score);

// One repair attempt: hand the model its own broken output and the error, ask again.
var repaired = await _gateway.SendAsync(
    request.AsRepair(response.Content, reason: "must match CandidateScore schema"), ct);

if (CandidateScore.TrySchemaParse(repaired.Content, out var fixedScore))
    return AgentOutcome.Done(fixedScore);

return AgentOutcome.NeedsHuman("model output failed schema after repair"); // the leash

Notice the shape: retry, then validate, then one repair pass, then stop. Not an infinite "try again until it works" loop, which is how you turn a flaky dependency into a runaway bill. After one honest repair attempt, the failure goes to a human. The leash is the final layer of every reliability stack worth building: when the machine can't be sure, a person sees it.
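
The schema gate in the excerpt above is left abstract, so here is one way a TrySchemaParse could look. This is a minimal sketch, not the build's actual implementation: the CandidateScore shape (a candidateId string and an integer score), the 0 to 100 range, and the strict options are all assumptions made for illustration. It uses System.Text.Json and needs .NET 8 or later for UnmappedMemberHandling.

```csharp
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape: the real CandidateScore is not shown in this chapter.
public sealed class CandidateScore
{
    [JsonPropertyName("candidateId")]
    public required string CandidateId { get; init; }

    [JsonPropertyName("score")]
    public required int Score { get; init; }

    private static readonly JsonSerializerOptions Strict = new()
    {
        // Reject hallucinated fields instead of silently ignoring them (.NET 8+).
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow
    };

    public static bool TrySchemaParse(
        string? json, [NotNullWhen(true)] out CandidateScore? score)
    {
        score = null;
        if (string.IsNullOrWhiteSpace(json)) return false;

        try
        {
            // Throws JsonException on trailing commas, missing required
            // properties, unknown fields, or "high" where a number belongs.
            var parsed = JsonSerializer.Deserialize<CandidateScore>(json, Strict);

            // Sanity checks the type system cannot express.
            if (parsed is null || string.IsNullOrWhiteSpace(parsed.CandidateId)) return false;
            if (parsed.Score is < 0 or > 100) return false;   // assumed scoring range

            score = parsed;
            return true;
        }
        catch (JsonException)
        {
            return false;   // fail closed: the caller decides whether to repair or escalate
        }
    }
}
```

The parser never throws and never returns a half-valid object. Every failure mode listed earlier (the trailing comma, the hallucinated field, the score of "high") collapses into a single false, which keeps the repair-or-escalate decision in one place.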

Retries fix the line. Schemas fix the lie. The human fixes everything else.
