Here's a number that should worry you more than any leaked secret. Suppose each step your agent takes is 95% reliable: a model call that returns sensible JSON, an ATS read that comes back with the field you expected, a write that lands. Ninety-five per cent. You'd sign for that.
Now chain ten of them. The maths is unforgiving: 0.95 to the power of ten is about 60%. Twenty steps and you're at 36%. A multi-step agent isn't ninety-five per cent reliable. It's a chain of coin-flips, and the chain decides.
Reliability doesn't average across steps. It multiplies. Every step you add is a tax on the whole.
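As a worked version of that arithmetic, here is a sketch that assumes every step succeeds independently with the same probability p. That is a simplification, since real failures are often correlated, and the 80% target T is an illustrative number, not one from this build.

```latex
\[
R(n) = p^{n}, \qquad R(10) = 0.95^{10} \approx 0.599, \qquad R(20) = 0.95^{20} \approx 0.358
\]
\[
p^{n} \ge T \iff n \le \frac{\ln T}{\ln p}
\qquad \text{e.g. } T = 0.8,\ p = 0.95:\quad n \le \frac{\ln 0.8}{\ln 0.95} \approx 4.35
\]
```

On those assumptions, a chain of 95% steps can afford four steps at most before end-to-end reliability drops below 80%.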
This is why the recruitment agents in this build, like the resume-screening loop, were built shallow: a handful of steps each, not a sprawling autonomous planner improvising its way through twenty tool calls. Shallow isn't a limitation we apologise for. It's the single most effective reliability decision you can make. Fewer steps, validate at each one, and fail closed when a step looks wrong rather than feeding garbage into the next link.
The discipline shows up in the loop itself: bound it, count it, check the output of every iteration before you trust it.
    // Illustrative excerpt, not a copy-paste product.
    // A bounded agent loop: max steps, per-step validation, fail closed.
    const int MaxSteps = 6;
    for (var step = 0; step < MaxSteps; step++)
    {
        var result = await _gateway.SendAsync(request, ct);   // all model calls via the gateway
        if (!_validator.TryValidate(result, out var clean))   // schema + sanity check each step
            return AgentOutcome.NeedsHuman("step output failed validation");
        if (clean.IsComplete) return AgentOutcome.Done(clean);
        request = request.With(clean);                        // feed the *validated* result forward
    }
    return AgentOutcome.NeedsHuman("exceeded max steps");     // never loop forever

The MaxSteps guard does double duty: it caps cost (more on runaway loops and the bill) and it makes "the agent went off the rails" a bounded, observable event instead of a four-figure surprise.
You wouldn't build a payment system on a service that occasionally returns nonsense, times out, or refuses to answer. You're about to. The model is the most capable dependency you have and the least predictable. It returns 429s when you're rate-limited, times out under load, trips its own content filter on a CV that mentions something benign, and, the classic, hands back JSON with a trailing comma, a hallucinated field, or a score of "high" where you asked for a number.
The model will return garbage sometimes. Reliability is what you wrap around it so that "sometimes" doesn't reach the recruiter. The wrapping has an order to it: retry the transient failures first, validate what comes back, then repair the near-misses before you give up and call a human.
Transient failures (429s, timeouts, the occasional 503) are Polly's job: retry with exponential backoff and jitter so a thousand CVs don't all retry in lockstep and DDoS your own model endpoint.
    // Illustrative excerpt. Polly v8 resilience pipeline around the gateway call.
    using Polly;
    using Polly.Retry;

    var pipeline = new ResiliencePipelineBuilder<LlmResponse>()
        .AddRetry(new RetryStrategyOptions<LlmResponse>
        {
            // Timeouts are not covered here: add a Handle<...>() for whatever
            // exception your gateway surfaces them as.
            ShouldHandle = new PredicateBuilder<LlmResponse>()
                .Handle<HttpRequestException>()
                .HandleResult(r => r.StatusCode == 429 || r.StatusCode >= 500),
            MaxRetryAttempts = 4,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true,                  // de-correlate the retry storm
            Delay = TimeSpan.FromSeconds(1)    // base delay before the first retry
        })
        .Build();

    var response = await pipeline.ExecuteAsync(
        async token => await _gateway.SendAsync(request, token), ct);

A retry fixes a flaky connection. It does nothing for a confident-but-wrong answer. That's the second layer: never trust the model's output as prose. Bind it to a schema and validate, hard, before anything downstream touches it.
    // Illustrative excerpt. Structured output + schema gate. Invalid = repair or escalate.
    if (CandidateScore.TrySchemaParse(response.Content, out var score))
        return AgentOutcome.Done(score);

    // One repair attempt: hand the model its own broken output and the error, ask again.
    var repaired = await _gateway.SendAsync(
        request.AsRepair(response.Content, reason: "must match CandidateScore schema"), ct);
    if (CandidateScore.TrySchemaParse(repaired.Content, out var fixedScore))
        return AgentOutcome.Done(fixedScore);

    return AgentOutcome.NeedsHuman("model output failed schema after repair"); // the leash

Notice the shape: retry, then validate, then one repair pass, then stop. Not an infinite "try again until it works" loop, which is how you turn a flaky dependency into a runaway bill. After one honest repair attempt, the failure goes to a human. The leash is the final layer of every reliability stack worth building: when the machine can't be sure, a person sees it.
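The excerpt calls `CandidateScore.TrySchemaParse` without showing it. Below is a minimal sketch of what such a strict gate might look like, assuming System.Text.Json on .NET 8 or later. The `score` and `rationale` fields and the 0 to 100 range are hypothetical, chosen only to illustrate the three failure modes named earlier: a trailing comma, a hallucinated field, and a score of "high".

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape: the excerpt never shows CandidateScore's fields,
// so "score" (0-100) and "rationale" are assumptions for illustration.
public sealed class CandidateScore
{
    [JsonPropertyName("score")]
    public required int Score { get; init; }

    [JsonPropertyName("rationale")]
    public required string Rationale { get; init; }

    private static readonly JsonSerializerOptions Strict = new()
    {
        // Reject fields the schema does not declare (needs .NET 8 or later).
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
        // Already the defaults; stated so nobody loosens them by accident.
        AllowTrailingCommas = false,
        NumberHandling = JsonNumberHandling.Strict
    };

    // Returns true only when the JSON is well-formed, carries exactly the
    // declared fields with the declared types, and passes the sanity checks.
    public static bool TrySchemaParse(
        string? json, [NotNullWhen(true)] out CandidateScore? score)
    {
        score = null;
        if (string.IsNullOrWhiteSpace(json)) return false;

        try
        {
            var parsed = JsonSerializer.Deserialize<CandidateScore>(json, Strict);
            if (parsed is null) return false;

            // Sanity checks beyond the shape: assumed range and non-empty rationale.
            if (parsed.Score is < 0 or > 100) return false;
            if (string.IsNullOrWhiteSpace(parsed.Rationale)) return false;

            score = parsed;
            return true;
        }
        catch (JsonException)
        {
            // Malformed JSON, wrong type, unknown field, or missing required field.
            return false;
        }
    }
}
```

The gate collapses every failure into a plain `false` on purpose: the caller only needs to know whether to accept, repair, or escalate. A fuller version might also return the error message so the repair prompt can quote it back to the model.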
Retries fix the line. Schemas fix the lie. The human fixes everything else.