Build It in a Weekend. Run It for Years.

The model: small, capable, swappable

Chapter 3 · Part II

You don't need the biggest model to read a CV against a job. For screening, formatting, and ranking, a fast, inexpensive model such as gpt-4o-mini is a sensible default: it does the job well and keeps running costs low. We'll model the economics properly in the build-vs-buy chapter; the headline is that the model API is the smaller part of the bill. On a budget model like gpt-4o-mini it's roughly $2–$4 per thousand CVs, a fraction of a cent each. Step up to a GPT-5-class model for an agentic screening loop (two to four model calls per CV, around 3k tokens in and 700 out apiece) and it's closer to $20–$75 per thousand, still only a few cents a CV. At agency volume, say ten thousand CVs a month, that's about $20–$40 a month on the mini model or a few hundred (roughly $200–$750) on a frontier one. Real money at scale, but the expensive part, as ever, is people.
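If you want to rerun that arithmetic with your own numbers, a small calculator is enough. The sketch below assumes the budget model runs the same workload as the agentic loop (two to four calls per CV, 3k tokens in, 700 out). The per-million-token prices are placeholder assumptions for illustration, not quoted rates, and CvCost is a hypothetical helper name; substitute your provider's current price list.

```csharp
using System;

// Workload figures taken from the text above.
const int TokensIn = 3_000;
const int TokensOut = 700;

// ASSUMPTION: placeholder prices in USD per million tokens.
// Replace with your provider's current published rates.
const decimal UsdPerMillionIn = 0.15m;
const decimal UsdPerMillionOut = 0.60m;

foreach (var callsPerCv in new[] { 2, 4 })
{
    var perThousand = CvCost.PerThousandCvs(
        callsPerCv, TokensIn, TokensOut, UsdPerMillionIn, UsdPerMillionOut);

    // Ten thousand CVs a month is ten times the per-thousand figure.
    Console.WriteLine(
        $"{callsPerCv} calls per CV: ${perThousand:F2} per thousand, " +
        $"${perThousand * 10:F2} per month at 10,000 CVs");
}

// Hypothetical helper: cost in USD of screening one thousand CVs.
public static class CvCost
{
    public static decimal PerThousandCvs(
        int callsPerCv, int tokensIn, int tokensOut,
        decimal usdPerMillionIn, decimal usdPerMillionOut)
    {
        // Cost of a single model call, input and output priced separately.
        decimal perCall =
            (tokensIn * usdPerMillionIn + tokensOut * usdPerMillionOut) / 1_000_000m;

        return perCall * callsPerCv * 1000m;
    }
}
```

With these placeholder prices the result is about $1.74 per thousand CVs at two calls each and $3.48 at four, which is in line with the $2–$4 range quoted above. Change the two price constants and the frontier-model figure falls out of the same function.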

The important design choice isn't which model. It's that the model name is configuration, not code. Models get deprecated on the provider's timetable, not yours, and when that day comes you want to change one line in a secret store, not go hunting through source. Semantic Kernel makes the swap a one-liner, which is the whole point of routing through it rather than wiring a vendor SDK directly into your logic.
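One way to make "configuration, not code" concrete is to resolve the model settings through a single lookup that fails loudly when a value is missing. What follows is a minimal sketch, not the book's implementation: ModelSettings and its Load method are hypothetical names, and the lookup is passed in as a function so the same code can read from real environment variables (Environment.GetEnvironmentVariable) or from a test dictionary.

```csharp
using System;
using System.Collections.Generic;

// A dictionary stands in for the environment or secret store here.
var store = new Dictionary<string, string?>
{
    ["OPENAI_MODEL"] = "gpt-4o-mini",
    ["OPENAI_API_KEY"] = "sk-placeholder",
};
Func<string, string?> lookup = name => store.TryGetValue(name, out var v) ? v : null;

Console.WriteLine(ModelSettings.Load(lookup).ModelId);

// The swap: one value changes in the store, no source code changes.
store["OPENAI_MODEL"] = "some-newer-model";
Console.WriteLine(ModelSettings.Load(lookup).ModelId);

// Hypothetical settings holder. A plain class rather than a record, so the
// API key is not included in an auto-generated ToString().
public sealed class ModelSettings
{
    public string ModelId { get; }
    public string ApiKey { get; }

    private ModelSettings(string modelId, string apiKey)
    {
        ModelId = modelId;
        ApiKey = apiKey;
    }

    public static ModelSettings Load(Func<string, string?> lookup) =>
        new(Require(lookup, "OPENAI_MODEL"), Require(lookup, "OPENAI_API_KEY"));

    // Fail at startup with a clear message rather than at the first model call.
    private static string Require(Func<string, string?> lookup, string name)
    {
        var value = lookup(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Missing required setting '{name}'.");
        return value;
    }
}
```

In a real service you would pass Environment.GetEnvironmentVariable (or an IConfiguration indexer) as the lookup. The point of the shape is that a deprecated model becomes a change to one stored value.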

Bootstrapping the kernel

Here's the heart of Program.cs: load configuration, build the kernel, register the model, and (critically) register our guarded gateway as the only sanctioned way to reach it.

// Program.cs — illustrative excerpt
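using DotNetEnv;
using Microsoft.Extensions.Configuration;        // ConfigurationBuilder, AddEnvironmentVariables
using Microsoft.Extensions.DependencyInjection;  // AddSingleton
using Microsoft.SemanticKernel;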

Env.Load();                                  // pull .env into the environment

var config = new ConfigurationBuilder()
    .AddEnvironmentVariables()               // DotNetEnv → IConfiguration
    .Build();

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: config["OPENAI_MODEL"]!,        // swappable — never hard-coded
    apiKey:  config["OPENAI_API_KEY"]!);

// The model is reachable ONLY through the guarded gateway.
builder.Services.AddSingleton<ILlmGateway, GuardedLlmGateway>();

Kernel kernel = builder.Build();

Three things to notice. The model id and key both come from configuration, so nothing sensitive is in the source. The kernel is built once and reused. And no agent in this book is ever handed the kernel to call the model directly; they're handed an ILlmGateway. Which brings us to the most important type in the codebase.

The one door every model call goes through

Here is a rule we will hold from this page to the last: no agent calls the language model directly. Every request, every CV, every job description, every prompt, passes through a single guarded component called ILlmGateway.

Why insist on one door? Because the moment you have several places that talk to the model, you have several places to leak a candidate's data, several places to forget a safety check, several things to fix when the rules change. One door means one place to enforce the rules, and one place to prove you enforced them.

// Gateway/ILlmGateway.cs — illustrative excerpt
public interface ILlmGateway
{
    // The ONLY sanctioned path from our code to the model.
    // Inputs are structured + allowlisted, not free-form payloads.
    Task<LlmResult> InvokeAsync(LlmRequest request, CancellationToken ct);
}

// Sketch of what the guard does, in order, before any model call:
//   1. Allowlist — accept only the structured fields we expect
//   2. DLP inspect — scan for PII / secrets that must not leave
//   3. Fail-closed — if anything looks wrong, refuse the call
//   4. Call the model, via Semantic Kernel
//   5. Log through a typed safe sink that refuses raw payloads

In plain terms: the gateway only accepts the specific, structured information a task needs, never a free-form blob of whatever happened to be in memory. It inspects what's about to be sent for things that mustn't leave, such as a candidate's personal details. If anything looks wrong, it fails closed: it refuses rather than risking the leak. And it logs through a sink that won't write raw candidate data into your logs, because logs leak too.

This is the difference between hoping nothing sensitive escapes and enforcing it. The full build of this gateway, the allowlist, the DLP rules, the safe logging, is the security and compliance chapter's job. For now, just hold the shape: every snippet that follows reaches the model through this one guarded door, never the raw SDK.
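Ahead of that full build, a toy version can make the order of operations tangible. Everything in the sketch below is an assumption made for illustration: the LlmRequest, LlmResult, and SafeLogEntry shapes, the field names on the allowlist, the single email-shaped regex standing in for real DLP rules, and the delegate that stands in for the Semantic Kernel call. The interface is repeated only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

var log = new List<SafeLogEntry>();
var gateway = new SketchGateway(
    callModel: (prompt, ct) => Task.FromResult("stubbed model reply"),
    safeLog: log.Add);

var ok = await gateway.InvokeAsync(
    new LlmRequest("screen", new Dictionary<string, string> { ["skills"] = "C#, SQL" }),
    CancellationToken.None);
var refused = await gateway.InvokeAsync(
    new LlmRequest("screen", new Dictionary<string, string> { ["skills"] = "ask jane@example.com" }),
    CancellationToken.None);
Console.WriteLine($"{ok.Allowed} {refused.Allowed}"); // True False

// Hypothetical shapes; the book defines its own versions elsewhere.
public sealed record LlmRequest(string TaskName, IReadOnlyDictionary<string, string> Fields);
public sealed record LlmResult(bool Allowed, string Text);
// The log entry carries metadata only, so there is nowhere to put a raw payload.
public sealed record SafeLogEntry(string TaskName, string Outcome, int FieldCount);

public interface ILlmGateway
{
    Task<LlmResult> InvokeAsync(LlmRequest request, CancellationToken ct);
}

public sealed class SketchGateway : ILlmGateway
{
    // ASSUMPTION: illustrative field names, not the book's allowlist.
    private static readonly HashSet<string> AllowedFields =
        new() { "job_title", "skills", "experience_summary" };

    // ASSUMPTION: a deliberately crude stand-in for real DLP rules.
    private static readonly Regex EmailLike = new(@"[^@\s]+@[^@\s]+\.[^@\s]+");

    private readonly Func<string, CancellationToken, Task<string>> _callModel;
    private readonly Action<SafeLogEntry> _safeLog;

    public SketchGateway(
        Func<string, CancellationToken, Task<string>> callModel,
        Action<SafeLogEntry> safeLog)
    {
        _callModel = callModel;
        _safeLog = safeLog;
    }

    public async Task<LlmResult> InvokeAsync(LlmRequest request, CancellationToken ct)
    {
        // 1. Allowlist: any unexpected field fails the whole request.
        if (request.Fields.Keys.Any(k => !AllowedFields.Contains(k)))
            return Refuse(request);

        // 2. DLP inspect: look for content that must not leave.
        if (request.Fields.Values.Any(v => EmailLike.IsMatch(v)))
            return Refuse(request);

        // 4. Call the model (a delegate here; Semantic Kernel in the real build).
        var prompt = string.Join("\n", request.Fields.Select(f => $"{f.Key}: {f.Value}"));
        var text = await _callModel(prompt, ct);

        // 5. Log metadata only.
        _safeLog(new SafeLogEntry(request.TaskName, "ok", request.Fields.Count));
        return new LlmResult(true, text);
    }

    // 3. Fail closed: refuse without ever invoking the model.
    private LlmResult Refuse(LlmRequest request)
    {
        _safeLog(new SafeLogEntry(request.TaskName, "refused", request.Fields.Count));
        return new LlmResult(false, string.Empty);
    }
}
```

Because the model is injected as a delegate, a test can prove the fail-closed property directly: a refused request never invokes it.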

One door to the model. Locked by default. That's not paranoia. It's the only version that's safe to run for years.

Built to run anywhere, billed only when working

We package the whole thing as a Docker container. That keeps it cloud-agnostic, since the same image runs on your machine, on a colleague's, and in production unchanged, and it means you're never locked to one provider's runtime. A minimal Dockerfile for a .NET service is short:

# Build
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# Run — slim runtime image, no SDK
FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "RecruitmentAgent.dll"]

Our default deployment target is GCP Cloud Run (or its counterparts on other clouds: AWS App Runner, ECS Fargate, Azure Container Apps), and the reason is one phrase: scale to zero. With no traffic to serve, the service winds down to nothing, so the architecture doesn't make you pay for idle. When work arrives, it spins up to handle it. For an agency whose hiring ebbs and flows, that elasticity is the point: you're not paying for a server humming away at three in the morning doing nothing.

Be honest about the bill, though. The near-$0 case is the hobby case: negligible traffic, no warm instance, no supporting plumbing. A real production deployment isn't that. To kill cold starts you keep at least one instance warm; you add a managed audit store (Cloud SQL), a queue (Pub/Sub), and log ingestion. Reckon on roughly $120/month at the low end, $300–$350 for a typical mid-size agency, and $500–$750+ under heavier load or with high availability. We'll cost it properly in the build-vs-buy chapter. Scale-to-zero is an architecture you want; a near-zero bill is not the production norm.

Scale-to-zero isn't free of consequences, and we'll be honest about them later: cold starts, and making sure no in-flight work is lost when an idle instance is reclaimed. The short version, which shapes the design from here on: the service stays stateless, and anything that matters lives in the queue or the ATS, never in the container's memory.
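The usual way to keep that promise is to acknowledge a queue message only after the work for it has finished, so an interrupted job is simply delivered again. Here is a minimal sketch of the pattern with an in-memory stand-in for the queue. InMemoryQueue and Worker are hypothetical names, not a real queue client, and the cancellation token is assumed to be wired to the host's shutdown signal in a real service.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var queue = new InMemoryQueue();
queue.Publish("cv-1");
queue.Publish("cv-2");

using var shutdown = new CancellationTokenSource();

await Worker.DrainAsync(queue, async (cvId, ct) =>
{
    // Simulate the platform reclaiming the instance midway through cv-2.
    if (cvId == "cv-2") shutdown.Cancel();
    ct.ThrowIfCancellationRequested();
    await Task.Yield(); // stands in for the real screening work
}, shutdown.Token);

Console.WriteLine(queue.PendingCount); // 1: cv-2 is still queued for redelivery

// Hypothetical stand-in for a real queue: a message is removed only on Ack.
public sealed class InMemoryQueue
{
    private readonly Queue<string> _pending = new();

    public int PendingCount => _pending.Count;

    public void Publish(string cvId) => _pending.Enqueue(cvId);

    // Look at the next message without removing it.
    public bool TryPeek(out string cvId)
    {
        if (_pending.Count > 0)
        {
            cvId = _pending.Peek();
            return true;
        }
        cvId = string.Empty;
        return false;
    }

    // Remove the message that was just processed.
    public void Ack() => _pending.Dequeue();
}

public static class Worker
{
    public static async Task DrainAsync(
        InMemoryQueue queue,
        Func<string, CancellationToken, Task> screen,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested && queue.TryPeek(out var cvId))
        {
            try
            {
                await screen(cvId, ct);
                queue.Ack(); // acknowledge only after the work has completed
            }
            catch (OperationCanceledException)
            {
                return; // leave the message unacknowledged so it is retried
            }
        }
    }
}
```

Nothing about the interrupted CV lives in the worker: the queue still holds it, which is what lets the container be thrown away at any moment. The flip side is that a job may run twice, so the screening step should be safe to repeat.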

That's the toolkit. A boring runtime someone else patches. Secrets out of the code. A model you can swap in one line. One guarded door. A container that runs anywhere and bills you only when it's actually doing something. Notice what's missing: anything exotic. That's deliberate. The stack was never the interesting part. What we build on it is.

Next: talking to your ATS, how the agent actually reaches into Bullhorn and JobAdder to fetch a CV and write back a result.
