Build It in a Weekend. Run It for Years.

Security & Compliance

Chapter 9
Part III

Building a recruitment agent that screens and reformats CVs is the easy part. An agent can clear forty-five CVs before the kettle boils. Wiring it so the candidates' data never leaks is the part that takes the work.

Every one of those CVs is a pile of someone's personal life: name, date of birth, home address, photo, nationality, sometimes a medical disclosure they shouldn't have included. Your agent reads all of it, sends some of it to a model running on someone else's infrastructure, and writes a verdict to a log somewhere. In a demo, nobody's looking. In production, the candidate, the regulator, and a journalist with a Reuters byline all might be.

This page is about making leakage impossible, not unlikely. And about being able to prove it, because in production "trust me" is not a control. That second half is where most teams fall down.

Secrets & credentials

Start with the keys, because that's where the bleeding usually starts. In 2024, more than 23 million new secrets were leaked on public GitHub repositories, a 25% jump on the year before (GitGuardian, State of Secrets Sprawl 2025). API keys, OAuth tokens, database passwords, committed by accident and scraped within minutes.

A .env file is fine on your laptop. In production it's a liability. A dev build loads credentials through DotNetEnv; the production version pulls them from a managed secret store, and your application code never has to know which one it's talking to. Both arrive down the same IConfiguration pipeline.

// Dev: .env via DotNetEnv. Prod: the same keys, fetched from a secret store.
var builder = WebApplication.CreateBuilder(args);
if (builder.Environment.IsProduction())
{
    builder.Configuration.AddGcpSecretManager(projectId); // illustrative excerpt
}
else
{
    DotNetEnv.Env.Load(); // laptop only, never shipped; .env key is Bullhorn__ClientSecret
    // CreateBuilder already read the environment once; re-read it to see the .env values.
    builder.Configuration.AddEnvironmentVariables();
}

var bullhornSecret = builder.Configuration["Bullhorn:ClientSecret"];

The store is GCP Secret Manager (the equivalents elsewhere are AWS Secrets Manager and Azure Key Vault). On Cloud Run the secret is referenced in the service definition and resolved when the container starts. It is never baked into an image layer, and its value is never written in plaintext into the service's metadata.

# Cloud Run service: secret resolved at runtime, not built into the image
- name: BULLHORN_CLIENT_SECRET
  valueFrom:
    secretKeyRef: { name: bullhorn-client-secret, key: latest }  # name = secret, key = version

The rest is discipline, not cleverness: rotate secrets on a schedule, never log them, keep .gitignore honest, and run a pre-commit secret scanner so a key can't reach the repo in the first place. And a cautionary tale for the build-it-yourself crowd: the July 2025 Toptal breach exposed 73 repositories and shipped ten malicious npm packages downstream. Your dependencies are part of your attack surface.
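"Never log them" is easier to keep when the secret's own type refuses to print itself. A minimal sketch, assuming a hypothetical SecretValue wrapper (not part of any library, and not shown in the chapter's own code):

```csharp
using System;

// Hypothetical wrapper: holds a secret but never prints it.
public sealed class SecretValue
{
    private readonly string _value;

    public SecretValue(string value) =>
        _value = value ?? throw new ArgumentNullException(nameof(value));

    // The only way to get the raw value out; easy to search for in code review.
    public string Reveal() => _value;

    // Anything that formats or interpolates the wrapper gets a mask, not the secret.
    public override string ToString() => "***redacted***";
}
```

Wrap the value once, where it leaves configuration, and call Reveal() only at the point where the HTTP client needs it. A stray string interpolation or log template then prints the mask.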

Guardrails — the enforcement layer

Slow down here. This is the spine of the whole approach.

The instinct most teams reach for is "we'll redact the PII before we send it." That's a blocklist, and a blocklist only catches what you thought to list. A new field, a CV in a format you've never seen, an address written in a way your regex didn't anticipate: it sails straight through. Best-effort redaction fails silently, which is the worst way to fail. You find out when the candidate does.
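To see how quietly a blocklist misses, here is a deliberately naive sketch. NaiveRedactor and its single rule are hypothetical, written only to show the failure mode:

```csharp
using System.Text.RegularExpressions;

public static class NaiveRedactor
{
    // A typical blocklist rule: US SSNs written with dashes, e.g. 078-05-1120.
    private static readonly Regex Ssn = new(@"\b\d{3}-\d{2}-\d{4}\b");

    // Replaces every match of the rule; anything the rule didn't anticipate passes untouched.
    public static string Redact(string text) => Ssn.Replace(text, "[REDACTED]");
}
```

NaiveRedactor.Redact("SSN 078-05-1120") returns "SSN [REDACTED]". NaiveRedactor.Redact("SSN 078 05 1120") returns the input unchanged, with no error and no warning. Same number, different separator, straight through.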

The guarantee comes from inverting the logic.

Don't try to remove what's dangerous. Send only what's safe, and nothing else can leave.

Allowlist, not blocklist

Resume screening and shortlisting need skills, job titles, dates, qualifications. They do not need the candidate's name, photo, home address, or date of birth. So the first list is all you pass, and you cannot leak a field you never sent. Redaction asks "what should I strip?", an open-ended question with no safe default. An allowlist asks "what does this task actually need?", and the answer is always a short, named list.
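One way the allowlist might look as a type. This is a sketch under assumptions: the chapter names AllowlistedRequest but does not show its members, so the fields here, along with ParsedCv and EmploymentPeriod, are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed record EmploymentPeriod(string JobTitle, DateOnly Start, DateOnly? End);

// Hypothetical extractor output: everything found in the CV, including what must stay behind.
public sealed record ParsedCv(
    string FullName,
    DateOnly DateOfBirth,
    string HomeAddress,
    IReadOnlyList<string> Skills,
    IReadOnlyList<string> Qualifications,
    IReadOnlyList<EmploymentPeriod> Employment);

// Only allowlisted fields exist here. There is no name, address, birth date, or raw-text member.
public sealed record AllowlistedRequest(
    string JobId,
    IReadOnlyList<string> Skills,
    IReadOnlyList<string> Qualifications,
    IReadOnlyList<EmploymentPeriod> Employment)
{
    // The single place where fields cross over. Adding one means editing this line on purpose.
    public static AllowlistedRequest From(ParsedCv cv, string jobId) =>
        new(jobId, cv.Skills, cv.Qualifications, cv.Employment);
}
```

The type stops whole fields from crossing. It does not stop a sloppy extractor from putting a name inside a skill string, which is why an inspection pass still runs on the payload afterwards.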

One chokepoint

Every LLM call and every log write in the entire system goes through one guarded gateway. Nothing (no plugin, no service, no helper) calls the model SDK or the logger directly. If there's one door, you only have to guard one door, and you can prove the door is guarded.

public sealed class GuardedLlmGateway(IDlpInspector dlp, IChatClient model) : ILlmGateway
{
    // illustrative excerpt — every model call in the system routes through here
    public async Task<LlmResult> SendAsync(AllowlistedRequest req, CancellationToken ct)
    {
        // 1. Allowlist enforced by the type: req carries ONLY structured fields,
        //    never raw document text. A free-form CV string cannot be constructed.
        var payload = req.ToStructuredPayload();

        // 2. DLP inspection — fail CLOSED if it can't confirm the payload is clean.
        var scan = await dlp.InspectAsync(payload, ct);
        if (scan.Status != ScanStatus.Clean)
            throw new GuardrailBlockedException(scan.Findings); // blocked, not "logged & continued"

        // 3. Only now do we call the model.
        var response = await model.CompleteAsync(payload, ct);

        // 4. Output guardrail — scan the response before it's stored or sent onward.
        var outScan = await dlp.InspectAsync(response.Text, ct);
        if (outScan.Status != ScanStatus.Clean)
            throw new GuardrailBlockedException(outScan.Findings);

        return response;
    }
}

The DLP engine behind step 2 is GCP Sensitive Data Protection, which includes the Cloud DLP API (the nearest equivalents are AWS Macie plus Comprehend, or Azure AI Language PII detection). The allowlist isn't enforced by a code review or a comment. It's enforced by the type system. AllowlistedRequest has no field that can hold a raw CV string, so a developer in a hurry cannot construct one.

Defence in depth

Three layers, each assuming the one before it failed:

  1. Structured extraction + field allowlist at the boundary, so the model never sees the document.
  2. A DLP inspection pass that fails closed. If the scanner errors or is unsure, the call is blocked.
  3. An output guardrail that scans the model's response for leaked PII or injected instructions before it's stored or sent onward.

[Figure: three concentric guardrail gates (allowlist, DLP fail-closed, output scan), with a CV bouncing off the outer wall.]

Fail closed

If you tattoo one principle from this page somewhere, make it this one. If the scanner can't confirm a payload is clean (it errored, it timed out, it came back unsure) the call is blocked, not "logged and continued." Safe by default, even when the guardrail itself breaks. A guardrail that fails open is decoration.
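Fail-closed can be made a property of the scanner wrapper itself, so no caller has to remember it. A minimal sketch: the types below are simplified, hypothetical stand-ins for the chapter's IDlpInspector and ScanStatus (the real ones are not shown in full), and ThrowingInspector exists only to simulate a broken scanner.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum ScanStatus { Clean, Dirty, Unsure }

public sealed record ScanResult(ScanStatus Status, string Reason);

public interface IDlpInspector
{
    Task<ScanResult> InspectAsync(string payload, CancellationToken ct);
}

// Decorator: any error or timeout in the inner scanner becomes a non-Clean verdict.
public sealed class FailClosedInspector : IDlpInspector
{
    private readonly IDlpInspector _inner;
    private readonly TimeSpan _timeout;

    public FailClosedInspector(IDlpInspector inner, TimeSpan timeout)
    {
        _inner = inner;
        _timeout = timeout;
    }

    public async Task<ScanResult> InspectAsync(string payload, CancellationToken ct)
    {
        try
        {
            // WaitAsync (.NET 6+) enforces the deadline even if the inner scanner hangs.
            return await _inner.InspectAsync(payload, ct).WaitAsync(_timeout, ct);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // the caller gave up; nothing gets sent either way
        }
        catch (Exception ex)
        {
            // Timeout, network error, scanner bug: all treated as "not clean".
            return new ScanResult(ScanStatus.Unsure, "scanner failed: " + ex.GetType().Name);
        }
    }
}

// Stand-in for a scanner whose backend is down.
public sealed class ThrowingInspector : IDlpInspector
{
    public Task<ScanResult> InspectAsync(string payload, CancellationToken ct) =>
        throw new InvalidOperationException("DLP backend unreachable");
}
```

Because the gateway only proceeds on ScanStatus.Clean, an Unsure result blocks the call exactly as a positive finding would. One limitation of this sketch: after a timeout the inner scan may keep running in the background, and a production version would also cancel it.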

The same gateway protects the logs

Most teams forget the next part. You build a flawless model gateway, and then someone dumps the raw CV straight into Cloud Logging, which sits behind looser access controls than your ATS. So the logger is also a guarded sink. It accepts only typed, pre-redacted records. Raw prompts, CVs, and responses can't be written to it, because the method doesn't take a string.

public sealed class SafeLogSink(ILogger logger) : ISafeLogSink
{
    // Refuses raw payloads by construction — there is no overload that takes a string.
    public void Write(SafeLogEvent e) =>
        logger.LogInformation("decision {JobId} {CandidateRef} {Score} {Action}",
            e.JobId, e.CandidateRef, e.Score, e.HumanAction);
}

You can't log what you never had. If the raw CV never enters a variable on the log path, it can't leak there.
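What "typed, pre-redacted" could mean in practice: every member of the event is an ID, a number, or an enum, so there is no slot for free text. The shape below is an assumption. The chapter shows only the four property names, so the types, the HumanAction values, and the 0 to 100 score range are hypothetical.

```csharp
using System;

public enum HumanAction { None, Shortlisted, Rejected, Escalated }

// Hypothetical event type for the safe log sink: no string members at all.
public sealed record SafeLogEvent
{
    public int JobId { get; }
    public Guid CandidateRef { get; }   // opaque reference; maps to a person only inside the ATS
    public int Score { get; }
    public HumanAction HumanAction { get; }

    public SafeLogEvent(int jobId, Guid candidateRef, int score, HumanAction humanAction)
    {
        // Reject out-of-range scores so a corrupted value is caught at construction.
        if (score < 0 || score > 100)
            throw new ArgumentOutOfRangeException(nameof(score));

        JobId = jobId;
        CandidateRef = candidateRef;
        Score = score;
        HumanAction = humanAction;
    }
}
```

With no string-typed member, pasting a CV excerpt into a log event is a compile error, not a review comment.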

Proving it

Almost everyone skips this part, and it's the part that turns a claim into a control. How do you demonstrate leakage can't happen, not assert it but demonstrate it?

  • Canary tokens. Seed every test CV with a fake but unmistakable SSN and DOB (a honeytoken). Then assert that token never appears in any outbound LLM request or any log line. If it shows up, a guardrail failed, and you know before a real candidate's data does.
  • A red-team suite of adversarial CVs: hidden text, weird encodings, oversized fields.
  • A CI leak-test gate that pipes known PII end-to-end on every build and fails the build if anything escapes.
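  • Runtime DLP on egress and on the log sink as a continuous backstop.

[Fact] // CI leak-test: runs on every build, blocks the merge if it fails
public async Task Canary_token_never_reaches_model_or_logs()
{
    var cv = TestCv.With(ssn: "CANARY-000-00-0000", dob: "1900-01-01");
    var capture = new EgressRecorder(); // taps the gateway + log sink

    await _agent.ScreenAsync(cv, job: "4821");

    // Assumes both captures are collections of strings. Match substrings per entry:
    // Assert.DoesNotContain("CANARY", collection) compares whole elements and would
    // pass even when a payload embeds the token.
    Assert.DoesNotContain(capture.OutboundLlmPayloads,
        p => p.Contains("CANARY") || p.Contains("1900-01-01"));
    Assert.DoesNotContain(capture.LogLines,
        l => l.Contains("CANARY") || l.Contains("1900-01-01"));
}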
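A plain substring check only catches a canary that leaks verbatim. Since the red-team suite includes weird encodings, the check itself could be made a little more suspicious. A sketch, assuming a hypothetical CanaryScanner helper and a guessed shape for EgressRecorder (the chapter does not show its members):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical recorder: test doubles for the model client and the logger append here.
public sealed class EgressRecorder
{
    public List<string> OutboundLlmPayloads { get; } = new();
    public List<string> LogLines { get; } = new();
}

public static class CanaryScanner
{
    // True if the canary appears verbatim, with separators changed or removed,
    // or Base64-encoded on its own.
    public static bool Leaks(string text, string canary)
    {
        var variants = new[]
        {
            canary,
            Strip(canary),
            Convert.ToBase64String(Encoding.UTF8.GetBytes(canary)),
        };
        var haystacks = new[] { text, Strip(text) };
        return haystacks.Any(h =>
            variants.Any(v => h.Contains(v, StringComparison.OrdinalIgnoreCase)));
    }

    // Keeps only letters and digits, so "000-00-0000" and "000 00 0000" compare equal.
    private static string Strip(string s) =>
        new string(s.Where(char.IsLetterOrDigit).ToArray());
}
```

In the test, the predicate becomes l => CanaryScanner.Leaks(l, "CANARY-000-00-0000") over capture.LogLines and capture.OutboundLlmPayloads. This is a heuristic, not a proof: it will miss a canary that was Base64-encoded as part of a longer string, or transformed in a way the list of variants doesn't cover.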

And one architectural backstop so the rest doesn't rely on good intentions: network egress control. With VPC Service Controls (or an egress firewall) on the Cloud Run service, the model API endpoint is reachable only through the gateway's service account. Code can't bypass the guard even if someone tries. The network won't let it.
