Building a recruitment agent that screens and reformats CVs is the easy part. An agent can clear forty-five CVs before the kettle boils. Wiring it so the candidates' data never leaks is the part that takes the work.
Every one of those CVs is a pile of someone's personal life: name, date of birth, home address, photo, nationality, sometimes a medical disclosure they shouldn't have included. Your agent reads all of it, sends some of it to a model running on someone else's infrastructure, and writes a verdict to a log somewhere. In a demo, nobody's looking. In production, the candidate, the regulator, and a journalist with a Reuters byline all might be.
This page is about making leakage impossible, not unlikely. And about being able to prove it, because in production "trust me" is not a control. That second half is where most teams fall down.
Start with the keys, because that's where the bleeding usually starts. In 2024, more than 23 million new secrets were leaked on public GitHub repositories, a 25% jump on the year before (GitGuardian, State of Secrets Sprawl 2025). API keys, OAuth tokens, database passwords, committed by accident and scraped within minutes.
A .env file is fine on your laptop. In production it's a liability. A dev build loads credentials through DotNetEnv; the production version pulls them from a managed secret store, and your application code never has to know which one it's talking to. Both arrive down the same IConfiguration pipeline.
    // Dev: .env via DotNetEnv. Prod: the same keys, fetched from a secret store.
    var builder = WebApplication.CreateBuilder(args);
    if (builder.Environment.IsProduction())
        builder.Configuration.AddGcpSecretManager(projectId); // illustrative excerpt
    else
    {
        DotNetEnv.Env.Load(); // laptop only, never shipped
        // CreateBuilder has already read the environment, so read it again to pick up
        // what .env just set (Bullhorn__ClientSecret maps to Bullhorn:ClientSecret).
        builder.Configuration.AddEnvironmentVariables();
    }
    var bullhornSecret = builder.Configuration["Bullhorn:ClientSecret"];

The store is GCP Secret Manager (AWS Secrets Manager and Azure Key Vault are the equivalents). On Cloud Run the secret is resolved at runtime from a reference: it is never baked into an image layer and never written into the service definition as a plaintext environment variable, where it would show up in the container's metadata.
    # Cloud Run service: secret referenced and resolved at runtime, not built into the image
    env:
      - name: BULLHORN_CLIENT_SECRET
        valueFrom:
          secretKeyRef: { name: bullhorn-client-secret, key: latest }  # name = secret, key = version

The rest is discipline, not cleverness: rotate secrets on a schedule, never log them, keep .gitignore honest, and run a pre-commit secret scanner so a key can't reach the repo in the first place. And a cautionary tale for the build-it-yourself crowd: the July 2025 Toptal breach exposed 73 repositories and shipped ten malicious npm packages downstream. Your dependencies are part of your attack surface.
Slow down here. This is the spine of the whole approach.
The instinct most teams reach for is "we'll redact the PII before we send it." That's a blocklist, and a blocklist only catches what you thought to list. A new field, a CV in a format you've never seen, an address written in a way your regex didn't anticipate: it sails straight through. Best-effort redaction fails silently, which is the worst way to fail. You find out when the candidate does.
The guarantee comes from inverting the logic.
Don't try to remove what's dangerous. Send only what's safe, and nothing else can leave.
Resume screening and shortlisting need skills, job titles, dates, qualifications. They do not need the candidate's name, photo, home address, or date of birth. So the first list is all you pass, and the second never leaves. You cannot leak a field you never sent. Redaction asks "what should I strip?", an open-ended question with no safe default. An allowlist asks "what does this task actually need?" The answer is always a short, named list.
Every LLM call and every log write in the entire system goes through one guarded gateway. Nothing (no plugin, no service, no helper) calls the model SDK or the logger directly. If there's one door, you only have to guard one door, and you can prove the door is guarded.
    public sealed class GuardedLlmGateway(IDlpInspector dlp, IChatClient model) : ILlmGateway
    {
        // illustrative excerpt: every model call in the system routes through here
        public async Task<LlmResult> SendAsync(AllowlistedRequest req, CancellationToken ct)
        {
            // 1. Allowlist enforced by the type: req carries ONLY structured fields,
            //    never raw document text. A free-form CV string cannot be constructed.
            var payload = req.ToStructuredPayload();

            // 2. DLP inspection: fail CLOSED if it can't confirm the payload is clean.
            var scan = await dlp.InspectAsync(payload, ct);
            if (scan.Status != ScanStatus.Clean)
                throw new GuardrailBlockedException(scan.Findings); // blocked, not "logged & continued"

            // 3. Only now do we call the model.
            var response = await model.CompleteAsync(payload, ct);

            // 4. Output guardrail: scan the response before it's stored or sent onward.
            var outScan = await dlp.InspectAsync(response.Text, ct);
            if (outScan.Status != ScanStatus.Clean)
                throw new GuardrailBlockedException(outScan.Findings);

            return response;
        }
    }

The DLP engine behind step 2 is GCP Sensitive Data Protection / Cloud DLP (AWS Macie + Comprehend / Azure AI Language PII detection). The allowlist isn't enforced by a code review or a comment. It's enforced by the type system. AllowlistedRequest has no field that can hold a raw CV string, so a developer in a hurry physically cannot construct one.
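One way the type could enforce the allowlist is sketched below. Treat it as an illustration under assumptions rather than the real type: only the names AllowlistedRequest and ToStructuredPayload come from the excerpt above, while ParsedCv, the property names, the From factory, and the JSON payload format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical parser output: everything found in the CV, PII included.
public sealed record ParsedCv(
    string FullName,
    string HomeAddress,
    DateOnly DateOfBirth,
    string RawText,
    IReadOnlyList<string> Skills,
    IReadOnlyList<string> JobTitles,
    IReadOnlyList<string> Qualifications);

// Hypothetical allowlisted shape: there is no property for a name, an address,
// a date of birth, or raw document text, so none of them can be attached.
public sealed class AllowlistedRequest
{
    public string CandidateRef { get; }                   // opaque reference, not a name
    public IReadOnlyList<string> Skills { get; }
    public IReadOnlyList<string> JobTitles { get; }
    public IReadOnlyList<string> Qualifications { get; }

    // Private constructor: the factory below is the only way to build a request.
    private AllowlistedRequest(
        string candidateRef,
        IReadOnlyList<string> skills,
        IReadOnlyList<string> jobTitles,
        IReadOnlyList<string> qualifications)
    {
        CandidateRef = candidateRef;
        Skills = skills;
        JobTitles = jobTitles;
        Qualifications = qualifications;
    }

    // Copies the allowlisted fields and ignores everything else on the CV.
    public static AllowlistedRequest From(string candidateRef, ParsedCv cv) =>
        new(candidateRef, cv.Skills.ToArray(), cv.JobTitles.ToArray(), cv.Qualifications.ToArray());

    // Serialises only the public properties declared above.
    public string ToStructuredPayload() => JsonSerializer.Serialize(this);
}
```

The list entries are still strings, so a parser that misfiles an address under skills would carry it through. The type narrows what can leave; the DLP scan in step 2 exists for whatever slips past it.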
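Three layers, each assuming the one before it failed: the allowlist enforced by the type, the DLP scan on the outbound payload, and the output guardrail on the model's response.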
If you tattoo one principle from this page somewhere, make it this one. If the scanner can't confirm a payload is clean (it errored, it timed out, it came back unsure), the call is blocked, not "logged and continued." Safe by default, even when the guardrail itself breaks. A guardrail that fails open is decoration.
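A minimal sketch of what failing closed could look like in code follows. It wraps whatever inspector is in use so that an exception, a timeout, or a missing result all come back as "not clean". The interface shape, the Inconclusive and Findings enum members, the ScanResult record, and the FailClosedInspector name are assumptions made for illustration; only ScanStatus.Clean and InspectAsync appear in the gateway excerpt.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Assumed shapes: only ScanStatus.Clean and InspectAsync are named in the article.
public enum ScanStatus { Clean, Findings, Inconclusive }

public sealed record ScanResult(ScanStatus Status, IReadOnlyList<string> Findings);

public interface IDlpInspector
{
    Task<ScanResult> InspectAsync(string payload, CancellationToken ct);
}

// Hypothetical decorator: every failure mode of the inner scanner becomes a
// non-Clean result, so the gateway's "Status != Clean" check blocks the call.
public sealed class FailClosedInspector(IDlpInspector inner, TimeSpan timeout) : IDlpInspector
{
    public async Task<ScanResult> InspectAsync(string payload, CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            // WaitAsync enforces the deadline even if the inner scanner ignores the token.
            var result = await inner.InspectAsync(payload, cts.Token).WaitAsync(cts.Token);
            return result ?? Inconclusive("scanner returned no result");
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Our deadline fired, not the caller's cancellation.
            return Inconclusive("scanner timed out");
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return Inconclusive($"scanner failed: {ex.GetType().Name}");
        }
    }

    private static ScanResult Inconclusive(string reason) =>
        new(ScanStatus.Inconclusive, new[] { reason });
}
```

If the caller cancels, the OperationCanceledException propagates untouched. That is also safe: a cancelled request never reaches the model.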
Most teams forget the next part. You build a flawless model gateway, and then someone dumps the raw CV straight into Cloud Logging, which sits behind looser access controls than your ATS. So the logger is also a guarded sink. It accepts only typed, pre-redacted records. Raw prompts, CVs, and responses can't be written to it, because the method doesn't take a string.
    public sealed class SafeLogSink(ILogger logger) : ISafeLogSink
    {
        // Refuses raw payloads by construction: there is no overload that takes a string.
        public void Write(SafeLogEvent e) =>
            logger.LogInformation("decision {JobId} {CandidateRef} {Score} {Action}",
                e.JobId, e.CandidateRef, e.Score, e.HumanAction);
    }

You can't log what you never had. If the raw CV never enters a variable on the log path, it can't leak there.
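The sink is only as safe as the event type it accepts. The sketch below shows one possible shape for SafeLogEvent in which no field is free text. The field types, the JobId wrapper and its 12-digit limit, and the HumanAction values are assumptions; the article only names the four properties.

```csharp
using System;
using System.Linq;

// Hypothetical closed set of reviewer outcomes: an enum cannot carry free text.
public enum HumanAction { Pending, Shortlisted, Rejected, Escalated }

// Hypothetical wrapper: a job id is accepted only if it is a short run of digits,
// so a CV fragment cannot be passed off as one.
public sealed record JobId
{
    public string Value { get; }

    public JobId(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length > 12 || !value.All(char.IsAsciiDigit))
            throw new ArgumentException("Job id must be 1-12 ASCII digits.", nameof(value));
        Value = value;
    }

    public override string ToString() => Value;
}

// Every field is an id, a number, or an enum. There is nowhere to put a name,
// a prompt, or a model response.
public sealed record SafeLogEvent(
    JobId JobId,
    Guid CandidateRef,      // opaque reference resolved only inside the ATS
    int Score,
    HumanAction HumanAction);
```

Had JobId or CandidateRef been plain strings, the "no overload takes a string" guarantee would quietly stop holding, because a string field can carry anything.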
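Almost everyone skips this part, and it's the part that turns a claim into a control. How do you demonstrate that leakage can't happen, rather than just assert it? Plant a canary: a fake identifier in a test CV that must never show up in an outbound model payload or a log line.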
    [Fact] // CI leak-test: runs on every build, blocks the merge if it fails
    public async Task Canary_token_never_reaches_model_or_logs()
    {
        var cv = TestCv.With(ssn: "CANARY-000-00-0000", dob: "1900-01-01");
        var capture = new EgressRecorder(); // taps the gateway + log sink
        await _agent.ScreenAsync(cv, job: "4821");
        // Substring checks: these assume each capture is a single string. Against a
        // collection, xUnit's DoesNotContain only looks for an element equal to "CANARY".
        Assert.DoesNotContain("CANARY", capture.OutboundLlmPayloads);
        Assert.DoesNotContain("CANARY", capture.LogLines);
    }

And one architectural backstop so the rest doesn't rely on good intentions: network egress control. With VPC Service Controls (or an egress firewall) on the Cloud Run service, the model API endpoint is reachable only through the gateway's service account. Code can't bypass the guard even if someone tries. The network won't let it.
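Returning to the leak test: the EgressRecorder it relies on is not shown, so here is one way it might be built, as a pair of recording decorators. Everything except the names EgressRecorder, OutboundLlmPayloads, and LogLines is an assumption, including the two small interfaces standing in for the real model client and log writer. The captures are exposed as single strings so that a substring assertion on "CANARY" checks what it appears to check.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical seams standing in for the real model client and log writer.
public interface IModelTransport
{
    Task<string> SendAsync(string payload, CancellationToken ct);
}

public interface ILogLineWriter
{
    void WriteLine(string line);
}

// Test-only recorder: wraps both egress points and keeps a copy of everything
// that passes through them.
public sealed class EgressRecorder
{
    private readonly ConcurrentQueue<string> _payloads = new();
    private readonly ConcurrentQueue<string> _logLines = new();

    // Joined into one string each, so substring assertions cover every entry.
    public string OutboundLlmPayloads => string.Join("\n", _payloads);
    public string LogLines => string.Join("\n", _logLines);

    public IModelTransport Tap(IModelTransport inner) => new RecordingTransport(inner, _payloads);
    public ILogLineWriter Tap(ILogLineWriter inner) => new RecordingWriter(inner, _logLines);

    private sealed class RecordingTransport(IModelTransport inner, ConcurrentQueue<string> seen)
        : IModelTransport
    {
        public Task<string> SendAsync(string payload, CancellationToken ct)
        {
            seen.Enqueue(payload);              // record before forwarding
            return inner.SendAsync(payload, ct);
        }
    }

    private sealed class RecordingWriter(ILogLineWriter inner, ConcurrentQueue<string> seen)
        : ILogLineWriter
    {
        public void WriteLine(string line)
        {
            seen.Enqueue(line);
            inner.WriteLine(line);
        }
    }
}
```

The recorder only proves something if the tapped objects are the sole route out, which is the reason for the single gateway and the network egress control described above.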