I deleted every Kubernetes Secret in our clusters. Uptime went up.
I deleted every kind: Secret object in our clusters. Uptime went up.
Not a typo. We still have secrets — database passwords, API tokens, TLS keys, signing keys, OAuth client secrets. None of them live in a Kubernetes Secret anymore. The namespace is empty. kubectl get secrets -A returns the boring projected service-account tokens and nothing else, and we’re working on those too.
This post is the long version of what changed, why, and the migration path I’d run again tomorrow.
Why kind: Secret is security theater
Let’s be honest about what a Kubernetes Secret actually is: a base64-encoded blob in etcd with some RBAC on top.
That’s it. The word “encoded” is doing a lot of work there. Base64 is not encryption. It’s the encoding your browser uses for data URLs. If you kubectl get secret -o yaml you get the material back in plaintext with one base64 -d.
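If that sounds abstract, here is the entire "attack", assuming a hypothetical Secret named db-credentials in the api namespace:

kubectl -n api get secret db-credentials -o jsonpath='{.data.password}' | base64 -d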
The defenses people point to:
- Etcd encryption at rest: yes, and the KMS key is usually a cluster-wide key the API server can read. Anyone with cluster-admin can read every secret in every namespace through the API. It protects against someone stealing an etcd backup, nothing more.
- RBAC: a ServiceAccount with get on secrets in a namespace can read every credential in that namespace. One over-scoped operator, one compromised pod with a broad SA, and the blast radius is the whole namespace.
- Sealed Secrets / SOPS: sealed-secrets solves checking encrypted material into Git. It does not solve the runtime problem — the controller decrypts into a regular Secret object, and from there we’re back to base64 in etcd.
- External Secrets Operator: syncs from a real secret store into… kind: Secret. You’ve moved the source of truth but kept the weak sink.
If your threat model includes “someone with cluster-admin” — and it absolutely should, because that includes every CI runner with a kubeconfig, every operator you’ve ever installed from a random Helm chart, and every node that gets popped — inline K8s Secrets are a speed bump, not a control.
I should be honest about what the new world doesn’t fix: an attacker with cluster-admin can still kubectl exec into a running pod and cat the mounted file. What we actually buy is three things — long-lived credential material stops persisting in etcd, the blast radius of a single popped pod shrinks to its KSA’s grants instead of the whole namespace, and every secret access leaves a caller-attributed line in Cloud Audit Logs or CloudTrail. That’s it. That’s enough.
The control you actually want is IAM on a purpose-built secret store, with pod identity doing the auth.
What replaced kind: Secret
Three pieces. That’s the whole architecture.
1. Secret Manager as the source of truth
On GKE: Google Secret Manager. On EKS: AWS Secrets Manager (Parameter Store works too for lower-sensitivity config).
Everything lives there. Versioned, IAM-bound, audit-logged. Rotation is a version bump — you add version 17, the old one sticks around until you disable it, and every access shows up in Cloud Audit Logs or CloudTrail with the caller identity.
A secret in GCP Secret Manager:
resource "google_pubsub_topic" "secret_rotation" {
name = "secret-rotation-events"
}
resource "google_secret_manager_secret" "db_password" {
secret_id = "prod-postgres-password"
replication {
auto {}
}
topics {
name = google_pubsub_topic.secret_rotation.id
}
rotation {
next_rotation_time = "2026-06-01T00:00:00Z"
rotation_period = "2592000s" # 30 days
}
}
resource "google_secret_manager_secret_iam_member" "db_password_access" {
secret_id = google_secret_manager_secret.db_password.id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.api_workload.email}"
}
Two things worth calling out about that snippet, because they bite people. First, the topics block is mandatory once you set rotation — Terraform will reject the resource without it. Second, Secret Manager’s rotation feature does not rotate the secret value for you. It publishes a Pub/Sub event at next_rotation_time and that’s the whole feature. The actual rotation logic — generate a new password, update the database, add a new version — runs in a Cloud Function or Cloud Run job that subscribes to the topic. The schedule is the trigger, not the rotator.
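For completeness, the wiring from that topic to a rotator is a push subscription. A sketch, assuming a Cloud Run rotator service and an invoker service account that aren't defined above:

# Sketch: push rotation events to a hypothetical Cloud Run rotator service.
resource "google_pubsub_subscription" "secret_rotation_push" {
  name  = "secret-rotation-to-rotator"
  topic = google_pubsub_topic.secret_rotation.id

  push_config {
    # The rotator generates a new password, updates the database, and adds
    # a new secret version. None of that logic is shown here.
    push_endpoint = google_cloud_run_v2_service.secret_rotator.uri

    oidc_token {
      service_account_email = google_service_account.rotator_invoker.email
    }
  }
}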
The IAM binding is per-secret, not per-project. The API workload’s GCP service account can access this one secret and nothing else. That’s the control surface I want.
2. Secrets Store CSI Driver mounts them as files
The Secrets Store CSI Driver plus a provider (secrets-store-csi-driver-provider-gcp or secrets-store-csi-driver-provider-aws) mounts secrets as files into the pod at startup, reads them from Secret Manager using the pod’s identity, and refreshes them on a TTL.
A SecretProviderClass on GKE:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: api-secrets
  namespace: api
spec:
  provider: gke
  parameters:
    secrets: |
      - resourceName: "projects/my-project/secrets/prod-postgres-password/versions/latest"
        path: "db-password"
      - resourceName: "projects/my-project/secrets/prod-stripe-key/versions/latest"
        path: "stripe-key"
And the pod mount:
volumes:
  - name: secrets
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: api-secrets
containers:
  - name: api
    volumeMounts:
      - name: secrets
        mountPath: /var/run/secrets/app
        readOnly: true
The application reads /var/run/secrets/app/db-password like any other file. No SDK dependency, no startup-time secret fetch to manage, no retry logic. The CSI driver DaemonSet handles fetching; the kubelet just calls it like any other CSI volume.
One thing to be deliberate about: versions/latest is convenient and what we ship by default, but it means a bad rotation propagates fleet-wide on the next CSI refresh. There’s no canary window. For our highest-blast-radius credentials we pin the SPC to a specific version and roll forward via a Terraform PR — slower, but you get to undo.
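For comparison, the pinned form is the same SecretProviderClass entry with an explicit version in place of latest (version 17 here is illustrative):

- resourceName: "projects/my-project/secrets/prod-postgres-password/versions/17"
  path: "db-password"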
3. Pod identity does the auth
This is the part that makes the whole thing work without credential files on disk.
On GKE, Workload Identity maps a Kubernetes ServiceAccount to a GCP service account. Pods using that KSA get short-lived tokens minted via GKE’s metadata server. No JSON key files. No Kubernetes Secret holding credentials. (GKE also offers Workload Identity Federation with direct IAM bindings to Kubernetes principals — same outcome, no intermediate GSA. We stuck with the GSA-impersonation form because our existing IAM bindings already targeted GSAs.)
resource "google_service_account_iam_member" "workload_identity" {
service_account_id = google_service_account.api_workload.name
role = "roles/iam.workloadIdentityUser"
member = "serviceAccount:my-project.svc.id.goog[api/api-sa]"
}
apiVersion: v1
kind: ServiceAccount
metadata:
name: api-sa
namespace: api
annotations:
iam.gke.io/gcp-service-account: api-workload@my-project.iam.gserviceaccount.com
On EKS, IRSA does the equivalent with OIDC. For newer clusters, EKS Pod Identity is the path AWS now recommends — no per-cluster OIDC trust to maintain, cross-account is easier, and the Pod Identity Agent removes the need for a projected SA token in the pod. Same outcome either way: pods authenticate to the secret store using an identity scoped to their ServiceAccount, with zero credential material living in the cluster.
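For the EKS Pod Identity flavor, the association is a single Terraform resource. A sketch with hypothetical cluster and role names:

# Sketch: bind the api namespace's api-sa to an IAM role via EKS Pod Identity.
resource "aws_eks_pod_identity_association" "api" {
  cluster_name    = "prod-cluster"                # hypothetical cluster name
  namespace       = "api"
  service_account = "api-sa"
  role_arn        = aws_iam_role.api_workload.arn # hypothetical IAM role
}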
The pod asks for a secret. The CSI driver uses the pod’s identity to fetch it. The secret lands as a file. That’s the entire chain.
What we deleted along the way
An honest list of what went to the trash:
- External Secrets Operator for the application-credential path. We still run ESO on a couple of narrow flows (cert-manager output pushed into Vault via PushSecret, mostly), but it stopped being the primary K8s secret path.
- Sealed Secrets controller and the per-cluster sealing key rotation dance.
- SOPS-encrypted secrets in Git — no more age-key distribution for the platform team.
- Custom Vault Agent Injector configs — we still use Vault for a few things, but not as the primary K8s secret path.
- The legacy auto-token-mount pattern, where any pod without an explicit ServiceAccount got a token mounted it didn’t ask for. We patched the default ServiceAccount in every workload namespace with automountServiceAccountToken: false and enforced it via Kyverno (a sketch of the policy follows this list); workloads that need a projected token opt in explicitly. One catch worth knowing: this interacts with Workload Identity and IRSA, both of which expect a projected SA token (the OIDC token) to be available. The fix is either an explicit serviceAccountToken projection in the pod spec, or a move to EKS Pod Identity, which doesn’t need one.
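The enforcement piece is a small Kyverno policy. A sketch of the shape we use; the policy name and message are illustrative:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disable-sa-token-automount
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-automount-false
      match:
        any:
          - resources:
              kinds:
                - ServiceAccount
      validate:
        message: "ServiceAccounts must set automountServiceAccountToken: false; pods that need a token opt in explicitly."
        pattern:
          automountServiceAccountToken: false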
Each of these was a moving part with its own failure mode, its own upgrade path, its own on-call burden. Deleting them wasn’t the goal; the goal was fewer credential paths. Fewer paths happened to mean less code.
The uptime win we didn’t plan for
The reason we started this wasn’t uptime. It was audit — auditors do not love “base64 in etcd” as a credential story.
But the uptime improvement was real and measurable. Here’s why.
The old rotation path: update the Secret object → the mounted Secret volume content changes → kubelet eventually projects the new content into the pod → but many apps read the file once at startup and cache in memory. To pick up the new value, you restart the pod. A 50-pod Deployment rotating a secret is a rolling restart. Rolling restarts sometimes fail readiness gates. Failed readiness gates sometimes page someone at 2am.
The new rotation path: bump the version in Secret Manager → the CSI driver refreshes the mounted file on its next reconcile (we run 60s) → apps that re-read on 401 or on a file-watch signal just pick up the new value → no pod restart.
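The refresh is opt-in on the driver. With the upstream Helm chart, the relevant values look roughly like this (key names can shift between chart versions, so check the one you run):

# Helm values for secrets-store-csi-driver (sketch)
enableSecretRotation: true
rotationPollInterval: 60s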
The change that really moved the uptime number was separating rotation from deployment. Before, every credential rotation looked like a deploy. After, rotations are invisible to the scheduler.
For the apps that don’t re-read on 401 — legacy services that load creds once at boot and hold them forever — we added a SIGHUP handler or an inotify-based file watcher (CSI projects updated files as atomic symlink swaps, which fsnotify and watchdog handle cleanly). The pattern in driver land is credential refresh on auth failure — pgx v5 supports it via BeforeConnect, the AWS SDKs do it through their credential provider chain, HikariCP has a reload path. That’s app work, not platform work, and it’s a one-time change per app.
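As a concrete example of the driver-level pattern, a minimal pgx v5 sketch that re-reads the mounted file before each new connection, so a rotated password gets picked up as the pool recycles connections (the DSN and mount path are illustrative):

package main

import (
	"context"
	"log"
	"os"
	"strings"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// Illustrative DSN; the password is deliberately absent because
	// BeforeConnect fills it in from the CSI-mounted file.
	cfg, err := pgxpool.ParseConfig("postgres://api@db.internal:5432/app?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}

	// Re-read the mounted secret before every new connection, so a rotated
	// password is picked up without restarting the process.
	cfg.BeforeConnect = func(ctx context.Context, cc *pgx.ConnConfig) error {
		pw, err := os.ReadFile("/var/run/secrets/app/db-password")
		if err != nil {
			return err
		}
		cc.Password = strings.TrimSpace(string(pw))
		return nil
	}

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}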
One note on the CSI rotation feature itself: it’s been stable for us at the version we run, but auto-rotation lived in beta for a long time and has provider-specific quirks. Pin the driver version, test upgrades in staging first, alert on driver pod health. Don’t flip it on in prod and walk away.
The honest trade-offs
I’d be lying if I said this came free.
Apps that cache credentials forever still need a reload hook. CSI refreshes the file; it does not magically reload your app’s in-memory connection pool. If your Postgres driver grabs the password at boot and hands it to libpq, bumping the Secret Manager version doesn’t help until the process restarts or you teach it to re-read. We audited every app and either added a reload hook, switched to a driver that re-reads on auth failure, or accepted a scheduled restart for the laggards.
CSI driver is another dependency. It’s a DaemonSet on every node, it has its own version skew to track, and when the provider plugin has a bad day your pods don’t start. We pin versions, test upgrades in a staging cluster first, and alert on CSI driver pod health. It’s been stable, but it’s not zero operational cost.
Cold-start latency adds ~200-500ms. The CSI driver has to fetch the secret before the container starts. For long-running services this is invisible. For cold-start-sensitive workloads, it’s a real cost.
Secret Manager is now a hard dependency for pod scheduling. With kind: Secret, etcd was local to the control plane and pods could start during a partial cloud control-plane outage. With CSI + Secret Manager, a regional Secret Manager incident means new pods can’t start. For our most critical workloads we use regional replication on the secrets and accept the dependency; for everything else, auto replication is fine.
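For reference, the regional pinning is just the replication block on the secret resource; a sketch with illustrative regions:

  # Sketch: pin secret replicas to specific regions instead of auto replication.
  replication {
    user_managed {
      replicas {
        location = "us-central1"
      }
      replicas {
        location = "us-east1"
      }
    }
  }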
Refresh interval is a cost and quota knob, not just a freshness knob. Secret Manager API calls aren’t free — roughly $0.06 per 10k accesses on GCP, $0.05 on AWS — and at the scale of N pods × M secrets × refresh rate, both the calls and the audit-log volume add up fast. We tune the CSI refresh interval per workload class rather than globally, and our “who accessed this secret last month” queries filter aggressively by caller identity and time window because the refresh polls dominate the raw volume.
Some tools still expect kind: Secret. Cert-manager writes TLS certs — including the private keys — to Secrets. Various Helm charts assume a Secret exists. We let those live in the cluster, but I won’t pretend they’re “ephemeral” — a 90-day Let’s Encrypt private key sitting in etcd is just as sensitive as a DB password. The actual rule is: long-lived credentials for external systems go to Secret Manager; cluster-internal material that’s regenerable without external coordination can stay as Secrets, on the understanding that the etcd risk applies to those too.
Migration playbook
If you’re doing this, here’s the order we ran it in. It took us about six weeks across three clusters, mostly because of app reload hooks, not platform work.
Week 1: Pick one non-critical service. Pick something that already re-reads credentials on failure. A stateless API service is ideal. Get Workload Identity / IRSA / Pod Identity working for it. Create one secret in Secret Manager, mount it via CSI, keep the old kind: Secret as a fallback. Verify the file contents match.
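The "verify the file contents match" step is a one-liner; a sketch assuming the fallback Secret is named db-credentials with a password key, and the Deployment is named api:

diff \
  <(kubectl -n api exec deploy/api -- cat /var/run/secrets/app/db-password) \
  <(kubectl -n api get secret db-credentials -o jsonpath='{.data.password}' | base64 -d) \
  && echo "contents match"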
Week 2: Cut over the first service. Remove the fallback Secret. Rotate the secret once in Secret Manager to prove the refresh path works end to end. Document the pattern: one SecretProviderClass per service, IAM per-secret, one KSA per workload.
Week 3-4: Batch migrate the obvious wins. Anything stateless, anything with a decent reload story, anything where the team owns both the app and the deploy. We did about 40 services in this window.
Week 5: The stragglers. Legacy services, third-party charts, anything with gnarly credential loading. For each one: add a reload hook, accept a scheduled restart window, or leave it on kind: Secret with a note. Perfection is the enemy here.
Week 6: Clean up. Delete Sealed Secrets. Delete SOPS tooling from the platform repo. Trim External Secrets Operator down to the narrow paths that still need it. Remove the stale IAM roles that used to back them. Patch the default ServiceAccount in each namespace and enforce it with policy.
What good looks like after
A year in, here’s what changed that I didn’t expect:
- Secret rotation became routine. It used to be a quarterly project. Now it’s a Terraform PR that bumps a version and a merge.
- Audit reviews are shorter. “Show me who accessed the Stripe key last month” is a Cloud Audit Logs query (with caller filters, because the CSI refresh polls are noisy), not a guess. A sketch of that query follows this list.
- Onboarding new services is faster. The pattern is: one IAM binding, one SecretProviderClass, one annotation. No secret material ever touches Git or a developer laptop.
- Compromised pods have a smaller blast radius. A popped pod sees only the secrets its KSA was granted. Not the whole namespace. Not the whole cluster.
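On GCP, the audit query looks roughly like this, assuming Data Access audit logs are enabled for Secret Manager; the secret name and window are illustrative:

gcloud logging read '
  protoPayload.serviceName="secretmanager.googleapis.com"
  AND protoPayload.methodName="google.cloud.secretmanager.v1.SecretManagerService.AccessSecretVersion"
  AND protoPayload.resourceName:"secrets/prod-stripe-key"
' --freshness=30d \
  --format='value(timestamp, protoPayload.authenticationInfo.principalEmail)'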
The short version: IAM + CSI + pod identity replaces four or five moving parts with one pattern. The rotation story is better. The audit story is better. The blast radius is smaller. And we stopped writing base64 blobs into etcd and calling it security.
What’s keeping your team on kind: Secret?