How I Mass-Deleted 200+ AWS IAM Users Without Breaking Production
I stared at a spreadsheet with 206 rows. Each row was an IAM user on a production AWS account. Half had MFA disabled. A quarter had access keys older than some of our junior engineers. Fifteen of them had no documented owner; nobody knew who created them or why they existed. I deleted nearly all of them. Here is what that actually looked like.
The Mess We Were Sitting On
Every AWS account older than two years has the same disease: IAM user sprawl. A contractor joins, gets an IAM user. A pipeline needs access, someone creates an IAM user with AdministratorAccess “temporarily.” A vendor integration gets static keys shared over Slack. Nobody cleans up. Ever.
I knew it was bad, but I did not know how bad until I pulled the credential report and actually looked at it.
| What I Found | Count | How It Made Me Feel |
|---|---|---|
| Human users with console access | 87 | Nervous — most had long-lived passwords, no rotation |
| Humans with programmatic-only access | 34 | Uneasy — access keys floating around, no MFA |
| CI/CD service accounts | 42 | Angry — static keys baked into pipelines for years |
| Third-party integration accounts | 28 | Resigned — keys shared via Slack and email, naturally |
| Mystery accounts nobody could explain | 15 | Terrified |
206 IAM users. 14 IAM roles. That ratio should be inverted. I knew we had to rip this out, but the thought of breaking production kept me up at night for the first week of planning.
What We Were Building Toward
The vision was simple: zero standing IAM users for humans, IAM roles with short-lived credentials for everything else, and exactly one break-glass IAM user locked behind hardware MFA in a sealed runbook.
Human access would flow through Okta into AWS Identity Center. Authenticate with MFA, get temporary credentials, never touch long-lived keys. Service accounts would migrate to IAM roles assumed via OIDC federation — GitHub Actions, EKS workloads, the works. Third-party integrations that absolutely could not use role assumption would get keys managed by Secrets Manager with automatic rotation.
Simple to describe. Terrifying to execute on a live account.
Figuring Out What Was Actually in Use
I learned early that tribal knowledge is worthless for this kind of project. Every team had opinions about which IAM users were critical. Those opinions were wrong about 40% of the time. CloudTrail does not have opinions. CloudTrail has facts.
# Pull the credential report — your starting point for everything
aws iam generate-credential-report
aws iam get-credential-report --query 'Content' --output text | base64 -d > iam-report.csv
# csvcut is from the csvkit package — install with: pip install csvkit
csvcut -c user,password_last_used,access_key_1_last_used_date,access_key_2_last_used_date,mfa_active iam-report.csv
The credential report tells you when credentials were last used. But I also needed to know what each user was doing — which API calls, which services. So I queried CloudTrail for every user:
# What has this IAM user actually been doing for the last 90 days?
# Note: lookup-events queries the Event History — a 90-day rolling window
# of management events only. If you need deeper history, query your
# CloudTrail S3 bucket or CloudTrail Lake directly.
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=Username,AttributeValue=deploy-bot-prod \
--start-time "2024-12-01T00:00:00Z" \
--end-time "2025-03-01T00:00:00Z" \
--query 'Events[].{Time:EventTime,Event:EventName,Source:EventSource}' \
--output table
I ran this across all 206 users. The results were both reassuring and disturbing. 61 users had zero CloudTrail activity in the 90-day window. Another 15 showed no sign of ever having been used: no events, no last-used dates, either never touched after creation or dormant since before the Event History window began. That is 76 users, over a third of the estate, that I could disable immediately with near-zero risk.
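If you would rather script that sweep than run it by hand, a rough sketch of the loop looks like this. The date window matches the one above, and user-activity.txt is just a scratch file name I am making up here; lookup-events is throttled to a couple of requests per second, so 200+ users takes a while.
# Count recent CloudTrail activity for every IAM user in the account.
# Output: one line per user: "<username> <event count in the window>".
# The count caps at 50, which is plenty to separate dormant from active.
for user in $(aws iam list-users --query 'Users[].UserName' --output text); do
  count=$(aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=Username,AttributeValue="$user" \
    --start-time "2024-12-01T00:00:00Z" \
    --end-time "2025-03-01T00:00:00Z" \
    --max-results 50 \
    --query 'length(Events)' \
    --output text)
  echo "$user $count"
done | tee user-activity.txt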
When I showed this data to the team leads, the conversations changed overnight. Nobody argues with “this account hasn’t authenticated in 14 months.”
Migrating the Humans
Getting people off IAM users and onto Okta-federated Identity Center was the politically hardest part. Engineers hate changing their workflow, especially when the existing one “works fine.”
resource "aws_ssoadmin_permission_set" "developer" {
name = "DeveloperAccess"
instance_arn = data.aws_ssoadmin_instances.main.arns[0]
session_duration = "PT4H"
tags = {
ManagedBy = "terraform"
Migration = "iam-to-sso"
}
}
resource "aws_ssoadmin_managed_policy_attachment" "developer_policy" {
instance_arn = data.aws_ssoadmin_instances.main.arns[0]
permission_set_arn = aws_ssoadmin_permission_set.developer.arn
managed_policy_arn = "arn:aws:iam::aws:policy/PowerUserAccess"
}
I mapped each human user’s existing IAM policies to Identity Center permission sets. In most cases, I tightened permissions during the migration — turns out half the team had admin-level access “because that’s what the template had.”
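Mechanically, each of those mappings is an account assignment tying an Identity Center group to a permission set in a target account. In Terraform that is an aws_ssoadmin_account_assignment resource; as a sketch, the equivalent one-off CLI call looks like this, with every identifier below a placeholder:
# Assign the DeveloperAccess permission set to an Identity Center group
# in the production account. All IDs and ARNs below are placeholders.
aws sso-admin create-account-assignment \
  --instance-arn arn:aws:sso:::instance/ssoins-0123456789abcdef \
  --permission-set-arn arn:aws:sso:::permissionSet/ssoins-0123456789abcdef/ps-0123456789abcdef \
  --principal-type GROUP \
  --principal-id 11111111-2222-3333-4444-555555555555 \
  --target-type AWS_ACCOUNT \
  --target-id 123456789012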
The SCIM integration between Okta and Identity Center was the part that made me sweat the most. SCIM syncs by group ID, not display name, so a simple rename should survive. What actually breaks things is when someone deletes and recreates a group in Okta during a hierarchy restructure — the new group gets a new ID, and the Identity Center assignment mapping silently detaches. I learned this the hard way in staging when an entire team lost access after what looked like a group rename but was actually a delete-and-recreate under the hood. Glad I caught that before production.
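One cheap safeguard: snapshot the group IDs Identity Center currently knows about before any Okta restructure, then diff afterwards. A minimal sketch, with a made-up identity store ID:
# Snapshot Identity Center group IDs and display names.
# d-1234567890 is a placeholder identity store ID.
# If a "renamed" Okta group comes back with a new GroupId after the next
# SCIM sync, its account assignments did not follow it.
aws identitystore list-groups \
  --identity-store-id d-1234567890 \
  --query 'Groups[].[GroupId,DisplayName]' \
  --output text | sort > groups-before.txt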
Killing the Service Accounts
This was my favorite part, honestly. Replacing static access keys with OIDC federation felt like finally fixing something that should never have existed.
For GitHub Actions:
resource "aws_iam_openid_connect_provider" "github" {
url = "https://token.actions.githubusercontent.com"
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.github.certificates[0].sha1_fingerprint]
}
resource "aws_iam_role" "github_deploy" {
name = "github-deploy-prod"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.github.arn
}
Action = "sts:AssumeRoleWithWebIdentity"
Condition = {
StringEquals = {
"token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
}
StringLike = {
"token.actions.githubusercontent.com:sub" = "repo:our-org/deploy-repo:ref:refs/heads/main"
}
}
}]
})
}
The workflow change on the GitHub side is beautifully minimal:
permissions:
id-token: write
contents: read
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/github-deploy-prod
aws-region: us-east-1
No more access keys in GitHub Secrets. No more rotation schedules that nobody follows. No more “who has a copy of this key on their laptop?” conversations. Just short-lived tokens that expire automatically.
The two integrations that absolutely could not work without static credentials got moved to Secrets Manager with automatic rotation. I was not happy about it, but pragmatism beats purity.
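For reference, wiring up that rotation is one call once a rotation Lambda exists. A sketch with placeholder names; the Lambda has to implement the standard Secrets Manager rotation steps for the vendor's credential type.
# Turn on automatic rotation for a third-party integration's credentials.
# The secret name and Lambda ARN are placeholders.
aws secretsmanager rotate-secret \
  --secret-id vendor-x/api-keys \
  --rotation-lambda-arn arn:aws:lambda:us-east-1:123456789012:function:rotate-vendor-x-keys \
  --rotation-rules AutomaticallyAfterDays=30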
Cleaning House: The Dormant Users
The 76 dormant and orphaned users were satisfying to deal with. I disabled them first — deactivated access keys, deleted console passwords, but kept the IAM user entity alive for two weeks as a safety net.
# Disable all access keys for a user
for key_id in $(aws iam list-access-keys --user-name "$USER" --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
aws iam update-access-key --user-name "$USER" --access-key-id "$key_id" --status Inactive
done
# Remove console access (ignore the error if the user never had a console password)
aws iam delete-login-profile --user-name "$USER" 2>/dev/null || true
I did this in batches of 20, watching dashboards after each one. First batch — nothing. Second batch — one alert. A cronjob on an EC2 instance was using an access key that nobody had documented. I traced it, migrated it to an instance profile, and kept going. That one discovery alone justified the entire soak period.
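If you hit the same kind of surprise, CloudTrail can usually tell you where an undocumented key is calling from before you pull the plug. A quick sketch, with a placeholder key ID:
# Find the source IPs behind a mystery access key (the key ID is a placeholder).
# Each CloudTrailEvent is a JSON string, so unpack it with jq.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAIOSFODNN7EXAMPLE \
  --query 'Events[].CloudTrailEvent' --output json \
  | jq -r '.[] | fromjson | .sourceIPAddress' | sort | uniq -c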
The Big Day
After weeks of migration and testing, 130 of the 206 users were already disabled or deleted: the 76 dormant accounts plus the earlier migration waves. The remaining 76 had been migrated, with new access paths tested, owners confirmed, and old credentials unused for two weeks.
I picked a Wednesday. Enough runway to fix anything before the weekend, but not Monday when everyone is drowning in email and would blame me for any unrelated issue.
I scripted the deletion because doing this manually for 76 users is how mistakes happen:
#!/bin/bash
set -euo pipefail
USERS_FILE="migrated-users.txt"
LOG_FILE="deletion-log-$(date +%Y%m%d).json"
echo "[]" > "$LOG_FILE"
cleanup_user() {
local username="$1"
echo "Processing: $username"
# Remove from all groups
for group in $(aws iam list-groups-for-user --user-name "$username" --query 'Groups[].GroupName' --output text); do
aws iam remove-user-from-group --user-name "$username" --group-name "$group"
done
# Delete all access keys
for key_id in $(aws iam list-access-keys --user-name "$username" --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
aws iam delete-access-key --user-name "$username" --access-key-id "$key_id"
done
# Detach managed policies
for policy_arn in $(aws iam list-attached-user-policies --user-name "$username" --query 'AttachedPolicies[].PolicyArn' --output text); do
aws iam detach-user-policy --user-name "$username" --policy-arn "$policy_arn"
done
# Delete inline policies
for policy_name in $(aws iam list-user-policies --user-name "$username" --query 'PolicyNames[]' --output text); do
aws iam delete-user-policy --user-name "$username" --policy-name "$policy_name"
done
# Clean up MFA devices
for mfa_serial in $(aws iam list-mfa-devices --user-name "$username" --query 'MFADevices[].SerialNumber' --output text); do
aws iam deactivate-mfa-device --user-name "$username" --serial-number "$mfa_serial"
aws iam delete-virtual-mfa-device --serial-number "$mfa_serial"
done
# Delete SSH public keys (CodeCommit)
for key_id in $(aws iam list-ssh-public-keys --user-name "$username" --query 'SSHPublicKeys[].SSHPublicKeyId' --output text); do
aws iam delete-ssh-public-key --user-name "$username" --ssh-public-key-id "$key_id"
done
# Delete service-specific credentials (CodeCommit HTTPS, Amazon Keyspaces, etc.)
for cred_id in $(aws iam list-service-specific-credentials --user-name "$username" --query 'ServiceSpecificCredentials[].ServiceSpecificCredentialId' --output text); do
aws iam delete-service-specific-credential --user-name "$username" --service-specific-credential-id "$cred_id"
done
# Delete signing certificates
for cert_id in $(aws iam list-signing-certificates --user-name "$username" --query 'Certificates[].CertificateId' --output text); do
aws iam delete-signing-certificate --user-name "$username" --certificate-id "$cert_id"
done
  aws iam delete-login-profile --user-name "$username" 2>/dev/null || true
  # delete-user is the call that fails if any credential above was missed.
  # Check it explicitly: set -e is suspended inside a function called from an
  # if-condition, so an unchecked failure here would be logged as a success.
  if ! aws iam delete-user --user-name "$username"; then
    return 1
  fi
  jq --arg user "$username" --arg time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    '. += [{"user": $user, "deleted_at": $time}]' "$LOG_FILE" > tmp.json && mv tmp.json "$LOG_FILE"
  echo "Deleted: $username"
}
while IFS= read -r username; do
if ! cleanup_user "$username"; then
echo "FAILED: $username — manual cleanup required" >&2
fi
done < "$USERS_FILE"
echo "Complete. $(jq length "$LOG_FILE") users deleted."
In a 206-user estate this old, a few users had CodeCommit SSH keys and signing certificates hanging around. If you skip those, delete-user fails with a DeleteConflict error and the user survives, while a script that never checks that call happily logs the user as processed. The explicit check on delete-user and the per-user error handling in the loop make failures loud instead of invisible.
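Beyond the log, a quick end-state check after the run catches anything that slipped through. A minimal sketch; expected-users.txt is a hypothetical allowlist of the few users that are meant to survive, such as the break-glass user and the static-key integrations.
# Compare the surviving IAM users against the short list you meant to keep.
# expected-users.txt is a hypothetical, manually maintained allowlist.
aws iam list-users --query 'Users[].UserName' --output text \
  | tr '\t' '\n' | sort > remaining-users.txt
diff expected-users.txt remaining-users.txt && echo "IAM user list matches the plan"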
I hit enter, watched 76 users disappear in under 10 minutes, and then stared at dashboards for two days. Nothing broke. Not a single rollback request. Not one angry Slack message.
The anticlimax was the whole point. Weeks of boring preparation made the scary part boring too.
What I Would Tell You If You Are About to Do This
| What I Learned | Why It Matters |
|---|---|
| Pull CloudTrail data before you talk to anyone | People will fight you with feelings. Counter with timestamps. |
| Disable before you delete | A two-week soak catches the things CloudTrail misses — quarterly batch jobs, disaster recovery scripts, that one Lambda nobody remembers |
| Migrate services before humans | Broken pipelines page you at 3 AM. A human who cannot log in sends you a Slack message. |
| Keep one break-glass user | A single hardware-MFA IAM user in a sealed runbook is not a compromise; it is a safety net (see the sketch after this table) |
| Test SCIM in staging, thoroughly | Watch for delete-and-recreate masquerading as group renames in Okta. SCIM syncs by group ID — new ID means broken mappings in Identity Center. |
| Over-communicate, then communicate more | Slack announcements, Jira tickets, team walkthroughs. The only reason we had zero rollbacks is that everyone knew what was coming. |
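On the break-glass point, here is roughly what one locked-down IAM user can look like. This is an illustrative sketch, not the exact setup from this migration: admin rights attached, plus an inline policy that denies almost everything until the session is MFA-authenticated.
# Create the break-glass user. Names and the policy below are illustrative.
aws iam create-user --user-name break-glass-admin
aws iam attach-user-policy --user-name break-glass-admin \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
# Deny everything except MFA setup until an MFA-authenticated session exists.
aws iam put-user-policy --user-name break-glass-admin \
  --policy-name deny-without-mfa \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "DenyAllWithoutMFA",
      "Effect": "Deny",
      "NotAction": ["iam:EnableMFADevice", "iam:ListMFADevices", "iam:GetUser", "sts:GetSessionToken"],
      "Resource": "*",
      "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}}
    }]
  }'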
Conclusion
The scary part of deleting 200+ IAM users is not the deletion. It is the realization that they should never have existed in the first place, and that every week you wait is another week of unnecessary exposure.
If your AWS account has IAM users created before 2023, you probably have the same mess. Start with aws iam generate-credential-report. Look at the password_last_used and access_key_1_last_used_date columns. I promise the results will motivate you to start planning.
The cleanup took effort, but I sleep better now. One break-glass user, federated access for everything else, and an IAM user list that finally makes sense.