aws-nuke for Tidy Platform Engineering in AWS
By setting the filtered role and its policy attachments in the nuke.yaml, you can easily nuke everything except what's needed to recreate it all.
There's a new team emerging in cloud native software shops near you, the platform team, and their practice sits squarely in the gap between Infrastructure as Code and the Internal Developer Platform being built to support software delivery at scale.
The platform team has their own product to build: a self-service platform that their developers need, built on the foundation and posture that their company requires.
To iterate on these cloud native platforms, there's a common technique of:
- building the cloud/cluster resources for a platform instance [terraform apply]
- binding applications to the cluster through bootstrap gitops orchestration [argocd registration]
- testing out the platform [conducted by engineers and test tools]
- destroying the cluster [terraform destroy]
When everything goes right with your orchestration, this is a pretty tidy cluster lifecycle. The infrastructure you provision at the start of the cycle all gets torn down at the end, and nothing is left behind, so you can do it again without collisions in your cloud. But when you're developing platforms and making changes, things can fail and resources can get orphaned.
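The cycle above can be sketched as a small Python driver. This is a minimal sketch, assuming Terraform is on the PATH and your IaC lives in one directory; bootstrap and smoke_test are hypothetical stand-ins for the Argo CD registration and testing steps, which depend entirely on your setup. The try/finally shows why even a disciplined loop leaks: the destroy always runs, but Terraform can only remove what is still in its state.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command (argv list) and raises if it fails.
Runner = Callable[[Sequence[str]], None]


def run(argv: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def platform_cycle(
    workdir: str,
    bootstrap: Callable[[], None],
    smoke_test: Callable[[], None],
    runner: Runner = run,
) -> None:
    """One iteration of the loop: apply, bootstrap, test, always destroy."""
    tf = ["terraform", f"-chdir={workdir}"]
    try:
        runner(tf + ["apply", "-auto-approve"])  # [terraform apply]
        bootstrap()   # hypothetical hook, e.g. Argo CD registration
        smoke_test()  # hypothetical hook for engineers and test tools
    finally:
        # Runs even if a step above raised. Anything Terraform lost track of
        # (or resources the cluster created on its own, such as load
        # balancers), and anything a failed destroy skips, stays orphaned.
        runner(tf + ["destroy", "-auto-approve"])  # [terraform destroy]
```

Passing a fake runner makes the loop testable without touching a real cloud account.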
Orphaned cloud resources are bad for your platform team's operations and need to be cleaned up. If you don't clean them up, you're spending more than you need to in your cloud, and your subsequent iterations are more likely to fail due to cloud resource naming collisions. In a simple cloud like Civo, which only needs 4 resources to get you a Kubernetes cluster, you can remove an environment with a handful of clicks, no big deal. But in a more elaborate cloud like AWS, this can turn into quite a job, since it can take 100 or more cloud resources to establish and secure an EKS cluster. That's a lot of cleanup, and your platform engineer is busy with the new bug that broke the cycle in the first place.
Platform Engineering at Kubefirst
At kubefirst, we're not only practicing platform engineering internally; our software team is also building the tools that other platform teams use to build their platforms. Suffice it to say, we have a lot of platform provisioning operations underway all the time, in many clouds, and we orphan resources regularly while we're busy breaking stuff in our dev accounts.
AWS is the challenge in this story because of the sheer number of resources in play. With tight timelines, some things would simply never get cleaned up. Every once in a while, you wish you could just nuke the whole account and start over.
An open source tool to reset AWS accounts
We recently discovered rebuy-de's aws-nuke project and it's amazing. It's also incredibly destructive, so please use it with care and caution.
It works like this: you write a config file with allow lists and deny lists that tell the tool what to target and what to avoid. Then you run a single command in your terminal, and it finds all the resources in your account and shows you what it would delete if you agreed to proceed.
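To make the allow/deny idea concrete, here is a toy Python model of how a config shapes the plan: only targeted resource types are considered, anything matching a filter is kept, and everything else is reported as removable. It is a simplification under our own assumptions, not aws-nuke's code; the real tool applies filters per account and supports richer match types. The resource names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Resource:
    type: str  # aws-nuke-style resource type, e.g. "IAMRole"
    name: str  # the identifier a filter is matched against


def plan(
    resources: Iterable[Resource],
    filters: Dict[str, Set[str]],
    targets: Optional[Set[str]] = None,
) -> List[Resource]:
    """Toy model: return the resources a nuke run would report as removable."""
    removable = []
    for r in resources:
        if targets is not None and r.type not in targets:
            continue  # type not targeted, so it is never touched
        if r.name in filters.get(r.type, set()):
            continue  # matched a filter, so it is kept
        removable.append(r)
    return removable


inventory = [
    Resource("IAMRole", "atlantis"),
    Resource("IAMRole", "old-eks-node-role"),
    Resource("EC2VPC", "vpc-0123"),
]
for r in plan(inventory, {"IAMRole": {"atlantis"}}):
    print("would remove", r.type, r.name)
```

Note the default: with an empty filter map, every targeted resource is removable, which is exactly why an unfiltered config wipes the account.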
It has some nice safety features built in, like a default dry-run mode and an inability to run in accounts with an alias that includes the string prod.
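The aws-nuke README documents a few more preflight rules alongside these: the account must have an alias, it must not appear in the config's account blocklist, and that blocklist must contain at least one account ID. The sketch below re-creates those documented rules as a toy Python function so the order of checks is easy to see; the function name, messages, and example IDs are our own, not aws-nuke's.

```python
from typing import Sequence


def preflight(
    alias: str,
    account_id: str,
    blocklist: Sequence[str],
    dry_run: bool = True,
) -> str:
    """Toy re-creation of aws-nuke's documented safety checks."""
    if not blocklist:
        raise ValueError("blocklist must contain at least one account ID")
    if account_id in blocklist:
        raise ValueError(f"account {account_id} is blocklisted")
    if not alias:
        raise ValueError("account must have an alias")
    if "prod" in alias:
        raise ValueError(f"alias {alias!r} contains 'prod'")
    # Dry run is the default; deletion needs an explicit opt-in.
    return "dry run: listing only" if dry_run else "deleting resources"


print(preflight("kubefirst-dev", "111111111111", ["999999999999"]))
```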
Kubefirst Platform Team Starts "Nuk'em Tuesdays"
This utility was really awesome news to our platform team, and we've now made it part of our weekly cadence. Every Tuesday in our AWS development environment, we'll blow away almost everything in the account so we can start over from scratch and rebuild it with our IaC. Platform teams that practice provisioning in AWS would likely find this tool valuable too.
We have a management (mgmt) account that hosts our central kubefirst cluster, and we manage our downstream AWS environments from that cluster using IAM roles that our Atlantis (Terraform automation) service account can assume. By setting that role and its policy attachments in the nuke.yaml filters, you can easily nuke everything except what's needed to recreate it all.
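Here is a minimal sketch of what such a nuke.yaml could look like, rendered from Python so the pieces are labeled. It assumes rebuy-de aws-nuke's documented layout (regions, account-blocklist, and per-account filters, with policy attachments written as "role -> policy"); the account IDs, the role name atlantis, and the AdministratorAccess policy are placeholders, not our real values. We include the global region because that is where aws-nuke lists IAM resources, as we understand it.

```python
from typing import List


def render_nuke_config(
    account_id: str,
    blocked_id: str,
    role: str,
    policies: List[str],
    regions: List[str],
) -> str:
    """Render a minimal nuke.yaml that keeps one IAM role and its attachments."""
    lines = ["regions:"]
    lines += [f"- {r}" for r in regions]
    lines += [
        "",
        "account-blocklist:",  # e.g. your production account; must not be empty
        f'- "{blocked_id}"',
        "",
        "accounts:",
        f'  "{account_id}":',  # quoted so YAML keeps the ID a string
        "    filters:",
        "      IAMRole:",
        f'      - "{role}"',
        "      IAMRolePolicyAttachment:",
    ]
    lines += [f'      - "{role} -> {p}"' for p in policies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_nuke_config(
        "111111111111", "999999999999", "atlantis",
        ["AdministratorAccess"], ["us-east-1", "global"],
    ))
```

Anything the role depends on to be assumed from the management account should be filtered the same way, or the recovery pull request will have nothing to run as.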
Once you have your nuke.yaml set up, you can run aws-nuke and pass it your config file to preview what it's going to try to delete.
Once you've reviewed that preview, add the --no-dry-run flag to actually conduct the nuke operation. This can be a very destructive operation: with no filters in your config, it will remove everything. Please be sure you're willing to remove everything it says it will remove.
So we nuke everything in the account except the one role that Atlantis leverages. Then we use an Atlantis pull request to bring the environment back in full. A mint condition environment with no risk of orphaned resources to ruin your provisioning operations. How nice.
We've been doing this for a couple of months and it's been incredibly helpful to our team, so we wanted to share it with the platform engineering community and our own kubefirst platform users.
Free Open Source Kubefirst Platforms
To see how our free and open source kubefirst platforms can transform your provisioning and platform engineering operations, give one a run and see for yourself!
Your new fully automated platform will be just a few clicks away.