Seven AWS levers that cut $1.5M from a production SaaS bill
A few years into my decade-long run as principal engineer at a private SaaS, our annual AWS bill had grown faster than revenue for three quarters running. The board had started asking questions that meant someone was going to have to answer for it. I was that someone.
Over the next fiscal year we cut about $1.5M from the run rate without breaking anything in production and without slowing product velocity. The work wasn't elegant. It was a series of focused passes through the bill, in priority order, with each lever paying back something material before we moved to the next one. The bills I look at now are mostly AI workloads on Bedrock — the line items are different from that old SaaS bill but the shape of the levers is mostly the same, which is why I keep coming back to this story.
I'll make the same kind of wager I made in the last post. On a randomly picked SaaS spending over $500K/year on AWS with no commitment-pricing strategy in place and no Enterprise Discount Program, I'd bet you can cut 25% in the first quarter without touching the application code. Probably more. Almost all of it is sitting in plain sight on the bill, and almost none of it requires clever engineering. What it requires is someone who knows where the defaults are wrong and in what order to fix them, and who has the political cover to do it.
Which is the actual through-line I want to name up front, because it changes how you read everything below. AWS bills are retail by default. Not because AWS is hiding anything (the discounts and the optimization paths are all documented), but because every default in the system is sized for someone else's worst day, and engineering culture rewards over-provisioning in ways that finance culture rarely notices until the bill has compounded for three quarters. The technical levers below mostly work. The harder work is flipping the culture that produced the over-provisioning in the first place. Most teams also pull the levers in the wrong order, which is why the same lever can produce wildly different results in two organizations.
These are the seven levers, in the order they actually moved the bill, with the Bedrock translation at the end.
Put commitment pricing under the baseline before anything else
EC2 compute was the biggest single line item by a wide margin — around 40% of the total bill in any given month. Most of that workload was steady-state, predictable, and on-demand priced, which is the worst of all possible worlds. You're paying retail on a workload whose shape AWS would have happily discounted in exchange for a multi-year commitment, and nobody made the commitment because nobody was the person whose job it was to make it.
The same shape applies on Bedrock, which I'll keep flagging as we go. If you're running a steady-state inference workload at on-demand pricing and you've never looked at provisioned throughput, batch mode, or cross-region inference, you're paying retail on what's about to become your top line item, which is also the line item where staying on retail the longest costs you the most.
The fastest lever was getting commitment-based pricing under the workload: Savings Plans for the steady-state baseline, Reserved Instances for the predictable always-on capacity, and on-demand only for the bursty top of the load curve. On top of that we layered an automated process that watched actual usage and rebalanced the commitment buckets daily, so we stayed close to the optimal mix without anyone managing it by hand. That alone took roughly $400K out of the EC2 compute line. For AI workloads, similar tooling is starting to emerge for Bedrock provisioned throughput, but the manual approach (buy commitments for what you know is steady, leave on-demand for what spikes) gets you most of the way without it.
Put commitment pricing under the steady-state load before doing anything else. Right-sizing comes second, and only after you've stopped paying on-demand for capacity you're going to need anyway — otherwise you're optimizing instance size on the most expensive pricing tier, which is the wrong order.
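To make the layering concrete, here's a minimal sketch of picking a commitment level from a day of hourly usage. It runs on a deliberately simplified model that is not exactly how AWS bills Savings Plans or RIs: you commit to covering a fixed on-demand-equivalent dollar amount per hour at a flat discount, pay for that commitment every hour whether you use it or not, and pay on-demand for anything above it. The 25% discount and the usage numbers are hypothetical.

```python
from typing import Sequence, Tuple


def total_cost(usage: Sequence[float], commit: float, discount: float) -> float:
    """Cost of a usage series when `commit` $/hr (on-demand equivalent) is covered.

    The commitment is paid every hour at (1 - discount), used or not;
    usage above it is billed at on-demand rates.
    """
    committed = commit * (1 - discount) * len(usage)
    overflow = sum(max(0.0, u - commit) for u in usage)
    return committed + overflow


def best_commitment(usage: Sequence[float], discount: float) -> Tuple[float, float]:
    """Return (commit, total_cost) for the cheapest commitment level.

    Total cost is piecewise linear in the commitment with breakpoints at the
    observed usage levels, so checking zero plus each observed level suffices.
    """
    candidates = sorted({0.0, *usage})
    return min(
        ((c, total_cost(usage, c, discount)) for c in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical day: 20 steady hours at $10/hr, a 4-hour peak at $20/hr.
    day = [10.0] * 20 + [20.0] * 4
    commit, cost = best_commitment(day, discount=0.25)
    print(f"commit ${commit}/hr -> ${cost:.2f}/day vs ${sum(day):.2f} on-demand")
```

What this makes visible is that the cost-minimizing commitment isn't your average usage. Under this model, raising the commitment by a dollar per hour costs (1 − discount) every hour and saves a dollar only in the hours where usage is above it, so the optimum sits at the level your usage exceeds in roughly a (1 − discount) share of hours. At a 25% discount that's about 75% of hours, which is the steady-state baseline the commitment is supposed to cover.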
Then right-size, and not before
Once commitments were under the workload, the next lever was looking at what we were actually paying for. Most of our instances were two to four times bigger than they needed to be. Some of that was history — somebody picked an instance class three years ago for a workload that had since shrunk and nobody re-evaluated. Some of it was the cultural thing I named in the opening: the default in most engineering cultures is to over-provision because the cost of an instance that's too small (downtime, paged engineers, customer impact) feels much worse than the cost of one that's too big (just money — and money on a line item that compounds quietly across years).
Compute Optimizer surfaces most of these automatically. We let it run for a week to gather data, then walked through every recommendation by hand. Some we accepted directly. Some we sized down further than it suggested, because we knew about workload patterns Compute Optimizer didn't. A few we left alone for the same reason in the other direction: what we knew about those workloads argued against shrinking them. The point is that it's a starting point, not an answer.
This pass took roughly another 25% off the cost of the instances we resized, on top of the commitment savings. If I'd run it first, before commitments, I would have rightsized everything against on-demand prices and then had to redo the math when the commitments went in. Order matters.
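For the hand review, a rough check like the one below is the kind of thing we used to second-guess a recommendation. It is not Compute Optimizer's algorithm; it's a hypothetical CPU-only heuristic that picks the smallest size in the m5 family whose vCPUs keep observed p99 CPU under a target utilization, plus a pinned set for workloads you know better than the tool does. The 70% target and the pinned-instance mechanism are assumptions, and memory, network, and burst behavior are deliberately ignored.

```python
from typing import Optional, Set

# vCPU counts for general-purpose m5 sizes.
M5_VCPUS = {
    "m5.large": 2,
    "m5.xlarge": 4,
    "m5.2xlarge": 8,
    "m5.4xlarge": 16,
    "m5.8xlarge": 32,
    "m5.12xlarge": 48,
    "m5.16xlarge": 64,
    "m5.24xlarge": 96,
}


def suggest_size(
    current: str,
    p99_cpu_pct: float,
    target_util: float = 0.7,
    pinned: Optional[Set[str]] = None,
    instance_id: str = "",
) -> str:
    """Smallest m5 size that keeps observed p99 CPU under target_util.

    CPU only: memory, network, and burst behavior are not considered.
    """
    if pinned and instance_id in pinned:
        # Human override: we know something about this workload the metrics don't.
        return current
    used_vcpus = M5_VCPUS[current] * p99_cpu_pct / 100
    needed_vcpus = used_vcpus / target_util
    for size, vcpus in sorted(M5_VCPUS.items(), key=lambda kv: kv[1]):
        if vcpus >= needed_vcpus:
            return size
    # Even the largest size would run above target; return it for a human to look at.
    return "m5.24xlarge"
```

A 16-vCPU m5.4xlarge peaking at 20% CPU uses about 3.2 vCPUs, which at a 70% target needs about 4.6, so the sketch suggests m5.2xlarge. The same instance at 80% CPU gets an upsize to m5.8xlarge, which is the "other direction" case.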
The "EC2 — Other" line is almost never what it looks like
This is the quiet one, and I think it's the lever most teams could reach today and don't, because they assume "EC2 — Other" is some inevitable infrastructure cost rather than something they can audit.
On our bill that line was $300K a year, and most of it was NAT Gateway data processing fees plus inter-AZ data transfer. NAT Gateway charges $0.045 per GB processed. When workloads in private subnets call AWS services like S3, DynamoDB, ECR, Secrets Manager, or any of dozens of others without a VPC endpoint in place, that traffic goes out through NAT, hits the service's public endpoint, and comes back. You're paying NAT processing fees to talk to AWS itself, on the AWS network, between AWS services. The architecture diagrams hide this. The bill doesn't.
The fix is VPC endpoints. Gateway Endpoints for S3 and DynamoDB are free. Interface Endpoints for everything else cost about $0.01 per hour per endpoint per AZ, plus roughly $0.01 per GB processed. That sounds like a new line item until you compare it with $0.045 per GB through NAT: for any service you call with real volume, the endpoint is the cheaper side of the inequality. We added Interface Endpoints for the dozen services our private workloads called most, plus the two Gateway Endpoints, and watched the EC2-Other line drop substantially in the next bill.
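The inequality is easy to check per service. Here's a small sketch of the breakeven, assuming us-east-1 list prices as I understand them at the time of writing (the $0.045/GB and $0.01/hour figures above, plus an assumed $0.01/GB endpoint processing rate); check current pricing for your region. It only compares the NAT processing charge for traffic the endpoint would absorb, not the NAT Gateway's own hourly charge or inter-AZ transfer.

```python
HOURS_PER_MONTH = 730

# Assumed us-east-1 list prices; verify against current pricing for your region.
NAT_PER_GB = 0.045               # NAT Gateway data processing
ENDPOINT_PER_HOUR_PER_AZ = 0.01  # Interface endpoint hourly charge
ENDPOINT_PER_GB = 0.01           # Interface endpoint data processing (first tier)


def nat_monthly(gb: float) -> float:
    """NAT processing cost for traffic an endpoint could carry instead.

    The NAT Gateway's own hourly charge is excluded, since other traffic
    usually still needs the gateway.
    """
    return gb * NAT_PER_GB


def endpoint_monthly(gb: float, azs: int) -> float:
    """Monthly cost of one Interface Endpoint deployed in `azs` AZs."""
    return ENDPOINT_PER_HOUR_PER_AZ * HOURS_PER_MONTH * azs + gb * ENDPOINT_PER_GB


def breakeven_gb(azs: int) -> float:
    """Monthly GB through one service above which the endpoint is cheaper."""
    fixed = ENDPOINT_PER_HOUR_PER_AZ * HOURS_PER_MONTH * azs
    return fixed / (NAT_PER_GB - ENDPOINT_PER_GB)
```

Under these assumptions, an endpoint in two AZs costs about $14.60 a month before data, so it breaks even at roughly 417 GB a month for that service. Gateway Endpoints for S3 and DynamoDB have no fixed cost, so there is no breakeven to compute for them.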
The generalizable point: the "EC2 — Other" line is almost never what it looks like. Drill in before you assume it's load balancer traffic or storage. In every audit I've done, this line item has hidden at least one expensive surprise that the engineering team didn't know was there.
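One way to do the drill-in is to filter Cost Explorer to that line, group by usage type, export the rows, and bucket them. The sketch below assumes you already have (usage_type, cost) rows from such an export; the substring patterns reflect usage-type names as I've seen them on bills (region prefix plus names like NatGateway-Bytes, DataTransfer-Regional-Bytes, EBS:VolumeUsage), but treat the mapping as a starting point to check against your own export, not a complete list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (substring in usage type, bucket label). First match wins, so the more
# specific EBS snapshot pattern comes before the general EBS one.
PATTERNS = [
    ("NatGateway-Bytes", "NAT data processing"),
    ("NatGateway-Hours", "NAT hours"),
    ("DataTransfer-Regional-Bytes", "inter-AZ transfer"),
    ("EBS:Snapshot", "EBS snapshots"),
    ("EBS:", "EBS volumes"),
]


def bucket(usage_type: str) -> str:
    """Map a usage-type string to a coarse cost bucket."""
    for needle, label in PATTERNS:
        if needle in usage_type:
            return label
    return "other"


def summarize(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum cost per bucket, largest bucket first."""
    totals: Dict[str, float] = defaultdict(float)
    for usage_type, cost in rows:
        totals[bucket(usage_type)] += cost
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Whatever lands in "other" is the part worth reading row by row; that's usually where the surprise is.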
Cold S3 data is the easiest 60% you'll ever cut
S3 was around $72K a year, and most of it was data sitting in Standard storage that hadn't been read in months. Logs we kept forever. Backups from systems that had since been decommissioned. Reports generated for one-off questions that nobody had looked at since.
We added lifecycle policies on the buckets with the most cold data. Standard for the first 30 days. Standard-IA for the next 60. Glacier Instant Retrieval after 90 days. Deep Archive after 180 days for anything older. For the buckets where access patterns were genuinely unpredictable, we turned on Intelligent Tiering, which moves objects between tiers based on actual access patterns at no extra cost beyond a small per-object monitoring fee.
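Here's the 30/90/180 schedule above expressed as the dictionary boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. It's a sketch: the bucket name and prefix are placeholders, and the actual call is left commented out so the block runs without credentials.

```python
import json
from typing import Any, Dict, List


def lifecycle_rules(prefix: str = "") -> Dict[str, List[Dict[str, Any]]]:
    """Standard -> Standard-IA at 30 days, Glacier Instant Retrieval at 90,
    Deep Archive at 180, for objects under `prefix` ("" means the whole bucket)."""
    return {
        "Rules": [
            {
                "ID": f"tier-cold-data-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = lifecycle_rules("logs/")
    print(json.dumps(config, indent=2))
    # To apply (requires boto3 and credentials):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-bucket", LifecycleConfiguration=config)
```

One thing to know before running it for real: that call replaces the bucket's entire lifecycle configuration, so merge these rules with any existing ones first.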
Setting up lifecycle policies takes about an hour per bucket, and the savings compound from the moment they're in. I think the reason most teams haven't done this is the same reason they haven't done a lot of the others — nobody owns the bucket, the bucket grew quietly over years, and the savings on any one bucket don't justify the meeting it would take to assign ownership. The fix for that is one engineer with the political cover to do it without the meeting.
CloudWatch defaults are designed for someone else's worst day
CloudWatch was $35K a year, which seemed high for what we were actually using. Two things were going on, and both are common enough that I'd bet most production AWS accounts have at least one of them today.
First, log retention was set to "Never expire" on most log groups. That's the default if you don't set it explicitly, and most templates don't. We had years of debug-level logs from services that had been retired, sitting in CloudWatch Logs storage at full price, indefinitely. We dropped retention to 30 days for most groups, one year for security-relevant groups, and deleted the log groups for retired services entirely.
If you're running voice or agent workloads, this gets worse fast. Every turn produces a span. Every retry produces another. The default working assumption I'm seeing now is 30-day hot retention, with anything older going to S3 if you need it for replay or eval, and a serious eval pipeline (more on this in the audit piece) needs the raw logs, not the aggregated metrics.
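The cleanup reduces to a planning step you can review before anything changes. This sketch takes entries shaped like the `logGroups` list from the CloudWatch Logs `describe_log_groups` call (where a missing `retentionInDays` means the group never expires) and returns actions you would then apply with `put_retention_policy` or `delete_log_group`. The `/security/` prefix and the idea of identifying retired services by a path segment in the group name are hypothetical naming conventions; substitute your own.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Action = Tuple[str, str, Optional[int]]


def plan_retention(
    groups: Iterable[Dict],
    retired_services: Set[str],
    security_prefix: str = "/security/",
    default_days: int = 30,
    security_days: int = 365,
) -> List[Action]:
    """Return (action, log_group_name, days) tuples; nothing is changed here."""
    plan: List[Action] = []
    for group in groups:
        name = group["logGroupName"]
        # Hypothetical convention: a retired service appears as a path segment.
        if retired_services & set(name.split("/")):
            plan.append(("delete", name, None))
            continue
        target = security_days if name.startswith(security_prefix) else default_days
        current = group.get("retentionInDays")  # absent means "Never expire"
        # Tighten never-expiring or longer retention; never lengthen it.
        if current is None or current > target:
            plan.append(("set_retention", name, target))
    return plan
```

Keeping the plan separate from the API calls is the point: it turns "change retention on a few hundred log groups" into a diff someone can read in the PR.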
Second, custom metrics were quietly expensive. CloudWatch charges per metric per month, and high-cardinality custom metrics had blown up our metric count without anyone noticing. The usual culprits are dimensions like user ID, request ID, or anything else with thousands of unique values. On AI workloads, add tenant ID and model name to that list — though I'd argue (and have, repeatedly) that you actually want per-tenant cost dimensions, just not in CloudWatch's pricing model. Push them to a metrics backend that handles cardinality without the per-metric tax.
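The cardinality math is worth doing before a dimension ships, because CloudWatch counts each unique combination of metric name and dimension values as its own custom metric. A minimal sketch: the product below is an upper bound (only combinations you actually publish are billed), and the flat $0.30 per metric per month is an assumed first-tier list price; real pricing tiers down at higher volumes.

```python
from math import prod
from typing import Dict


def series_count(metric_names: int, dimension_cardinality: Dict[str, int]) -> int:
    """Upper bound on billable custom metrics: every combination of
    dimension values becomes a separate metric."""
    return metric_names * prod(dimension_cardinality.values())


def monthly_cost(series: int, price_per_metric: float = 0.30) -> float:
    """Flat-rate estimate; ignores the volume tiers."""
    return series * price_per_metric


if __name__ == "__main__":
    # Hypothetical: 3 metrics x 5 endpoints x 2,000 tenants.
    n = series_count(3, {"endpoint": 5, "tenant_id": 2000})
    print(n, f"${monthly_cost(n):,.0f}/month")
```

Three metrics with an endpoint dimension are 15 series; add a tenant dimension with 2,000 values and it's 30,000, which is the kind of jump nobody notices until the bill arrives.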
Combined, this cut the CloudWatch line by more than half.
RDS and ElastiCache pick up the same pattern, cheaper
RDS was about $46K and ElastiCache about $34K. Both had the same problem as EC2 — instances picked years earlier for workloads that had since changed shape, plus a couple of clusters that had been provisioned for projects that never shipped and were sitting idle for as long as anyone could remember. The cost of the idle clusters alone, summed across the years they'd been sitting there, was depressing to calculate.
The first pass was just removing the idle clusters. The second was rightsizing the instances based on actual CPU, memory, and connection metrics. The third was evaluating Aurora I/O-Optimized for the write-heavy databases — a different storage tier that costs more per hour but eliminates I/O-per-request charges. For one of our larger databases the math worked out to about 30% savings on net, which is the kind of number that earns the meeting to switch.
This lever wasn't huge in absolute dollars compared to compute, but it was nearly free to do once we knew what to look at. Almost every audit turns up at least one Aurora cluster that's a candidate for I/O-Optimized, and the conversion is a single storage-configuration change on the cluster.
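The I/O-Optimized decision is a comparison you can run from one month of a cluster's bill split into instance, storage, and I/O charges. AWS has positioned it for clusters where I/O is more than roughly a quarter of Aurora spend. The uplift factors below are assumptions about how I/O-Optimized instance and storage rates compare with Standard; check current pricing for your engine, region, and instance class, and note that reserved-instance pricing and backups aren't modeled.

```python
from typing import Tuple

# Assumed price multipliers for I/O-Optimized relative to Aurora Standard.
INSTANCE_UPLIFT = 1.30
STORAGE_UPLIFT = 2.25


def compare(instance: float, storage: float, io: float) -> Tuple[float, float]:
    """Monthly (standard, io_optimized) cost from a Standard-config bill split.

    I/O-Optimized drops per-request I/O charges entirely but raises the
    instance and storage rates.
    """
    standard = instance + storage + io
    optimized = instance * INSTANCE_UPLIFT + storage * STORAGE_UPLIFT
    return standard, optimized
```

With $1,000 of instance, $200 of storage, and $800 of I/O a month, this gives $2,000 on Standard versus about $1,750 on I/O-Optimized; drop the I/O to $50 and Standard wins easily.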
If you're over $1M annual and don't have an EDP, you're paying retail by choice
This one isn't a technical lever. It's a negotiation lever, and I think it's the single most important thing in this post, because every other lever is partially obviated if you have an EDP and partially wasted if you don't.
If you're spending more than roughly $1M a year on AWS, you can negotiate an Enterprise Discount Program. You commit to a multi-year spend at a defined growth rate. AWS gives you a percentage discount on your entire bill in exchange. The discount tiers up with commitment size and term length. The numbers move (AWS account teams have flexibility, especially toward the end of a quarter or of Amazon's fiscal year, which ends in December), but the directional story is real and the conversation is one email away if you have the spend.
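Before the first call, it helps to have the commitment arithmetic in front of you. A minimal sketch: the growth rate and discount below are made up for illustration, and real tiers, terms, and shortfall handling come out of the negotiation and the contract, not out of this code.

```python
from typing import List


def commit_schedule(year_one: float, growth: float, years: int) -> List[float]:
    """Committed spend for each year of the term, growing at a fixed rate."""
    return [year_one * (1 + growth) ** year for year in range(years)]


def term_savings(annual_spend: List[float], discount: float) -> float:
    """Discount earned over the term, assuming spend lands at the commitment."""
    return sum(spend * discount for spend in annual_spend)


if __name__ == "__main__":
    # Hypothetical: $1M in year one, 10% annual growth, 3-year term, 6% discount.
    schedule = commit_schedule(1_000_000, 0.10, 3)
    print([round(s) for s in schedule], round(term_savings(schedule, 0.06)))
```

Run it at a few growth rates before signing. The commitment behaves like a floor, so the risk in an over-optimistic growth rate is paying for spend you didn't use, on whatever terms the contract sets.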
We were on the Business support plan paying about $90K a year. As part of the EDP conversation, support went up to a higher tier with TAM access at no incremental cost, and the EDP discount took a percentage off the entire bill including the support line. Negotiating an EDP takes a few weeks. The savings compound for the term of the agreement.
I want to say this directly because I think it's the most under-known thing in cloud finance. If your bill is over a million dollars annually and you don't have an EDP, you are paying retail by choice. AWS will negotiate. Your account team's job is to negotiate. The thing keeping the conversation from happening is that nobody on the engineering side knows it exists, and finance assumes engineering would have asked if it were available. That communication gap is worth, on a $1M annual bill at modest discount tiers, between $40K and $100K a year. On a $20M bill it's three to five million dollars a year. It is the single most asymmetric meeting you can put on a calendar in this space.
The patterns that generalize
Looking back across these seven things — and across the AWS bills I've reviewed since — I think the through-line is small enough to fit in a few sentences.
The top three line items in any AWS bill are usually 70% of the spend. Start there. The "EC2 — Other" line is almost always NAT Gateway data processing, EBS, and inter-AZ data transfer in disguise. Drill into it before you assume it's inevitable. S3 lifecycle policies are free to set up and save 60–70% on cold data; there's no good reason not to have them on every bucket with non-trivial volume. If your bill is over a million dollars annually and you don't have an EDP, you are, again, paying retail by choice.
The cultural lever underneath all of these is the one I think gets the least attention. Most teams over-provision because being wrong on the small side feels much worse than being wrong on the big side: paged engineers versus silent dollars. Inverting that culture is the real lever. The technical work just makes the cost of over-provisioning visible enough that the culture has to confront it.
What's different on a Bedrock bill
The bill I just described is a SaaS bill from a few years ago. The bills I'm looking at now are mostly AI workloads. The lever order is roughly the same. The line items are different.
Compute commitment becomes provisioned throughput vs on-demand vs batch. The math is the same as Savings Plans vs Reserved vs On-Demand on EC2. If your inference traffic is steady, you're a candidate for provisioned throughput. If it's bursty and async-tolerant, you're a candidate for batch at 50% off. If it's bursty and synchronous, on-demand is what you've got — but you should still be checking cross-region inference for cheaper available capacity.
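The decision tree above fits in a few lines, plus the one number that decides whether "steady" is steady enough: the hourly token volume at which provisioned throughput beats on-demand. A sketch under simplifying assumptions: one blended per-token price instead of separate input and output rates, a capacity that can actually serve that volume, and hypothetical prices in the example. Batch is the 50% discount mentioned above, for the models that support it.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    steady: bool          # predictable, sustained token volume
    async_tolerant: bool  # results can arrive minutes to hours later


def pricing_mode(w: Workload) -> str:
    """The decision tree from the text, in the same order."""
    if w.steady:
        return "provisioned throughput"
    if w.async_tolerant:
        return "batch"
    return "on-demand"


def provisioned_breakeven_tokens_per_hour(
    pt_cost_per_hour: float, on_demand_per_1k_tokens: float
) -> float:
    """Hourly token volume above which provisioned throughput is cheaper.

    Simplified: one blended on-demand price per 1K tokens.
    """
    return pt_cost_per_hour / on_demand_per_1k_tokens * 1000
```

With a hypothetical $20/hour of provisioned capacity against $0.01 per 1K tokens on-demand, you'd need about 2 million tokens every hour, around the clock, before provisioned throughput pays off; that's the real test of "steady".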
Right-sizing becomes model selection. Sonnet for the workload that needs it. Haiku or Nova for the workload that doesn't. Intelligent prompt routing if you can't decide on a per-request basis. I'm seeing 30–90% cuts from model choice alone on workloads that had been over-provisioned to Sonnet by default — which is the AI version of running everything on m5.4xlarge because nobody re-evaluated.
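The savings arithmetic for model selection is simple enough to sanity-check any routing plan. The sketch below is a hand-rolled router, not Bedrock's Intelligent Prompt Routing: `needs_large` stands in for whatever classifier or rule you trust, and the price ratio in the example is hypothetical.

```python
from typing import Callable


def blended_cost_ratio(share_to_small: float, small_price_ratio: float) -> float:
    """Inference cost relative to sending every request to the large model."""
    return (1 - share_to_small) + share_to_small * small_price_ratio


def route(request: str, needs_large: Callable[[str], bool]) -> str:
    """Pick a model tier per request; `needs_large` is your own (hypothetical) rule."""
    return "large" if needs_large(request) else "small"
```

Routing 80% of traffic to a model at a tenth of the price leaves you paying 28% of the original bill, a 72% cut, which is how a range like 30–90% comes about depending on the mix.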
NAT Gateway processing has a Bedrock parallel: prompt caching. Static prefixes (system prompts, tool definitions, retrieved documents you're chatting against) can be cached, and on Claude models cache reads are billed at about 10% of the normal input token price, with cache writes costing somewhat more than uncached input. Most teams haven't measured their cache potential. The savings are 60–90% on cacheable prefixes, and the implementation is mostly a wrapper around the SDK call.
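Measuring cache potential starts with a cost model like the one below. The read and write multipliers are assumptions modeled on Claude-style pricing on Bedrock; check the rates for your model. The second function builds `system` content blocks for the Converse API with a `cachePoint` block after the static prefix, following the Converse request shape as I understand it; verify against the current API reference before relying on it.

```python
from typing import Any, Dict, List

# Assumed multipliers relative to the normal input-token price.
CACHE_READ = 0.10
CACHE_WRITE = 1.25


def input_cost_ratio(prefix_tokens: int, suffix_tokens: int, hit_rate: float) -> float:
    """Per-request input-token cost with caching, relative to no caching.

    Hits pay the read rate on the prefix; misses pay the write rate to
    (re)populate the cache. The uncached suffix is billed normally.
    """
    uncached = prefix_tokens + suffix_tokens
    cached = (
        suffix_tokens
        + hit_rate * prefix_tokens * CACHE_READ
        + (1 - hit_rate) * prefix_tokens * CACHE_WRITE
    )
    return cached / uncached


def system_with_cache_point(system_prompt: str) -> List[Dict[str, Any]]:
    """Converse `system` blocks with a cache checkpoint after the static prompt."""
    return [{"text": system_prompt}, {"cachePoint": {"type": "default"}}]
```

A 9,000-token static prefix with a 1,000-token per-request suffix at a 95% hit rate comes out to about 24% of the uncached input cost. At a 0% hit rate the ratio goes above 1, which is why measuring hit rate comes before turning caching on everywhere. The blocks would be passed as `system=` to a `bedrock-runtime` client's `converse` call.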
S3 lifecycle becomes vector storage tiering. S3 Vectors at sub-second latency for cold archives is up to 90% cheaper than OpenSearch Serverless for the same data. Same decision tree: hot data in OpenSearch or pgvector, cold data in S3 Vectors, lifecycle moves it for you.
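The tiering decision itself is just a recency split, the same shape as an S3 lifecycle rule. In this sketch, `last_queried` maps a hypothetical collection ID (per tenant, say) to its last query time, and the 30-day threshold is an assumption. Hot IDs stay in OpenSearch or pgvector and cold IDs become candidates to move to S3 Vectors; the data movement itself isn't shown.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_recency(
    last_queried: Dict[str, datetime],
    now: datetime,
    hot_days: int = 30,
) -> Tuple[List[str], List[str]]:
    """Partition collection IDs into (hot, cold) by time since last query."""
    cutoff = now - timedelta(days=hot_days)
    hot = sorted(cid for cid, ts in last_queried.items() if ts >= cutoff)
    cold = sorted(cid for cid, ts in last_queried.items() if ts < cutoff)
    return hot, cold
```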
CloudWatch becomes turn-level observability. Voice and agent workloads produce ten to a hundred times more spans per session than a typical SaaS request. Same retention rules — 30 days hot, longer cold, aggregate before you publish.
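"Aggregate before you publish" can be as simple as collapsing per-turn spans into one record per session before anything reaches a paid metrics or logs pipeline, while the raw spans go to cheap storage for replay and eval. The span fields here (`session_id`, `latency_ms`, `is_retry`) are hypothetical; use whatever your tracer emits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize_sessions(spans: Iterable[Dict]) -> List[Dict]:
    """Collapse per-turn spans into one record per session before publishing."""
    acc: Dict[str, Dict] = defaultdict(
        lambda: {"turns": 0, "retries": 0, "total_latency_ms": 0, "max_latency_ms": 0}
    )
    for span in spans:
        rec = acc[span["session_id"]]
        if span.get("is_retry", False):
            rec["retries"] += 1
        else:
            rec["turns"] += 1
        rec["total_latency_ms"] += span["latency_ms"]
        rec["max_latency_ms"] = max(rec["max_latency_ms"], span["latency_ms"])
    return [{"session_id": sid, **rec} for sid, rec in sorted(acc.items())]
```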
EDP still applies. If your AWS bill has crossed seven figures and Bedrock is a meaningful slice of it, the EDP conversation should also include Bedrock-specific commitments and discount tiers. AWS will negotiate.
The lever I'd add that didn't exist on the SaaS bill is multi-tenant cost attribution. If you're running Bedrock for more than one customer and you can't answer "what does customer X cost us this month" in real time, you'll spend days reconstructing it the first time finance asks. Build the metering at the application layer from day one. Five minutes on day one. A week of forensics on day three hundred. (I wrote about this exact failure mode at more length in the audit-ai-built-saas piece — the asymmetry is the same shape.)
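The day-one version of that metering can be small. This sketch records cost per tenant at call time from the `usage` field the Converse API returns (`inputTokens`, `outputTokens`). The model keys and per-1K-token prices are hypothetical placeholders, and a real version would persist records (a table keyed by tenant and day, say) rather than keep them in memory.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical (input, output) prices per 1K tokens; use your models' real rates.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "large-model": (0.002, 0.010),
    "small-model": (0.0002, 0.001),
}


class TenantMeter:
    """Accumulate Bedrock token usage and cost per tenant at call time."""

    def __init__(self) -> None:
        self.cost: Dict[str, float] = defaultdict(float)
        self.tokens: Dict[str, int] = defaultdict(int)

    def record(self, tenant_id: str, model: str, usage: Dict[str, int]) -> float:
        """`usage` is shaped like the Converse response's 'usage' field."""
        in_price, out_price = PRICES_PER_1K[model]
        cost = (usage["inputTokens"] * in_price + usage["outputTokens"] * out_price) / 1000
        self.cost[tenant_id] += cost
        self.tokens[tenant_id] += usage["inputTokens"] + usage["outputTokens"]
        return cost
```

Calling `record` right after each invocation, next to the code that already knows which tenant the request belongs to, is what makes "what does customer X cost us this month" a query instead of a forensics project.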
What I'd do differently
Three things, looking back.
I'd tag everything from day one. Cost-attribution tags on every resource, every project, every team. Not because the tags save money themselves, but because every single one of these levers was easier to identify and prioritize on the parts of the bill where tagging was thorough. Where tagging was missing, we spent days tracing line items back to owners before we could even decide what to cut. The AI version of this is per-tenant attribution on every Bedrock invocation, and the same logic applies — five minutes on day one, a week of forensics later.
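A number worth tracking from the start is tag coverage by cost, not by resource count. The sketch assumes cost rows that already carry their resource's tags (for example from a cost and usage export), and the required keys `team` and `project` are hypothetical. Also remember that user-defined cost allocation tags have to be activated in the Billing console before they show up in cost data at all.

```python
from typing import Dict, Iterable, Sequence


def tag_coverage(rows: Iterable[Dict], required: Sequence[str] = ("team", "project")) -> float:
    """Share of cost carried by resources that have every required tag."""
    total = tagged = 0.0
    for row in rows:
        total += row["cost"]
        if all(row.get("tags", {}).get(key) for key in required):
            tagged += row["cost"]
    return tagged / total if total else 0.0
```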
I'd run Compute Optimizer earlier. We waited until we'd already done most of the manual rightsizing work before letting it loose, which was backwards. It would have given us a better starting point.
I'd be more skeptical of CloudFormation and Terraform defaults. Most of the oversized RDS and ElastiCache instances came from templates written years ago by people who picked safe-feeling instance classes. The defaults compounded across dozens of stacks before anyone noticed — which is the IaC version of the cultural problem I named at the top. The template was someone's worst-day sizing. It ran everywhere for years.
The closing thought
Each lever paid back something material before we moved to the next. By the end of the year we'd cut about $1.5M off the run rate, the bill was no longer the topic of every board meeting, and product velocity hadn't slowed. The technical work was honestly not that interesting. The political work — getting the meetings that authorized the commitments, getting the EDP conversation onto someone's calendar, getting engineering to accept that "right-sized" was not the same as "under-provisioned" — was where the actual savings came from.
I think this is the part most engineering-led cost-optimization posts miss. The levers aren't secret. AWS publishes them. The blocker is almost never knowledge. The blocker is that nobody owns the bill, and the people who could fix it aren't sure they have the political cover to make the meeting happen.
If your AWS bill is growing faster than revenue, that meeting is the lever.