From Monolith to API-First: The Cloud Migration That Saved Evergen

They Were Losing $12,000 a Day. Every New Customer Made It Worse. Here's How We Fixed It—And Transformed Their Business.

  • Go

  • Kafka

  • AWS

  • Kubernetes

  • MongoDB

  • ArgoCD

  • Docker

  • API Design

The Numbers

Metric           | Before             | After
Cost per client  | $50/month          | $0.60/month
Processing speed | 20 seconds         | 50 milliseconds
Uptime           | 91%                | 99.9%
Asset capacity   | 400 sites (maxed)  | Hundreds of thousands
Daily burn       | $12,000            | Profitable

TL;DR

Evergen had a real product in a market that doesn't wait. Energy pricing in Australia changes every 5 minutes. The problem wasn't "performance" or "cloud spend" as separate issues—it was unit economics driven by architecture: a fragile Azure monolith with business logic buried in hundreds of stored procedures. We reverse-engineered the system, rebuilt it as an API-first platform on AWS, ran two production systems in parallel for a year, then cut over with zero customer impact. And the company stopped dying.

Response time: from 20 seconds to 50 milliseconds

Evergen had everything. AI that actually worked for energy optimization. Backing from AMP. 400 customers. Perfect market timing.

One problem: they were losing $30 on every single customer.

New Leadership, Ugly Truth

Ben Hutt came in as CEO. Ben Burns as COO. They started digging into the numbers.

What they found was ugly.

Cost per client was $50 a month. Revenue was $20. The math was brutal. Sixty days of runway left. The faster they grew, the faster they died.

They looked at the engineering team. Three developers watching the clock until 5 PM. Zero documentation. Hundreds of MSSQL stored procedures containing all the business logic. The one engineer who understood the AI? Gone months ago. No notes. No comments. Nothing.

The existing team's solution? "Buy bigger MSSQL instances."

Ben and Ben knew they needed outside help. Fast.

What We Found When We Arrived

They called us in. We walked into the Sydney office and confirmed their fears.

The system was worse than the numbers suggested. The infrastructure wasn't just expensive—it was fragile. One bad deployment away from taking down the whole thing.

When I say "fragile," I don't mean "a few flaky alerts." I mean the kind of system where you don't deploy because you can't predict what'll break, and you can't predict what'll break because there are no tests, no docs, and half the business logic is hidden in SQL nobody has the confidence to touch.

That's not a tech problem. That's a company survival problem.

Then Azure Crashed

Mid-project, Azure deprecated our Kubernetes node version. No warning. No migration tools.

Our entire cluster went down during a routine deployment. We spent all night with Azure support trying to rebuild it.

Every time we needed to scale? Two-week approval cycle. We had to beg to pay them more money.

When we called AWS with the same questions, a solutions architect called back: "I'm just down the street. Want me to stop by after lunch?"

2:47 AM to 5:24 AM. The night everything almost fell apart.

That's when we decided to move.

The Impossible Part

We had to reverse-engineer an AI that nobody understood.

Evergen's optimization engine made split-second decisions about when to buy and sell energy. It lived in hundreds of stored procedures. No documentation. No tests. Just code.

We spent months analyzing stored procedure dependencies. Mapping data flows. Testing edge cases with real energy data.

One memorable discovery: a procedure burning massive computing power 24/7 to create in-memory tables, then doing nothing with them. Legacy code from someone who'd left years ago.

This is the part most people underestimate. It's not "rewrite it in Go." It's "prove the new system makes the same financial decisions as the old system," because in energy markets, being slightly wrong isn't a bug. It's lost money. Or compliance trouble. Or both.

What We Had to Rebuild

To understand why this was hard, you need to understand what Evergen's platform actually does. This isn't a simple CRUD app. It's a real-time AI that makes financial decisions every 5 minutes for thousands of energy assets.

7 core systems we reverse-engineered from undocumented SQL:

  • Forecasting Engine
  • Real-Time Control
  • Market Bidding
  • Fleet Monitoring
  • Anomaly Detection
  • Behind-the-Meter
  • Front-of-Meter

Forecasting Engine

The AI predicts optimal buy/sell decisions by analyzing energy consumption patterns, solar generation forecasts, weather data, grid demand, and wholesale electricity prices. All of this feeds into a model that runs predictions for each individual site. The old system did this in stored procedures. We rebuilt it as a distributed microservice that could handle 100x the load.

The shift here wasn't "microservices because microservices." It was isolation. Forecasting is expensive compute, and it doesn't get to take down fleet control because it's having a bad day.

Real-Time Control

Every battery in the network can be controlled remotely—charge, discharge, or hold. The platform decides automatically, but operators can override when needed. The original system had control logic scattered across 47 different stored procedures. We consolidated it into a single, testable control service.

This was about confidence. If a human operator hits override at 2 AM, you don't want to wonder which stored procedure is going to win the argument.

Market Bidding

Evergen participates in wholesale energy markets, ancillary services, and Virtual Power Plant (VPP) programs. The platform automatically bids and rebids based on market conditions. Getting this wrong means losing money or violating market rules. The compliance requirements alone filled a 200-page document.

This is where "speed" becomes money in the most literal sense. Not in the "our dashboard feels snappy" sense. In the "you missed the price spike, you missed the revenue" sense.

Fleet Monitoring

Thousands of batteries, inverters, and solar panels across NSW, VIC, SA, QLD, and ACT. Each device streams telemetry data. The platform combines vendor APIs with inferred data to build a complete picture of fleet health. The old system processed this synchronously. We rebuilt it with Kafka for real-time streaming.

Synchronous processing here was a silent killer. It doesn't fail loudly. It just backs up, gets slower, drops things, and you wake up to "why didn't we see this battery go offline yesterday?"
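A rough sketch of the idea, under loud assumptions: the buffered channels below stand in for Kafka partitions (they are not durable and this is not a Kafka client), and Reading, partition, and Process are hypothetical names. What it shows is the property we relied on: readings are routed by device ID, so each device's samples are handled in order while different devices are processed in parallel.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Reading is a hypothetical telemetry sample from one device.
type Reading struct {
	DeviceID string
	SoC      float64 // battery state of charge, 0..1
}

// partition maps a device ID to one of n partitions, so all readings
// from the same device land on the same worker.
func partition(deviceID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(deviceID))
	return int(h.Sum32() % uint32(n))
}

// Process fans readings out to one worker per partition and returns the
// latest state of charge per device. Channels are an in-memory stand-in
// for Kafka partitions in this sketch.
func Process(readings []Reading, workers int) map[string]float64 {
	latest := make(map[string]float64)
	var mu sync.Mutex
	var wg sync.WaitGroup

	parts := make([]chan Reading, workers)
	for i := range parts {
		parts[i] = make(chan Reading, 64)
		wg.Add(1)
		go func(in <-chan Reading) {
			defer wg.Done()
			for r := range in {
				mu.Lock()
				latest[r.DeviceID] = r.SoC
				mu.Unlock()
			}
		}(parts[i])
	}

	for _, r := range readings {
		parts[partition(r.DeviceID, workers)] <- r
	}
	for _, p := range parts {
		close(p)
	}
	wg.Wait()
	return latest
}

func main() {
	readings := []Reading{
		{"bat-1", 0.50}, {"bat-2", 0.90}, {"bat-1", 0.40},
	}
	fmt.Println(Process(readings, 4))
}
```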

Anomaly Detection

The AI detects when devices go offline, underperform, or behave unexpectedly. It alerts operators before customers notice problems. This directly impacts ROI—a dead battery earns nothing. We had to extract this logic from stored procedures that mixed alerting with billing with reporting.

Mixing concerns like that is how systems rot. Not because it's "unclean." Because it means you can't change one thing without side effects you don't understand until production.

Behind-the-Meter Optimization

For residential customers, the ML model creates personalized energy plans. It accounts for each site's weather, load patterns, electricity rates, and VPP participation. Every 5 minutes, it reassesses and adjusts. Customers using Evergen save an additional 26% on electricity bills compared to standard solar+battery setups. We had to ensure the new system matched the old system's optimization quality—any regression would show up in customer bills.

This one made me paranoid, in a good way. If you're wrong, you don't just get a few complaints. You lose trust. And in consumer energy, trust is the whole game.

Front-of-Meter Optimization

For utility-scale assets and commercial sites, the platform maximizes returns across multiple revenue streams: wholesale markets, frequency control, demand response, and network support programs. These markets have different rules, different settlement periods, and different penalties for non-compliance. The original system handled this with 89 stored procedures that nobody fully understood.

This is where "just rewrite it" dies. These markets aren't forgiving, and neither are auditors. You need determinism, traceability, and the ability to explain decisions.

All of this was buried in undocumented SQL. We had to understand it well enough to rebuild it, then prove the new system produced identical results before we could switch over.
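What "prove identical results" can look like in practice is a comparator that replays the same interval through both systems and lists every disagreement. This is a simplified sketch: the Decision shape, the Diff function, and the power tolerance are illustrative assumptions, and the real validation covered far more than action and power.

```go
package main

import (
	"fmt"
	"math"
)

// Decision is a hypothetical, simplified record of what an optimizer chose
// for one site in one 5-minute interval.
type Decision struct {
	SiteID   string
	Interval int    // index of the 5-minute interval
	Action   string // "charge", "discharge", or "hold"
	PowerKW  float64
}

// key identifies a decision by site and interval.
func key(d Decision) string { return fmt.Sprintf("%s@%d", d.SiteID, d.Interval) }

// Diff describes every legacy decision that the rebuilt system is missing
// or disagrees with. It does not report extra decisions that appear only
// in the rebuilt output.
func Diff(legacy, rebuilt []Decision, tolKW float64) []string {
	byKey := make(map[string]Decision, len(rebuilt))
	for _, d := range rebuilt {
		byKey[key(d)] = d
	}
	var out []string
	for _, old := range legacy {
		k := key(old)
		nw, ok := byKey[k]
		switch {
		case !ok:
			out = append(out, k+": missing in new system")
		case nw.Action != old.Action:
			out = append(out, fmt.Sprintf("%s: action %s != %s", k, old.Action, nw.Action))
		case math.Abs(nw.PowerKW-old.PowerKW) > tolKW:
			out = append(out, fmt.Sprintf("%s: power %.2f != %.2f", k, old.PowerKW, nw.PowerKW))
		}
	}
	return out
}

func main() {
	legacy := []Decision{{"site-1", 0, "discharge", 5.0}, {"site-2", 0, "hold", 0}}
	rebuilt := []Decision{{"site-1", 0, "discharge", 5.0}, {"site-2", 0, "charge", 2.5}}
	for _, m := range Diff(legacy, rebuilt, 0.01) {
		fmt.Println(m)
	}
}
```

An empty result over a long enough replay window is the evidence you want before a cutover. A non-empty one tells you exactly which site and interval to investigate.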

The API Ecosystem We Built

The old system was a monolith. The new system integrates with dozens of third-party services through a comprehensive API ecosystem:

External API

Gateway for third-party integrations. Energy retailers, grid operators, and partner platforms connect here.

Mobile API

Powers customer-facing apps. Real-time data, device control, VPP participation—all from a phone.

OEM APIs

Tesla Powerwall, LG RESU, Enphase, Fronius. Each vendor has different protocols. We normalized them all.
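Normalizing vendors usually comes down to one adapter per vendor behind a shared interface. The sketch below uses two made-up payload formats (vendorA reports percent and watts, vendorB reports a fraction and kilowatts). These are not the real Tesla, LG, Enphase, or Fronius formats, and the Adapter interface is our assumption about the shape, not the production code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Telemetry is the normalized form every adapter produces.
type Telemetry struct {
	DeviceID string
	SoC      float64 // state of charge, 0..1
	PowerKW  float64
}

// Adapter converts one vendor's raw payload into Telemetry.
type Adapter interface {
	Normalize(raw []byte) (Telemetry, error)
}

// vendorA is a hypothetical vendor reporting percent and watts.
type vendorA struct{}

func (vendorA) Normalize(raw []byte) (Telemetry, error) {
	var p struct {
		Serial string  `json:"serial"`
		SoCPct float64 `json:"soc_percent"`
		PowerW float64 `json:"power_w"`
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return Telemetry{}, err
	}
	return Telemetry{DeviceID: p.Serial, SoC: p.SoCPct / 100, PowerKW: p.PowerW / 1000}, nil
}

// vendorB is a hypothetical vendor reporting a 0..1 fraction and kilowatts.
type vendorB struct{}

func (vendorB) Normalize(raw []byte) (Telemetry, error) {
	var p struct {
		ID  string  `json:"id"`
		SoC float64 `json:"soc"`
		KW  float64 `json:"kw"`
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return Telemetry{}, err
	}
	return Telemetry{DeviceID: p.ID, SoC: p.SoC, PowerKW: p.KW}, nil
}

func main() {
	inputs := []struct {
		adapter Adapter
		raw     string
	}{
		{vendorA{}, `{"serial":"A-1","soc_percent":50,"power_w":2500}`},
		{vendorB{}, `{"id":"B-9","soc":0.75,"kw":1.5}`},
	}
	for _, in := range inputs {
		t, err := in.adapter.Normalize([]byte(in.raw))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", t)
	}
}
```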

Device Onboarding API

What used to take manual database edits now happens through a clean API call.

Device Telemetry API

Real-time data from every asset. Battery state-of-charge, solar output, grid import/export—streaming constantly.

Forecasting API

Predictions for each device. Feeds into the optimization engine for proactive load balancing.

From a fragile monolith with business logic buried in stored procedures to a modern, API-first platform that other companies can build on top of.

Here's the part that's easy to miss: this wasn't "nice to have." It was the only way Evergen could scale beyond "we have 400 customers and we're scared to touch anything." An API-first platform forces boundaries. Boundaries force discipline. Discipline is what lets you ship changes without gambling the business.

Running Two Systems for a Year

Australia's energy market changes prices every 5 minutes. There's no maintenance window. There's no "we'll be down for a few hours."

So we ran two complete production systems simultaneously. For a year.

Component  | Old (Azure)               | New (AWS)
Language   | .NET                      | Go
Database   | MSSQL + stored procedures | MongoDB + Time Series, AWS Athena, Kafka
Scaling    | Buy bigger servers        | Add more nodes
Deployment | Manual and scary          | CI/CD with instant rollback

Three engineers maintained the dying system. Everyone else built the replacement. Every feature got built twice. Every data flow got validated across both systems.

Expensive? Yes. But cheaper than failure.

The new AWS infrastructure: 34 workloads, ~180 pods, processing hundreds of thousands of optimizations.

The Friday Cutover

We broke the "never deploy on Friday" rule.

Everyone in one room. Azure console on the big screen. Rollback ready.

CTO Nick McGrath gave the signal. We killed Azure.

Thirty minutes of staring at monitoring dashboards. Hundreds of thousands of optimizations flowing through AWS.

It worked. Zero customer impact.

Real-time monitoring: the moment we knew it worked.

Why Speed Actually Matters Here

One customer made $50 in a single day from optimized energy trading. More than most people earn in a month from solar panels.

Here's why: Australia's electricity prices change every 5 minutes. Evergen's AI predicts price spikes and sells stored battery energy at 5-10x retail rates.

The old system took 20 seconds to process. That's about 7% of your 5-minute window gone before you can act. Sometimes you'd miss the spike entirely.

New system? 50 milliseconds. You catch everything.

"When Jack's team showed up, we were drowning. Technical debt everywhere. Bleeding money. They dug into our undocumented mess, figured out the AI, and built something that actually worked. During the midnight Azure crisis—when we were both trying to save the infrastructure—that's when I knew these were real partners, not contractors watching the clock. Today we're running hundreds of thousands of optimizations in real time. That was literally impossible before."
— Nick McGrath, CTO, Evergen
"When Ben Burns and I came in, we knew something was wrong. We just didn't know how bad. Losing $30 on every customer in a growth market—that's not a tech problem, that's an existential problem. We called MadAppGang because we needed people who could move fast and wouldn't flinch at the mess. Jack's team didn't just fix the tech—they fixed the business model. $50 to $0.60 per client. That's the difference between a cautionary tale and Australia's top energy platform."
— Ben Hutt, CEO, Evergen

What Made This Hard

Most firms would've walked away. Here's what they'd have seen:

We jumped on the sinking ship and rebuilt it while it was sinking.

The Business After

Before (blocked by infrastructure):

After (infrastructure out of the way):

What We Learned

Infrastructure debt compounds. When you're losing $30 per customer in a $20/customer business, every day costs $12,000. Every week is $84,000. Every month is $360,000. The question isn't whether you can afford to fix it. It's whether you can afford not to.

Cloud support matters more than features. Pricing and feature lists look similar. What's different: how fast they respond when you're dying at 2 AM, whether scaling requires two weeks of approval, whether they treat you like a partner or an invoice.

Parallel infrastructure is expensive but necessary. For 24/7 businesses, it's the only way to migrate without betting everything on a single cutover.

When This Matters to You

You need someone like us when:

But the migration was just the beginning.

We didn't just rescue the infrastructure. We built one of the world's most advanced energy orchestration platforms: the AI that makes financial decisions every five minutes for tens of thousands of assets.

Talk to Us

We've done this before. Not just Evergen—multiple companies where infrastructure was the bottleneck between where they were and where they needed to be.

60 minutes with our CTO. We'll tell you honestly whether we can help. No pitch deck. No "let's schedule a follow-up." Just a conversation about your infrastructure.

What's the infrastructure problem you've been putting off?

Industry: Energy Technology
Duration: 12 months
Team: 8 engineers
Stack: Go, Kafka, AWS, Kubernetes