Scaling a SaaS to $3M/year on the back of a monolith

This was originally written as a (very long) comment to a microservices discussion in a software engineering community.

As founding engineer and then CTO, I helped take a bootstrapped SaaS company from around $500k ARR to $3mm+ ARR in 4 years, all built on top of a large and complex monolith.

Early days

It's hard to quantify exactly when "early days" ends, but for context: 1-3 engineers. Sub-$500k ARR. 12-18 months after launch.

Don't even think about microservices at this stage.

If you're thinking about how to structure your app into microservices, you're thinking about completely the wrong thing. The only things on anyone's mind should be product-market fit (PMF), conversion, retention, and growth.

The reality is: your customers don't care about your microservices.

And if your reasoning is "well, if the business grows, we want to be ready to scale by having these services separate!", you're forgetting something crucial: you have to first get the business to that point to even have such a luxury problem.

On top of that, while searching for PMF and iterating on the product based on customer feedback, your codebase will change substantially. Things will be rewritten, boundaries will shift, debt will be introduced.

It is inevitable.

A lot of people may be in for a shock regarding just how much value in the world around us is built on top of ugly tech spaghetti. It's the norm, not the exception. For us, it may have even been hard to qualify the codebase as a cohesive monolith because of just how cobbled together everything was in a single repository. But it got us to $1mm ARR and beyond.

Don't try to apply practices of tech giants to recently-founded software startups. Completely different contexts, an order of magnitude difference in history, and several orders of magnitude difference in revenue and team size.

Starting to mature

Somewhere around $1mm ARR? Maybe 3-5, but fewer than 10 engineers. Growth is quite steady, conversion and retention are attaining predictability.

At this point, the question should be: can the revenue still 2x and the team 2x without changing the architecture? In most cases the answer will be yes, even if you don't like it. But the yes will be nuanced.

By this point we had already seen, from recent experience, that adding anything new to the codebase was problematic. There were performance issues and it was very easy to introduce bugs.

The tech focus here was:

A deep test suite covering feature integrations, not units.
Short phases of refactoring to tackle the most problematic (unintuitive, deeply-coupled) parts of the code. Not spending 4-6 months on rewriting code.
Targeted / constrained code rewrites and comments to make the code more intuitive and easier to extend.
Occasionally optimising performance to support further growth. A lot of this was DB query work more than anything. The "don't optimise prematurely" adage still applies here.
But apart from that? Still frequently shipping value to customers in the monolith.
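To make the first point in the list above concrete, here is a minimal sketch of what a feature test looks like compared with a unit test. Everything in it (`App`, `InMemoryStore`, the plan names and prices) is hypothetical and only stands in for a real codebase. The point is that the test drives several features through the same entry points a customer would hit and asserts on the end result, instead of testing each function in isolation.

```python
import unittest


class InMemoryStore:
    """Hypothetical stand-in for the database."""

    def __init__(self):
        self.accounts = {}
        self.invoices = []


class App:
    """Hypothetical stand-in for the monolith's public entry points."""

    PLAN_PRICES = {"starter": 29, "team": 99}  # made-up monthly prices per seat

    def __init__(self, store):
        self.store = store

    def sign_up(self, email, plan):
        if plan not in self.PLAN_PRICES:
            raise ValueError(f"unknown plan: {plan}")
        self.store.accounts[email] = {"plan": plan, "seats": 1}

    def add_seat(self, email):
        self.store.accounts[email]["seats"] += 1

    def run_billing(self):
        # Billing depends on both the plan and the seat count, so a bug in
        # either feature shows up here.
        for email, account in self.store.accounts.items():
            amount = self.PLAN_PRICES[account["plan"]] * account["seats"]
            self.store.invoices.append({"email": email, "amount": amount})


class SeatBillingFeatureTest(unittest.TestCase):
    def test_added_seat_is_billed(self):
        # Exercise sign-up, seat management and billing together.
        app = App(InMemoryStore())
        app.sign_up("a@example.com", "team")
        app.add_seat("a@example.com")
        app.run_billing()
        self.assertEqual(
            app.store.invoices,
            [{"email": "a@example.com", "amount": 198}],
        )
```

Saved as a module, this runs with `python -m unittest`. A unit test of `add_seat` alone would pass even if billing ignored seats; the feature test would not.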

Alas, at this point we introduced two microservices to power a customer-facing API. We have regretted that decision ever since and ended up gradually decommissioning them.

Here's why:

Bugs: It was still quite easy to introduce bugs into the microservices. The platform we had built was complex, with a lot of features and often subtle cross-feature interactions. The API allowed writes, but even the read-only side of the microservices kept picking up bugs.
Performance: Any performance and scaling gains were negligible. Having the monolith on a separate API server, with a bit of added API-specific code, would have accomplished the same thing with much less pain.
Overhead: The microservices were unique parts of the infrastructure, and accounting for them added engineering and management overhead. They took up a disproportionate amount of time relative to the value they provided.
Duplication: A lot of app logic was duplicated. Packaging up the relevant code for re-use between monolith and microservices would have been a Herculean task by itself, and then we would've had a third layer of overhead.

My guess is that we probably lost out on an additional $500k in ARR by the end of this phase because of the decision to use microservices for the API, based on engineering time devoted to tech work / toil, rather than shipping more product improvements.

What about now?

The team keeps growing, the revenue keeps growing, complexity keeps growing, including entirely new product lines alongside the main platform.

The monolith is still the focus.

Here is what's changed:

Infrastructure as code: This powers hundreds of K8s pods, which has more to do with the asynchronous, background data processing inherent to our platform than anything else. The majority are running the monolith.
Two valuable microservices exist: But only because they are responsible for very clearly distinct parts of the platform (maybe 90% decoupled). Fun fact: they've each already had substantial rewrites. Microservices won't automatically escape the need to change.
The monolith has evolved: It now loosely resembles a large collection of microservices inside one repository. The benefit is that there's none of the infrastructure and conceptual overhead that formal microservices would bring.
Screaming architecture: Time was spent making the code a lot more intuitive / grokkable. The folders and classes relate much more to product and customer value than to tech decisions.
Test suite: The test suite has over 5000 feature tests and is still growing. We actually dedicated time on multiple occasions to speeding up the test suite to help development and deployment velocity.
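As a rough illustration of the "monolith has evolved" and "screaming architecture" points, here is a minimal sketch of two product-named modules living in one process. The names (`Billing`, `Reporting`, `BillingApi`, `Invoice`) are assumptions made up for this example, not our actual code. Each module keeps its data private and exposes a narrow interface, much like a microservice would, but the call between them is a plain function call with no network, deployment or versioning overhead.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Invoice:
    customer_id: str
    amount_cents: int


class BillingApi(Protocol):
    """The only surface other modules are meant to depend on (hypothetical)."""

    def invoices_for(self, customer_id: str) -> list[Invoice]: ...


class Billing:
    """'billing' module: owns invoice data and keeps its storage private."""

    def __init__(self) -> None:
        self._invoices: list[Invoice] = []

    def charge(self, customer_id: str, amount_cents: int) -> Invoice:
        invoice = Invoice(customer_id, amount_cents)
        self._invoices.append(invoice)
        return invoice

    def invoices_for(self, customer_id: str) -> list[Invoice]:
        return [i for i in self._invoices if i.customer_id == customer_id]


class Reporting:
    """'reporting' module: depends on BillingApi, not on Billing internals."""

    def __init__(self, billing: BillingApi) -> None:
        self._billing = billing

    def lifetime_value_cents(self, customer_id: str) -> int:
        invoices = self._billing.invoices_for(customer_id)
        return sum(i.amount_cents for i in invoices)


# Wiring happens in-process: no HTTP client, no service discovery.
billing = Billing()
reporting = Reporting(billing)
billing.charge("cust_1", 9900)
billing.charge("cust_1", 9900)
billing.charge("cust_2", 2900)
print(reporting.lifetime_value_cents("cust_1"))  # 19800
```

If one of these modules ever did need to become a real service, the interface is already the seam to cut along. Until then, a refactor that moves the boundary is a code change, not an infrastructure project.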

Could this monolith-centric path continue and eventually support a team of 50 engineers? Maybe, but unlikely.

Nevertheless, there's still a lot of mileage left in making the monolith easy to extend and maintain before diminishing returns kick in.