Breaking the Monolith: Routing Traffic
Usually, when a company starts, it does so with a very simple backend. Since time to market is key, it's better not to spend too much time building the perfect solution and instead have an MVP ready as soon as possible.
If the business goes well, sooner or later we'll reach a point where we are no longer comfortable with the solution we implemented for that MVP. That's when we'll have to decide whether it's better to start a new project from scratch or to refactor the current solution and start breaking the monolith.
In the project I'm working on, we are exactly at that point, and we made some important decisions to be ready for the company's next steps.
Let’s start
The first thing we did was decide which piece of the monolith we'd like to separate into a new microservice while we kept refactoring and improving the existing monolith. After all, the monolith was the one paying our salaries, so starting everything from scratch was not an option.
We identified a service in charge of users and authentication as a good candidate. It wouldn't be the service with the most traffic in the company, but it's a key one that made total sense as a separate service.
Let’s do it with Golang
We chose Golang because we really like the language and some of us already had experience with it. Also, the team was mainly experienced with PHP, and we felt the transition from PHP to Golang is a smooth one.
The challenge
To have a smooth transition, we prepared a plan with several phases that would allow us to migrate all the traffic to the new service without any downtime. Bear in mind that we had web, iOS, and Android clients. For the last two, we had to deal with old versions that we couldn't change; forcing an update on them was out of the plan for now.
Phase 1 - Testing with specific users
We first prepared the new service with all the functionality we believed needed to be there, and created the first set of endpoints, ready to accept the traffic of the old user-handling endpoints.
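To give an idea of the shape of the new service, here's a minimal Go sketch; the route, struct, and hardcoded response are purely illustrative, not our actual code:

```go
package main

import (
	"encoding/json"
	"net/http"
)

// User is a trimmed-down example of the data the new service exposes.
type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
}

func main() {
	mux := http.NewServeMux()

	// Mirrors one of the legacy user endpoints so traffic can be
	// forwarded here without the clients noticing any difference.
	mux.HandleFunc("/v1/users/me", func(w http.ResponseWriter, r *http.Request) {
		// Real code would authenticate the request and load the user.
		u := User{ID: "42", Email: "user@example.com"}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(u)
	})

	http.ListenAndServe(":8080", mux)
}
```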
For the first round of tests, we put a proxy in the legacy service behind a feature flag system. This allowed us to activate the routing to the new service for specific users, so we could start testing the new service in production without affecting everybody. It also allowed us to turn off the redirection in case something went wrong.
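The legacy service is PHP, but the proxy idea is stack-agnostic. Here's a rough sketch of the logic in Go, where isFlagEnabled, the X-User-ID header, and the service host are hypothetical stand-ins for our actual feature flag system and routing setup:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// isFlagEnabled stands in for the feature flag system: it decides,
// per user, whether traffic should be routed to the new service.
func isFlagEnabled(userID string) bool {
	// e.g. look the user up in a flag store (DB table, Redis set...)
	return userID == "beta-tester-1"
}

func legacyUsersHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("handled by the monolith"))
}

func main() {
	newService, _ := url.Parse("http://users-service.internal:8080")
	proxy := httputil.NewSingleHostReverseProxy(newService)

	http.HandleFunc("/users/me", func(w http.ResponseWriter, r *http.Request) {
		userID := r.Header.Get("X-User-ID") // however the user is identified

		if isFlagEnabled(userID) {
			// Flag on: forward the request to the new microservice.
			proxy.ServeHTTP(w, r)
			return
		}
		// Flag off: fall through to the legacy handler as before.
		legacyUsersHandler(w, r)
	})

	http.ListenAndServe(":8080", nil)
}
```

Turning the flag off instantly restores the old behavior, which is what made testing in production safe for us.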
We had two DBs to deal with, the old one and the new one. We prepared an organic migration from the new service that imported data on demand. This was important to avoid dealing with two sources of truth: only the data that was actually needed made its way to the new DB.
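As a sketch of how such an organic migration can work (the Store interface and Repo type below are assumptions for illustration, not our actual code): on a miss, the new service imports the record from the legacy DB, so the new DB only fills up with data that is actually used.

```go
package store

import (
	"context"
	"errors"
)

var ErrNotFound = errors.New("user not found")

type User struct {
	ID    string
	Email string
}

// Store is implemented by both the new and the legacy databases.
type Store interface {
	Get(ctx context.Context, id string) (*User, error)
	Save(ctx context.Context, u *User) error
}

// Repo reads from the new DB first and lazily imports from the old one.
type Repo struct {
	newDB Store
	oldDB Store
}

func (r *Repo) Get(ctx context.Context, id string) (*User, error) {
	// Happy path: the user has already been migrated.
	u, err := r.newDB.Get(ctx, id)
	if err == nil {
		return u, nil
	}
	if !errors.Is(err, ErrNotFound) {
		return nil, err
	}

	// Miss: import the record from the legacy DB, so the new DB
	// becomes the single source of truth for this user.
	u, err = r.oldDB.Get(ctx, id)
	if err != nil {
		return nil, err
	}
	if err := r.newDB.Save(ctx, u); err != nil {
		return nil, err
	}
	return u, nil
}
```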
Phase 2 - Open to everybody but with fallback button ready
Once we felt confident enough, we activated the feature flag for all the users. That was a temporary setup: every request first hit the legacy service, then went to the new service and back, which added noticeable latency. In exchange, we kept the ability to turn off the proxy if something failed.
When everything looked good, we ran the data migration to the new DB. From that point on it was better not to go back: all new data lived in the new DB, so we might lose data if we turned off the proxy.
Phase 3 - Gateway redirection
After that, we needed a way to redirect the traffic for specific endpoints straight to the new service.
We tried the AWS ALB, but that wasn't possible since it didn't allow path redirection.
Then we tried the Cloudflare load balancer, but that wasn't a good solution either, since it has a limit of 10 rewrite rules.
So we settled on a gateway solution. Some of us had already worked with KrakenD, so we decided to try it. It allowed us to handle traffic nicely and gave us more control than before: rewriting paths, aggregating parallel requests, and many more nice features that we'll use in the future.
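To illustrate, this is roughly what a KrakenD endpoint definition with a path rewrite looks like; the hosts and paths are made up, and the exact schema may vary between KrakenD versions:

```json
{
  "version": 3,
  "endpoints": [
    {
      "endpoint": "/users/{id}",
      "method": "GET",
      "backend": [
        {
          "host": ["http://users-service.internal:8080"],
          "url_pattern": "/v1/users/{id}"
        }
      ]
    }
  ]
}
```

With rules like this at the gateway, the user endpoints go straight to the new service while the rest of the traffic keeps hitting the monolith, so the per-request detour through the legacy proxy disappears.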