- Aryan Ebrahimpour
- 12 min read
If you've dealt with microservices before, you might have noticed that besides the tough architectural decisions, the DevOps and site reliability engineering aspects can also slowly turn into a headache.
In this post I want to talk about approaches one can choose to deal with a microservices project. However, this post is mostly about applying these methods in GitLab. I understand that other tools like GitHub, Azure DevOps, or any other repository management and CI/CD system can make things easier or even more complex.
The two most important topics I want to focus on in this post are:
- Repository Management
- Versioning Strategy
We have different options for each of them: You can go with Monorepos or Polyrepos for hosting your code, and you can pick different versioning methods like Semantic Release, Manual Versioning, Global Versioning, etc. I'll talk about each of them in the next sections.
A holy war is still raging in this domain. People have different opinions about choosing monorepos (keeping all your source code in one repository) or polyrepos (keeping the source code of different services and packages in separate repositories).
Matt Klein wrote a post about monorepos back in 2019 titled "Monorepos: Please don’t!". He made a bunch of good points about why the monorepo approach is not a good idea for hosting your source code. Just one day after that post was published on Medium, Adam Jacob wrote a counter post titled "Monorepo: please do!" on the same platform, arguing that monorepos are actually good and why people might want to use them.
Different companies also have different opinions about these approaches. One of the most famous is Google's. In "Software Engineering at Google: Lessons Learned from Programming Over Time", Winters et al. say:
At Google, the vast majority of our source is in a single repository (monorepo) shared among roughly 50,000 engineers. Almost all projects that are owned by Google live there, except large open-source projects like Chromium and Android. That includes public-facing products like Search, Gmail, our advertising products, our Google cloud offerings, as well as the internal infrastructure necessary to support and develop all of those products.
We rely on an in-house-developed centralized VCS called Piper, built to run as a distributed microservice in our production environment. This has allowed us to use Google-standard storage, communication, and Compute as a service technology to provide a globally available VCS storing more than 80 terabytes of content and metadata.
That being said, we all know there aren't many companies out there with Google's human and financial resources, and in many cases, Google's solutions to their problems don't fit other companies very well.
No, we don't think the monorepo approach as we've described it is a perfect answer for everyone.
Since we're using GitLab to host our code, this hugely affects the decision and approach we choose.
In my experience, GitLab is still not there, and using monorepos with GitLab can make your life much harder. Even GitLab admits that a monorepo is not a good choice for microservices with multiple dependencies that must be coordinated.
This might be a bit different with other services like Azure DevOps, which let you create multiple pipeline definitions for a single repository, compared to GitLab, where there is a one-to-one mapping between a GitLab repository and its pipeline defined in `.gitlab-ci.yml`.
Pros of Polyrepos with GitLab
1. GitLab's recommendation
GitLab says not to use monorepos for microservice apps with multiple dependencies. That alone is enough for many people to give up immediately.
2. rules:changes is a PITA
rules:changes is a mechanism to specify when to add a job to a pipeline by checking for changes to specific files. In a monorepo, you have to use this mechanism for better performance, because you definitely don't want to build every package and service in your monorepo on each commit.
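For instance, a job that runs only when its service's files change might look like this (the `services/billing/` path is made up for illustration):

```yaml
# Build the billing service only when its files (or the shared CI config) change.
build-billing:
  stage: build
  script:
    - docker build -t billing:latest services/billing
  rules:
    - changes:
        - services/billing/**/*
        - .gitlab-ci.yml
```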
The downside, however, is that if a commit triggers the pipeline and it fails inside your PR/MR, the later commits you push to that MR won't re-trigger the failed jobs; they only trigger jobs based on the file paths touched by the new commits. Fixing this is a tough task that I don't want to get into in this post.
In polyrepos, however, we don't need the rules:changes mechanism for a performance boost, and that's a win.
3. Checkout size
The larger a monorepo grows, the more code you need to check out that you might never touch.
4. Ownership and governance
Polyrepos enable you to have different ownership, permissions, and repo management policies for every repository. If you want the same policy for all repos, that's also easy, as you can set them at the group or instance level.
5. Different flexible pipelines for each project
You will have n CI pipelines for n polyrepos in GitLab, instead of 1 pipeline for a single monorepo, letting you run standalone pipelines in parallel.
6. Enabling Semantic Release
Even though you can have Semantic Release (SemRel) with some hacks and tricks in monorepos, it's usually not a good idea as SemRel wasn't meant to be used with monorepos. I'll talk more about this in the Versioning approaches section.
Pros of Monorepos with GitLab
1. Everything in one clone
You can observe every package and service by cloning only one repo.
2. Scattered changes and commits
Monorepos enable you to make changes in many packages and services in one commit or a single PR. Someone might argue, "This looks more like a con to me than a pro." They'd have a point: you then have to keep track of the changes and the dependency graph of the project and build the packages in order, which can make things complicated.
3. Find and Replace
You can find and replace across the entire project using simple tools like VS Code's find & replace; however, this becomes harder as the project grows in size.
4. Enabling Global Versioning
Monorepos make it easier to apply a global version to all packages and services, as the whole source code is in one place.
Just like repository management, versioning has different approaches, and different people prefer different methods over one another. Some of these approaches are:
1. Semantic Release Versioning
Semantic Release versioning is a fully automated method. Semantic Release automates the whole package release workflow, including determining the next version number, generating the release notes, and publishing the package. It analyzes the commit messages, which usually follow a well-known format like Conventional Commits, and then decides the next version.
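A minimal semantic-release configuration might look like the sketch below (plugin set and branch name are just one common arrangement, not the only one):

```yaml
# .releaserc.yml — a minimal semantic-release configuration (sketch).
# With Conventional Commits, "fix: ..." yields a patch release,
# "feat: ..." a minor release, and a "BREAKING CHANGE:" footer a major one.
branches:
  - main
plugins:
  - "@semantic-release/commit-analyzer"       # decides the bump from commits
  - "@semantic-release/release-notes-generator"
  - "@semantic-release/gitlab"                # creates the GitLab release
```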
If you hold ceremonies when you bump a major version, you might not want to let a tool decide when your next release event is. But in any other case, I personally don't see why you wouldn't want an automated versioning approach.
2. Manual Versioning
In manual versioning, the developers and maintainers are in charge of changing the versions. It's either fully manual or done with the help of a tool that suggests the next version, but either way, it's on the developers to choose it.
3. Global Versioning
In global versioning, all of the packages and services share the same version. One change in any of them bumps the version of every other package and service in the codebase. This approach has many benefits and is also very popular, but it can be a little challenging. Also, if you decide to fork a third-party project and make it part of your project, what will the versioning strategy of that forked repo be?
Now that we know what approaches we're dealing with, we want to see how they'll fit with our GitLab instance. We'll review every repository management approach with every versioning approach.
- Semantic Release Versioning
- Manual Versioning
- Global Versioning
- Semantic Release Versioning
- Manual Versioning
- Global Versioning
Monorepos + SemRel
SemRel by itself doesn't fit the monorepo approach, but there is a project called semantic-release-monorepo that can help us publish the packages and services iteratively. The problem with this approach is that when one package is released in your pipeline, the other ones have no way to be released: they fail because they are behind the remote branch, since the first release committed its updated package.json to the repository.
The scattered changes in different packages can also fail your pipeline, as the packages might depend on each other. As we've already discussed, fixing this requires building the packages in order.
We can fix the first issue using alternative projects like dhoulb/multi-semantic-release, which publishes all the changes in a single job, but the second issue still stands.
To fix the second issue, we need to sort all the packages according to their dependencies and build them in order inside the pipeline, but then no parallelism is possible. Remember that sorting jobs is not possible in `.gitlab-ci.yml`, so the only way to make this happen is to generate a YAML file inside the pipeline and then create a GitLab child pipeline based on the generated CI file. This makes things ugly and complicated, but it works.
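A rough sketch of that parent/child setup in `.gitlab-ci.yml` (the `generate-pipeline.sh` script is hypothetical — it would topologically sort the packages and emit their build jobs in dependency order):

```yaml
# Parent pipeline: generate a child CI file, then trigger it.
generate-pipeline:
  stage: build
  script:
    # Hypothetical script: sorts packages by dependency, writes build jobs.
    - ./ci/generate-pipeline.sh > child-pipeline.yml
  artifacts:
    paths:
      - child-pipeline.yml

trigger-child:
  stage: deploy
  trigger:
    include:
      - artifact: child-pipeline.yml
        job: generate-pipeline
    strategy: depend   # parent waits for (and mirrors) the child's status
```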
Monorepos + Manual Versioning
The downside of this approach is that the chosen version can easily become obsolete depending on how frequently people make changes in the repository, and developers might simply forget to bump the version. The problems of iterative releases are still there.
Our first solution doesn't really make a difference for the manual versioning approach:
And our second approach might still fail if someone recently merged an MR bumping to the very version chosen by the poor developer whose MR just merged:
Monorepos + Global Versioning
You already know why the iterative and coordination v1 approach won't work. So let's just try to use the latest approach and calculate/fetch the global version inside the pipeline:
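One way to sketch this in GitLab CI is to compute the version once in an early job and pass it to every later job via a dotenv artifact (assuming here that the global version lives in git tags — adapt to whatever your version source is):

```yaml
# Sketch: derive the global version once, hand it to all later jobs.
compute-version:
  stage: .pre
  script:
    # Assumption: the global version is the latest vX.Y.Z git tag.
    - echo "GLOBAL_VERSION=$(git describe --tags --abbrev=0)" > version.env
  artifacts:
    reports:
      dotenv: version.env   # exposes GLOBAL_VERSION to downstream jobs

build-all:
  stage: build
  script:
    - echo "Building everything as $GLOBAL_VERSION"
```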
The huge downside of this approach is that you need to re-version almost everything in the repo. This is very costly in huge projects, or projects that are slowly becoming huge. Also, calculating or fetching the global version has its own challenges: the global version source might change while a pipeline is in the middle of its work.
The second problem can be fixed if you commit all the version changes locally using a tool like Lerna, but the first problem is still a big deal.
Polyrepos + Semantic Release
In this case, we make a separate repository for each package/service. Developers can't make changes across multiple packages at the same time; they have to wait for one package to be released first.
In this case you should have a central repository named "ci" that defines pipeline templates for each type of project. This lets you make one change in a single repository to modify the CI/CD of all (or many) of the projects.
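A service repo's `.gitlab-ci.yml` then shrinks to little more than an include (the `my-group/ci` project and template path are hypothetical names for illustration):

```yaml
# .gitlab-ci.yml of one service repo (sketch).
# "my-group/ci" is a hypothetical central repo holding shared templates.
include:
  - project: my-group/ci
    ref: main
    file: templates/dotnet-service.yml

# Service-specific bits stay tiny; the heavy lifting lives in the template.
variables:
  SERVICE_NAME: billing
```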
Polyrepos + Manual Versioning
The problems with manual versioning are still here.
If you want to go with tool-assisted manual versioning, you might make your life easier by setting up husky git hooks that alter the versions locally for you based on your commit messages.
Polyrepos + Global Versioning
Keeping track of the global version number is hard in polyrepos.
Cut the crap. Which one's better? (opinions)
If you want my opinion, I think there are two aspects that one should consider.
- Developers perspective
- DevOps engineers perspective
As a Developer
I personally hate versioning things manually.
- Especially when there are multiple packages and services to version
- Especially when micro commits are encouraged
- Especially when the version can be calculated from commits in practice, and people are happy with it (I couldn’t find any blog posts or the like against automated versioning)
However, I’m OK with versioning things manually, but only if I’m not going to do it frequently and not for every single piece of work I touch. Tool-assisted global versioning looks promising in this case, but it’s hard to do in polyrepos.
As a DevOps Engineer
I want to keep the workflow stupid simple, yet powerful. That’s not really possible in monorepos, because simple means bare `.gitlab-ci.yml`, and that’s not powerful enough.
I want the CI system to be my friend, not a PITA (is that too much to ask?). Monorepos aren’t really friendly in GitLab CI for microservices. Therefore I should either go with polyrepos, or code the entire pipeline using a CI-generator library.
| What you want | Way to go |
| --- | --- |
| Higher development speed* | Monorepo + SemRel |
| Higher development speed, less complicated CI/CD, one version to rule them all | Monorepo + Global Versioning (Auto or Tool-Assisted) |
| Slower dev speed | Polyrepo + SemRel |
| Slower dev speed, slightly complicated CI/CD, one version to rule them all | Polyrepo + Tool-Assisted Global Versioning |
| Bare minimum CI / simple stupid | Polyrepo + Manual Versioning |

\* By "higher development speed" I mean: making changes in multiple codebases at the same time