Guide to performance and scalability in microservices architectures

Published by Emre Baran on January 09, 2025

Transitioning from a monolithic architecture to microservices is an intricate, time-consuming task. It demands both strategic foresight and meticulous execution.

In this 10-part series, we’ll guide you through the most common challenges faced during a monolith-to-microservices migration. Last week we published the seventh part of the series, on understanding the security and access control requirements of a microservices environment. This week, we dive into performance and scalability in microservices.

We'll publish a new article every Monday, so stay tuned. You can also download the full 10-part series to guide your monolith-to-microservices migration.

Performance and scalability in microservices: a brief intro

One of the benefits of a microservices architecture is that it allows companies to scale services independently and choose different storage systems and technologies for each service. This opens up new ways to optimize performance and lets companies scale each service according to its individual workload demands.

Despite this, building a scalable, well-performing microservices architecture is a major challenge during the transition. That’s because microservices are more complex, so performance measurement and service scaling differ drastically from what you’re used to in a monolith.

The challenges of optimizing a microservices architecture

When transitioning from a monolith, you’re very aware of your performance goals and the methods you can use to achieve them. But when you transition to a microservices architecture, those goals and tools go out the window.

After the transition, your team will have to set new best practices based on the new architecture and find new tools to achieve them. In the following section, we’ll cover some tools and best practices you can use to optimize your microservices architecture.

Service communication and latency

In a microservices architecture, you have to balance granularity and communication speed.

Overly granular architectures require each microservice to communicate with many others to answer any specific request. Data then has to “hop” between all of these microservices before a response can be returned to the client, and each “hop” the data makes on its way to a proper answer adds microseconds (or more) to the communication.

This drives up latency to the point where communication takes more time than the actual service work, which also increases the chance of failures.
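
To make this concrete, here is a back-of-the-envelope sketch in Python. The per-hop cost and per-hop reliability figures are hypothetical, chosen only to show how sequential hops compound both latency and the risk of missing a latency target:

```python
# Back-of-the-envelope: how sequential service hops compound latency.
# Both constants below are hypothetical, for illustration only.

PER_HOP_MS = 2.0   # assumed median network + serialization cost per hop
P_FAST = 0.99      # assumed probability a single hop meets its latency SLO

for hops in (1, 5, 10, 20):
    added_latency = hops * PER_HOP_MS
    p_all_fast = P_FAST ** hops  # every hop must be fast for the request to be fast
    print(f"{hops:2d} hops: +{added_latency:5.1f} ms, "
          f"P(every hop within SLO) = {p_all_fast:.2%}")
```

With these numbers, twenty sequential hops add 40 ms of pure communication overhead, and the chance that every hop stays within its latency target drops to roughly 82%.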

If, instead, you build with the sole intent of lowering latency, you’ll end up with an overly integrated system that isn’t flexible or scalable. You need to balance your need for speed against your need for a scalable architecture. Once you understand that compromise, you can set goals that are attainable for your team.

Data management and consistency

In principle, each microservice manages its own database. In practice, however, this can make it hard to maintain data consistency across microservices. If service databases diverge too much, transactions that span services become cross-database transactions, which complicates communication and hurts performance.

So what is theoretically possible (a different database for every service) can become more trouble than it’s worth in the real world. You have to find the tradeoffs that work best for you: lowering transaction complexity by using similar databases while still making the most of your microservices’ flexibility.

Scalability

One of the major advantages of microservices is that each can scale—or be scaled—independently. However, managing and orchestrating the scaling of multiple services can get overwhelmingly complex, so most microservice deployments use auto-scaling policies and mechanisms that act on predefined criteria. The good news is that most cloud providers offer this functionality, and open-source solutions such as Kubernetes and Docker Swarm are also available.

To ensure these scaling mechanisms work seamlessly in practice, testing is essential. Here is a good reminder from Sarada V., Director, Enterprise DevOps Coach at Sun Life:

“Testing the scalability of a micro-service is very critical as it ensures that the architecture can handle the increased workload effectively. Different capabilities like: vertical / horizontal scaling should be tested thoroughly to make sure that there is no impact to performance or to overall throughput.”
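
One way to put that advice into practice is a repeatable load test you can run before and after scaling out. Here is a minimal sketch using the open-source Locust tool; the /orders endpoint and the host are hypothetical stand-ins for one of your services:

```python
# Minimal Locust load test for a single microservice endpoint.
# Run with, e.g.:  locust -f loadtest.py --host http://orders.example.com
from locust import HttpUser, task, between

class OrdersUser(HttpUser):
    wait_time = between(1, 3)  # each simulated user pauses 1-3s between requests

    @task
    def list_orders(self):
        # Watch response times and error rates in the Locust UI
        # as instances scale out horizontally.
        self.client.get("/orders")
```

Ramping the simulated user count up and down lets you verify that throughput grows with the number of instances and that latency stays flat while scaling happens.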

Deployment and DevOps complexity

Handling one deployment pipeline is difficult enough, but with microservices you have to manage numerous independent deployment pipelines.

So while rolling out multiple pipelines, your team has to ensure compatibility between components and handle the orchestration of services. This increases the complexity of your continuous integration and continuous deployment (CI/CD) processes.

Inter-service dependencies

While it’s theoretically possible to change one microservice without changing others, that’s not always true in practice. Often the web of dependencies between microservices means a change in one service requires a change in another.

When you run into these dependencies, moving forward successfully requires careful versioning and intricate backward-compatibility considerations.

Any of the issues above can compromise your microservices architecture if not proactively addressed. However, none of them is an intractable problem: for every issue, there is a pattern or product that can help you solve it.

Designing scalable, high-performing microservice architectures

The best way to mitigate the potential problems above is to design a well-balanced microservices architecture, starting with capacity planning.


Capacity planning and auto-scaling

Before moving to a microservices architecture, your team should have a solid understanding of the resources required to meet your expected workload and performance requirements. That means estimating the number of instances, as well as the CPU, memory, and storage capacity those instances need, which gives you a base capacity to start with. Once you’ve defined that base, your team will have to decide how and when you’ll scale the architecture to deal with increased workload.

Only after you’ve set these guidelines in place can you bring in auto-scaling mechanisms to automatically adjust the number of service instances. Platforms like Kubernetes and cloud services like AWS Auto Scaling or Google Cloud Autoscaler are great tools to help you define scaling policies and automatically scale services.
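
As a rough illustration of what that planning can look like, here is a sketch in Python. Every number in it (peak traffic, per-instance throughput, headroom, CPU thresholds) is hypothetical and would in practice come from your own load testing:

```python
# Capacity planning sketch: all figures are hypothetical placeholders.
import math

peak_rps = 4_000          # expected peak requests per second (assumed)
rps_per_instance = 250    # measured throughput of one instance (assumed)
headroom = 0.30           # keep 30% spare capacity for spikes (assumed)

# Base capacity: enough instances to absorb peak load at 70% utilization.
base_instances = math.ceil(peak_rps / (rps_per_instance * (1 - headroom)))
print(f"Base capacity: {base_instances} instances")  # -> 23 with these numbers

# Simple auto-scaling policy: scale out on sustained high CPU,
# scale in on sustained low CPU, within fixed bounds.
MIN_INSTANCES, MAX_INSTANCES = base_instances, base_instances * 3

def desired_instances(current: int, avg_cpu: float) -> int:
    if avg_cpu > 0.70:
        return min(current + 2, MAX_INSTANCES)
    if avg_cpu < 0.30:
        return max(current - 1, MIN_INSTANCES)
    return current
```

In a real deployment you wouldn’t hand-roll this loop; you’d encode the same thresholds and bounds in a Kubernetes HorizontalPodAutoscaler or a cloud provider scaling policy.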

Service granularity

When you’re building a new microservices architecture, there’s always a temptation to create extremely fine-grained services to increase scalability and flexibility. But this approach quickly hits diminishing returns as fine-grained services start to introduce additional network overhead and latency due to increased inter-service communication (as mentioned previously). Then if you go too far the other way, you end up with a different set of troubles. While maintaining coarse-grained services may reduce network overhead, it also limits scalability and agility.

Finding balance is the key. That takes careful analysis of your business domain, performance requirements, and scalability needs so you can identify the boundaries of each service based on its functionality, data ownership, and scalability characteristics.

Once you understand your needs, you’ll know what compromises make more sense so you can find balance in your architecture.

Caching

Caching data allows communication-heavy microservices to improve their performance. By storing frequently accessed data or the results of computationally expensive operations, it reduces the load on downstream dependencies and improves response times.

In microservices, caching can be done at several levels.

  • Local caching allows each service to maintain its own cache where it can store the data it frequently accesses. Both Redis and Memcached are suitable for creating local caches.
  • A distributed cache, which is shared across multiple services, gives all services access to unified cache data. This makes it easier for microservices to horizontally scale by externalizing their local cache, allowing requests to land on any of the copies of the service without loss of context. Tools like Redis Cluster, Hazelcast, or Apache Ignite offer distributed caching.
  • Caching can also be implemented at the HTTP level by using headers like Cache-Control and ETag to allow clients to cache responses. Here is an interesting opinion on it:

“Caching can reduce the HTTP request-response time to get data from distant servers. Microservices regularly require information from other sources (data repositories, legacy systems, etc.). Real-time calls to these sources may involve latency. Caching helps minimize the number of backend calls made by your application.” - Sumit Bhatnagar, VP of Software Engineering
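
To illustrate the local-caching level, here is a minimal cache-aside sketch using Redis via the redis-py client. The load_product_from_db function is a hypothetical stand-in for a real database query:

```python
# Cache-aside pattern: check the cache first, fall back to the database
# on a miss, then populate the cache with a time-to-live (TTL).
import json
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_product_from_db(product_id: str) -> dict:
    # Hypothetical placeholder for the real database query.
    return {"id": product_id, "name": "example", "price": 9.99}

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: no database call
    product = load_product_from_db(product_id)    # cache miss: hit the database
    cache.set(key, json.dumps(product), ex=300)   # expire after 5 minutes
    return product
```

The TTL is the key tuning knob: a longer expiry cuts more database load but serves staler data, so set it per data type rather than globally.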

Asynchronous communication

When microservices communicate synchronously in a large system, it can lead to bottlenecks, making responses slower than expected.

Asynchronous communication patterns, such as message queues and event-driven architectures, open up those bottlenecks by decoupling service interactions: producers send messages to a message broker, and consumers process those messages at their own pace, independently. This loose coupling allows for better scalability, fault tolerance, and improved performance.

Messaging tools like Apache Kafka, RabbitMQ, or Amazon SQS allow you to implement asynchronous communication in your architecture.
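
As a sketch of the pattern, here is a minimal producer and consumer for RabbitMQ, written with the pika client. The orders queue and the processing logic are hypothetical:

```python
# Producer/consumer over RabbitMQ: the producer returns as soon as the
# broker accepts the message; the consumer works through the queue at
# its own pace.
import pika

params = pika.ConnectionParameters(host="localhost")

def publish(order_id: str) -> None:
    conn = pika.BlockingConnection(params)
    channel = conn.channel()
    channel.queue_declare(queue="orders", durable=True)  # survive broker restarts
    channel.basic_publish(exchange="", routing_key="orders",
                          body=order_id.encode())
    conn.close()

def consume() -> None:
    conn = pika.BlockingConnection(params)
    channel = conn.channel()
    channel.queue_declare(queue="orders", durable=True)

    def handle(ch, method, properties, body):
        print(f"processing order {body.decode()}")      # hypothetical work
        ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

    channel.basic_consume(queue="orders", on_message_callback=handle)
    channel.start_consuming()
```

Because the producer never waits for the consumer, either side can be scaled, restarted, or slowed down without blocking the other.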

Database optimization

Microservices architectures usually rely on multiple databases or data storage types, each chosen to complement its service. Because of this, your team can significantly improve the responsiveness and scalability of the entire system by optimizing each of these databases.

Optimizing your caching, as mentioned above, is the first step your team can take to optimize database traffic.

Database sharding can also help you optimize your databases. By partitioning data horizontally across multiple database instances based on a specific key, sharding allows your team to distribute the data load more evenly, improving scalability in your architecture.
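
As a simple illustration, here is a hash-based routing sketch in Python; the shard connection strings and the customer_id shard key are hypothetical:

```python
# Hash-based shard routing: a stable hash of the shard key decides which
# database instance owns a given row. Connection strings are made up.
import hashlib

SHARDS = [
    "postgres://db-shard-0.internal/orders",
    "postgres://db-shard-1.internal/orders",
    "postgres://db-shard-2.internal/orders",
    "postgres://db-shard-3.internal/orders",
]

def shard_for(customer_id: str) -> str:
    # md5 gives a hash that is stable across processes and restarts
    # (Python's built-in hash() is randomized per process).
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# All rows for one customer land on one shard, so single-customer
# queries never have to cross shards.
print(shard_for("customer-42"))
```

One caveat: plain modulo routing reshuffles most keys whenever you add a shard, which is why production systems often use consistent hashing instead.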

Depending on your use case, NoSQL databases, such as MongoDB, Cassandra, or Couchbase can also provide better scalability and performance.

When you take the time to plan the compromises and requirements of microservices architecture, you can mitigate issues around complexity, allowing it to perform and scale to its potential. Let’s look at how Amazon did exactly that.

AWS ensures performance for millions of users

AWS supplies a huge variety of services to an ever-expanding client base. Despite this complexity and traffic, their microservices architecture continues to perform at a high level. They use a variety of techniques and technologies to maintain optimal performance and scalability across their microservices.

Balancing speed with flexibility

AWS based their microservices architecture on domain-driven design (DDD) principles, aligning software design with the underlying business concepts and processes. This allowed them to balance flexibility (fine-grained services) and communication speed (coarse-grained services) in a way that worked for their business needs.

Reduced load and improved response times with proper caching

AWS uses caching extensively to improve performance. They use ElastiCache, which provides in-memory caching built on Redis and Memcached, to cache each microservice’s frequently accessed data. This improves response times by reducing database load.

At the API gateway level, AWS uses the caching built into Amazon API Gateway, an in-house product. This allows them to cache API responses, which reduces the number of requests hitting the backend services, improves overall performance, and reduces costs.

Building in loose coupling with asynchronous communication

AWS built their own in-house message queues and streaming platforms to ensure their services could scale and act independently.

Amazon Simple Queue Service (SQS) is their message queueing service. It’s a fully managed service that allows nodes to send messages to SQS queues, so consumers can process those messages asynchronously. This makes scaling services independently easier by decoupling microservices.
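
As an illustration of that flow, here is a minimal send-and-receive sketch using the boto3 client; the queue URL is hypothetical, and credentials are assumed to come from the environment:

```python
# Producer and consumer on Amazon SQS via boto3.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # made up

# Producer: returns as soon as SQS has durably stored the message.
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody='{"order_id": "42"}')

# Consumer: long-poll for up to 10 messages, delete each one only
# after it has been processed successfully.
resp = sqs.receive_message(
    QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
)
for msg in resp.get("Messages", []):
    print("processing", msg["Body"])  # hypothetical work
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```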

Amazon developed Kinesis as a bespoke data streaming service. It uses event-driven architecture to process and analyze data in real time.

Automatic optimization

AWS built a fully managed relational database service, Amazon Aurora, to maintain high-performance, scalable databases. It optimizes on the fly, using automatic sharding, read replicas, and serverless options to handle varying workload demands.

For NoSQL workloads, AWS uses Amazon DynamoDB, a highly scalable and performant NoSQL database. DynamoDB offers automatic scaling, in-memory caching, and support for global tables to ensure high-throughput data access with low latency.

Optimizing resource use with capacity planning and auto-scaling

Before implementing their microservices, the AWS team went through an in-depth capacity planning process. They then laid out auto-scaling parameters based on their understanding of each service’s needs. With defined scaling parameters in place, they turned to auto-scaling to ensure each service scaled when it should.

Instead of using off-the-shelf software, they developed Amazon CloudWatch to monitor service metrics and trigger auto-scaling actions based on those predefined rules, and they use Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS) to deploy newly scaled services. This combination allows AWS to define scaling policies so services automatically scale up or down based on workload.

Looking ahead

On January 13th, 2025, we’ll publish the next post in the series, “How to manage organizational and cultural shifts in your organization when migrating to microservices.”

Or, you can download the complete 10-part series in one e-book now: "Monolith to microservices migration: 10 critical challenges to consider"

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team