Transitioning from a monolithic architecture to microservices is an intricate, time-consuming task. It demands both strategic foresight and meticulous execution.
In this 10-part series, we’ll guide you through the most common challenges of a monolith-to-microservices migration. Last week we published the second part of our series, on how to better manage your data and provide consistency across your entire application. This week, we examine inter-service communication.
Inter-service communication is an integral part of a successful microservices architecture. Handling all that communication efficiently, without adding excessive latency, is one of the major challenges of a microservices architecture.
We'll publish a new article every Monday, so stay tuned. You can also download the full 10-part series to guide your monolith-to-microservices migration.
Without seamless communication between microservices, both the functionality of the app and the experience of the end user suffer. To maintain a good user experience, you need to tailor inter-service communication to the demands of your system and make sure it can handle failure scenarios.
The first step to tailoring communication in your system is finding the right communication patterns.
Synchronous communication patterns, such as REST or gRPC, are simple request-response interactions. A service sends a request and then waits for a response from another service.
When multiple microservices synchronously communicate (like in the diagram below), they end up executing the interactions in series. This means the final response must come after all other steps have finished.
This approach ensures consistency, but it can also create a performance bottleneck if not managed properly. It’s also important to note that synchronous communication creates tight coupling between all involved services. This pattern is ideal for scenarios that require immediate feedback, such as simple, direct interactions.
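Here is a minimal sketch of a synchronous request-response call using Python’s requests library. The service name and URL are illustrative:

```python
import requests

# The caller blocks here until the inventory service responds (or errors).
response = requests.get(
    "http://inventory-service:8080/stock/sku-123",  # illustrative URL
    timeout=2.0,
)
response.raise_for_status()
stock = response.json()
# Only now, with the response in hand, can the caller continue its own work.
```

The key property is that the caller is blocked for the full duration of the call, so every downstream hop adds directly to the end-to-end latency.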
But many workflows involve complex interactions between multiple microservices, and those call for a more sophisticated communication pattern.
Asynchronous communication patterns involve services interacting with each other without waiting for an immediate response. Common asynchronous communication patterns include message queues and streaming platforms.
When multiple microservices use a queue to asynchronously communicate (like in the above diagram), each is free to leave a message without waiting for an answer. The result is non-linear communication that does not require each service to wait before executing.
This pattern decouples services, enhancing scalability and fault tolerance: services can work independently of each other, mitigating potential bottlenecks. It does, however, introduce more operational complexity than synchronous communication.
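As a minimal sketch, here is a producer leaving a message on a RabbitMQ queue via the pika client. The broker host, queue name, and payload are illustrative:

```python
import pika

# Connect to a local RabbitMQ broker (host and queue name are illustrative).
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="order-events", durable=True)

# The producer drops a message on the queue and moves on immediately;
# it never waits for a consumer to pick it up or reply.
channel.basic_publish(
    exchange="",
    routing_key="order-events",
    body=b'{"order_id": 42, "status": "placed"}',
)
connection.close()
```

A separate consumer process drains the queue at its own pace, which is exactly what removes the lock-step waiting of the synchronous pattern.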
Event-driven architectures extend asynchronous communication by focusing on events, which are significant state changes associated with a point in time. Services publish events, then other services consume (or subscribe to) these “event streams” as needed.
When multiple microservices communicate through an event-driven architecture (like in the diagram below), each service publishes events to, and consumes events from, a central, shared event broker. This provides a flexible, scalable communication model with loose coupling.
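Here is a minimal sketch of publish and subscribe against an event stream, using the kafka-python client. The broker address, topic, and payload are illustrative:

```python
from kafka import KafkaProducer, KafkaConsumer

# One service publishes an event describing a state change...
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("playback-events", b'{"user": "u1", "event": "track_played"}')
producer.flush()

# ...and any number of other services subscribe to the same stream,
# each consuming events at its own pace.
consumer = KafkaConsumer(
    "playback-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating when no new events arrive
)
for record in consumer:
    print(record.value)
```

Because producers never know who is listening, new consumers can be added later without touching the publishing service.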
Once you’ve chosen the right communication pattern for your microservices, you need to settle on a communication protocol that fits the pattern.
Protocols define the rules for data exchange and ensure interoperability between services. The most common ones used for inter-service communication are:
Synchronous communication protocols, such as HTTP/REST and gRPC
Asynchronous communication protocols, such as AMQP (used by RabbitMQ) and MQTT
Event-driven protocols, such as the wire protocols of streaming platforms like Apache Kafka
Simply picking the right communication pattern and protocol isn’t enough to ensure robust inter-service communication. Without proper fault tolerance, communication between services becomes a weak point in the system and can lead to cascading failures.
The following four strategies will help your engineering team build resilience into your inter-service communication.
A retry strategy automatically resends failed requests after a brief delay. This helps mitigate transient issues, such as temporary network glitches or brief service disruptions.
How it works

When a request fails, the caller waits briefly and sends it again, usually increasing the delay between attempts (exponential backoff) and capping the total number of attempts, so a persistent outage doesn’t generate an endless stream of retry traffic.
When dialed in, retries smooth out temporary disruptions, so users never notice these small faults. They also free your team from constantly intervening manually, by increasing the chance of success for every request.
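As a minimal sketch, here is what a retry loop with exponential backoff and jitter might look like in Python. The function name, retry budget, and delays are illustrative:

```python
import random
import time

import requests

def get_with_retries(url, max_attempts=3, base_delay=0.2):
    """Send a GET request, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=2)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == max_attempts:
                raise  # retry budget exhausted; surface the failure
            # Exponential backoff with jitter to avoid synchronized retry storms.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

The jitter matters: if every caller retries on the same schedule, the retries themselves arrive as a synchronized wave that can knock the recovering service back over.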
Circuit breakers monitor the health of services and temporarily degrade or disable communication with services that are experiencing failures. This prevents one service outage from causing a chain reaction of cascading failures throughout the system.
How it works

A circuit breaker starts closed, passing requests through while counting failures. When failures cross a threshold, the breaker opens and requests fail fast (or fall back) instead of hitting the struggling service. After a cooldown period, the breaker lets a probe request through (half-open); if it succeeds, the breaker closes again.
Excessive pressure can quickly overwhelm a service, leading to cascading faults. Circuit breakers relieve that load pressure, giving overwhelmed services room to recover. And when a service does fail, circuit breakers increase the stability of the system by isolating the failure and rerouting traffic.
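Here is a minimal sketch of the pattern in Python. The thresholds and timeouts are illustrative, and production systems typically reach for a battle-tested library rather than rolling their own:

```python
import time

class CircuitBreaker:
    """Trip after consecutive failures; fail fast until a cooldown elapses."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

You would wrap each outbound call, for example breaker.call(requests.get, url, timeout=2), so callers fail fast instead of queuing up behind a dying dependency.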
Timeout settings define the maximum interval a service will wait for a response from another service before considering the request failed. With proper timeout configuration, you can avoid both the prolonged delays of services waiting too long for a response and the resource exhaustion that follows.
How it works

Every outbound request carries a deadline. If the response hasn’t arrived when the deadline expires, the caller abandons the request, releases the resources tied to it, and handles the failure explicitly, for example by retrying or returning a fallback.
Timeouts prevent services from waiting indefinitely for responses, allowing the system to keep running without the drag of open requests piling up. They also help you identify and handle slow services promptly.
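As a minimal sketch, here is an explicit timeout on an HTTP call with Python’s requests library. The service URL is illustrative:

```python
import requests

# ORDER_SERVICE_URL is illustrative; point it at your own service.
ORDER_SERVICE_URL = "http://order-service:8080/orders/42"

try:
    # Fail the call if connecting takes longer than 1s or the response
    # takes longer than 2s, instead of letting the request hang forever.
    response = requests.get(ORDER_SERVICE_URL, timeout=(1.0, 2.0))
    response.raise_for_status()
except requests.Timeout:
    # Deadline expired: handle it explicitly (fallback, retry, or error).
    print("order-service did not respond in time")
```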
Bulkheads partition a system into isolated sections to prevent failures in one part from affecting the larger system. This is similar to compartmentalization in ship design, where individual sections can be sealed off to contain damage.
How it works

Each downstream dependency (or class of work) gets its own bounded pool of resources, such as connections, threads, or queue slots. When one dependency misbehaves, it can exhaust only its own pool; the rest of the system keeps its capacity.
By partitioning your system into well-designed bulkheads, you limit the blast radius of failures, ensuring your critical services remain available even while others are failing.
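A minimal sketch of bulkheads in Python uses a dedicated, bounded thread pool per dependency. The pool sizes and service names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# One small, isolated pool per downstream dependency. A flood of slow
# payment calls can exhaust its own pool, but it can never consume the
# threads reserved for the inventory service.
payment_pool = ThreadPoolExecutor(max_workers=5, thread_name_prefix="payments")
inventory_pool = ThreadPoolExecutor(max_workers=5, thread_name_prefix="inventory")

def call_payment_service(order_id):
    # Illustrative stand-in for an HTTP call to the payment service.
    return {"order_id": order_id, "status": "charged"}

def place_order(order_id):
    # Submit work to the dependency's dedicated pool instead of a shared one.
    future = payment_pool.submit(call_payment_service, order_id)
    return future.result(timeout=2.0)  # combine bulkheads with timeouts
```

Note how the bulkhead composes with the timeout from the previous section: the bounded pool caps concurrency, and the deadline caps how long any one slot stays occupied.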
With the right communication patterns and robust strategies for handling failures, even the largest apps, like Spotify, can build in resilience.
When Spotify transitioned to a microservices-based system, they adopted an event-driven architecture using Apache Kafka for inter-service communication. This gave Spotify the ability to build loose coupling, scalability, and fault tolerance into their microservices ecosystem.
Spotify chose Kafka so their services could both publish and consume events asynchronously. This decoupled the services from each other, allowing each to evolve independently so Spotify could scale services as needed. Kafka's fault-tolerant and scalable design ensured that events were reliably delivered and processed, even in the face of failures or high loads.
To simplify integration, Spotify developed their own tooling and frameworks. They established guidelines and best practices for event design, schema evolution, and error handling to ensure consistent and reliable communication across services. This allowed engineering teams to build based on business logic rather than low-level communication details.
Additionally, Spotify implemented advanced patterns like event sourcing and CQRS (Command Query Responsibility Segregation) to enhance the resilience and scalability of their system. Event sourcing allowed them to capture all state changes as a sequence of events, providing an audit trail and enabling event replay, while CQRS separated read and write operations, optimizing each for different access patterns and scalability requirements.
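To make the event-sourcing idea concrete, here is a minimal sketch (our illustration, not Spotify’s actual code): state is never stored directly, only derived by replaying an append-only log of events.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "TrackAdded", "TrackRemoved"
    track_id: str

def apply(playlist, event):
    """Derive the next state from the previous state and one event."""
    if event.kind == "TrackAdded":
        return playlist | {event.track_id}
    if event.kind == "TrackRemoved":
        return playlist - {event.track_id}
    return playlist

# The log is the source of truth; current state is rebuilt by replaying it.
log = [
    Event("TrackAdded", "a1"),
    Event("TrackAdded", "b2"),
    Event("TrackRemoved", "a1"),
]
state = set()
for event in log:
    state = apply(state, event)
print(state)  # {'b2'}
```

Because the log is immutable and ordered, it doubles as an audit trail, and replaying it from any point is what enables rebuilding read models or recovering from bugs.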
As we’ve seen, Spotify successfully transitioned to a microservices-based system that could scale and evolve independently, while maintaining loose coupling and fault tolerance. This approach allowed them to handle their app’s massive scale and complexity – a topic we’re going to touch on a little later in our 10-part series on migrating from a monolith to microservices.
Ready for the next in the series? Continue to "Designing service discovery and load balancing in a microservices architecture".
Or, you can download the complete 10-part series in one e-book now: "Monolith to microservices migration: 10 critical challenges to consider"