How to address decentralized data management in microservices

Published by Emre Baran on November 11, 2024

Transitioning from a monolithic architecture to microservices is an intricate, time-consuming task. It demands both strategic foresight and meticulous execution.

In this 10-part series, we’ll guide you through the most common challenges faced during a monolith-to-microservices migration. Last week we published the first part of our series, on how to decompose your application and define service boundaries. This week, we’ll examine how to better manage your data and maintain consistency across your entire application.

We'll publish a new article every Monday, so stay tuned. You can also download the full 10-part series to guide your monolith-to-microservices migration.

Data management and consistency

Unlike monolithic applications—where data is stored in a single, centralized database—microservices typically take a decentralized data management approach. Often, each service will have its own, dedicated database or data store, optimized for its specific requirements. Decentralized data management brings both strengths and challenges with it.

In this blog, we will cover the strengths and challenges you should know before you migrate to a decentralized data storage system. Then, we’ll suggest patterns and techniques you can use to overcome those challenges. Let’s start with the good part.

Strengths of decentralized data management


1. Scalability: Each microservice can scale independently based on its specific load and performance requirements. Decentralized data architecture allows more efficient resource utilization and better handling of varying traffic patterns across different parts of an app.

2. Flexibility in the tech stack: Teams are free to plug-and-play their preferred data storage solutions (e.g., SQL, NoSQL, in-memory databases) so they can find the solution that best fits the service’s needs. This means each team can tailor its data storage strategy for optimal performance of each service.

3. Performance: When your team isn’t stuck with one option, they can increase the speed of query execution and data retrieval by tailoring data storage technology to each service’s unique access patterns and data types.

4. Fault isolation: If one service encounters an issue, it does not necessarily impact the entire system. This isolation enhances system reliability and makes it easier to maintain uptime, giving you better resilience against system failures overall.

Challenges of decentralized data management

Before you embrace decentralized data management, you should be aware that there are difficult challenges to overcome, including increased complexity in development and data integration, as well as issues with data integrity and latency.


1. Complex data integration: Integrating data from multiple decentralized sources can be complex and time-consuming. With different nodes running different storage solutions, data interoperability and compatibility become critical considerations to ensure seamless data exchange.

2. Increased development complexity: Managing multiple databases requires sophisticated strategies for data replication, synchronization, and consistency. This can make the system harder to develop, test, and maintain.

3. Latency issues: Network communication between microservices can increase latency, especially when microservices need to access data from multiple sources.

4. Increased security risks: Decentralized data requires robust security controls that hold across multiple nodes. Implementing encryption, access controls, and authentication mechanisms across the system is essential to keep your data safe.

5. Data integrity: Maintaining data integrity requires thoughtful planning to ensure that business rules and validations are consistently applied across all decentralized services.

If you want to dive deeper into the potential data headaches, we highly recommend the Substack of Chad Sanderson, CEO of Gable.ai and formerly of Microsoft:

“In the traditional on-premise Data Warehouse, an experienced data architect was responsible for defining the source of truth in a monolithic environment. While slow and somewhat clunky to use, it fulfilled the primary role of a data ecosystem. Smart, hard-working data professionals maintained an integration layer to ensure downstream consumers could reliably use a set of vetted, trustworthy data sets. In the world of microservices, however, there is no truth with a capital ‘T.’ Each team is independently responsible for managing their data product which can and often will contain duplicative information. There is nothing that prevents the same data from being defined by multiple microservices in different ways, from being called different names, or from being changed at any time for any reason without the downstream consumers being told about it.”

Patterns and techniques to address data management challenges

While there are no plug-and-play solutions to the above issues, there are patterns and techniques companies have successfully used to mitigate their impact.

Eventual consistency accepts that temporary inconsistencies between services may occur but guarantees that the system will eventually reach a consistent state. By allowing this trade-off, the system can achieve higher availability and performance in scenarios where strong consistency is not essential.
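To make that trade-off concrete, here is a minimal Python sketch. An in-memory queue stands in for a message broker, and the service and store names are invented for illustration: the write succeeds immediately on the owning service, and the read side is stale until it drains the queue.

```python
import queue

# Minimal sketch of eventual consistency: the "orders" service owns the
# write, and a read-side "analytics" service converges asynchronously.
# All names here are illustrative, not from the article.

events = queue.Queue()          # stand-in for a message broker
orders_db = {}                  # orders service's own data store
analytics_db = {}               # analytics service's own data store

def place_order(order_id: str, total: float) -> None:
    """Local write succeeds immediately; consumers catch up later."""
    orders_db[order_id] = {"total": total}
    events.put({"type": "OrderPlaced", "order_id": order_id, "total": total})

def sync_analytics() -> None:
    """Drain pending events; until this runs, the read side is stale."""
    while not events.empty():
        event = events.get()
        analytics_db[event["order_id"]] = {"total": event["total"]}

place_order("o-1", 42.0)
assert "o-1" in orders_db and "o-1" not in analytics_db  # temporarily inconsistent
sync_analytics()
assert analytics_db["o-1"]["total"] == 42.0              # eventually consistent
```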

The Saga pattern manages distributed transactions through a sequence of local transactions, ensuring eventual consistency. Each service involved in a transaction performs its local transaction and publishes an event to trigger the next step.
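Here is a minimal sketch of the idea in Python. For brevity it uses an orchestrated saga, where a coordinator invokes each step directly, rather than the event-driven choreography described above; the steps and business logic are hypothetical.

```python
# Minimal orchestrated-saga sketch: each step is a local transaction paired
# with a compensating action; a failure triggers rollback of the completed
# steps in reverse order. Service names and steps are illustrative only.

class SagaError(Exception):
    pass

def reserve_inventory(order):   # step 1
    order["reserved"] = True

def release_inventory(order):   # compensation for step 1
    order["reserved"] = False

def charge_payment(order):      # step 2 (fails in this demo)
    raise SagaError("card declined")

def refund_payment(order):      # compensation for step 2
    order["charged"] = False

def run_saga(order, steps):
    completed = []
    try:
        for action, compensation in steps:
            action(order)
            completed.append(compensation)
    except SagaError:
        for compensation in reversed(completed):  # undo in reverse order
            compensation(order)

order = {}
run_saga(order, [(reserve_inventory, release_inventory),
                 (charge_payment, refund_payment)])
assert order["reserved"] is False   # the failed saga rolled back step 1
```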

Event sourcing captures all changes to an application's state as a series of events. Instead of storing the current state, events are persisted in an event store. Services can subscribe to these events and reconstruct their state by replaying the event stream. Event sourcing ensures data consistency, provides a comprehensive audit trail, and supports eventual consistency.
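A minimal event-sourcing sketch in Python, assuming a hypothetical ride domain and an in-memory list standing in for the event store:

```python
# State changes are appended to an event store, and current state is
# rebuilt by replaying the stream. The event names and ride domain are
# invented for illustration.

event_store: list[dict] = []

def append(event_type: str, **data) -> None:
    event_store.append({"type": event_type, **data})

def rebuild_ride_state(ride_id: str) -> dict:
    """Fold the event stream into current state instead of reading a row."""
    state: dict = {"ride_id": ride_id, "status": None}
    for event in event_store:
        if event.get("ride_id") != ride_id:
            continue
        if event["type"] == "RideRequested":
            state["status"] = "requested"
        elif event["type"] == "DriverAssigned":
            state["status"] = "in_progress"
            state["driver"] = event["driver"]
        elif event["type"] == "RideCompleted":
            state["status"] = "completed"
    return state

append("RideRequested", ride_id="r-1")
append("DriverAssigned", ride_id="r-1", driver="d-9")
append("RideCompleted", ride_id="r-1")
assert rebuild_ride_state("r-1")["status"] == "completed"
```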

Domain-driven design (DDD) helps define clear boundaries and responsibilities for each service. DDD, which we covered in our previous blog of the series, ensures that data ownership and consistency are maintained within the service boundaries by better aligning the business domain and the microservices architecture.

Command query responsibility segregation (CQRS) separates the read and write operations of a service into different models, optimizing for different data access patterns and scalability requirements. When read and write operations are separated, the system can more efficiently handle queries and commands, improving the overall system performance and scalability.
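Here is a minimal CQRS sketch in Python. The user-registration domain is invented for illustration, and the read-model projection is updated inline rather than asynchronously from published events, purely to keep the example short:

```python
# Commands go through a write model that enforces business rules, while
# queries hit a separate, denormalized read model that can be scaled and
# indexed independently. All names are illustrative.

write_model: dict[str, dict] = {}   # normalized, guarded by validation
read_model: dict[str, str] = {}     # denormalized view, optimized for reads

def handle_register_user(user_id: str, name: str, email: str) -> None:
    """Command side: validate, persist, then project into the read model."""
    if "@" not in email:
        raise ValueError("invalid email")
    write_model[user_id] = {"name": name, "email": email}
    # In a real system this projection would be rebuilt asynchronously from
    # published events; it is inlined here only to keep the sketch short.
    read_model[user_id] = f"{name} <{email}>"

def query_user_display(user_id: str) -> str:
    """Query side: a cheap lookup with no business logic in the path."""
    return read_model[user_id]

handle_register_user("u-1", "Ada", "ada@example.com")
assert query_user_display("u-1") == "Ada <ada@example.com>"
```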

Now that we’ve covered the concepts, let’s take a look at how all of that plays out in the real world. As in the previous blog, we’ll dive deeper into a case study. This time, we’ll see how Uber dealt with these challenges.

How Uber ensured consistency and speed across millions of requests

Uber fields millions of ride requests per minute. Each of those rides touches driver requests, payment services, user drop-off data, and more. So when they scaled their microservices architecture, they faced significant data management and consistency challenges.

So, how did Uber scale without losing control of their data? To ensure data consistency and platform integrity, and to provide a seamless user experience while scaling, they had to completely rethink how they stored and accessed data.

Coordinating interactions with the saga pattern

Remember the Saga pattern? Uber chose it to coordinate interactions between services, including user, ride, and payment services. So when a user requests a ride, each service performs its local transaction and publishes events to trigger the next step in the saga. If any step fails, compensating actions roll back the previous steps to maintain data consistency.

Capturing all actions with event sourcing

Event sourcing helped Uber capture all changes to their system as a sequence of events. Each service publishes domain events whenever there is a state change, such as a ride request or a drop-off. Other services consume these events and update their own state to reflect the event. With event sourcing, the team had a complete history of all actions, which they could use for data auditing, debugging, and analysis.

Maintaining data consistency with DDD

Uber applied DDD principles to ensure data integrity and enforce business rules. They defined clear bounded contexts for each domain, such as user management, ride management, and payment processing. Then, they gave each bounded context its own set of services, data models, and business rules. These strong borders ensured data consistency within the context boundaries.
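To illustrate what those borders look like in code, here is a small Python sketch with invented models, not Uber's actual design: two bounded contexts each keep their own model of the same person, sharing only an identifier.

```python
# Sketch of DDD bounded contexts: each context owns its own model of the
# same real-world person, with only the fields its rules need. Module and
# field names are hypothetical.

from dataclasses import dataclass

# --- user management context ---
@dataclass
class Account:
    user_id: str
    email: str
    verified: bool = False

# --- payment processing context: same person, different model ---
@dataclass
class Payer:
    user_id: str           # the shared identifier is the only overlap
    card_token: str        # the user context never sees payment details

def can_charge(payer: Payer) -> bool:
    """Payment rules live with payment data, inside this boundary."""
    return bool(payer.card_token)

account = Account(user_id="u-1", email="ada@example.com", verified=True)
payer = Payer(user_id="u-1", card_token="tok_123")
assert can_charge(payer)
```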

Using CQRS to scale systems independently

With this new architecture, Uber needed a way to take advantage of the decoupled scaling microservices allow. They chose Command Query Responsibility Segregation (CQRS), which allowed them to optimize data access patterns and separate read and write responsibilities.

This allowed Uber to scale each service independently based on the specific requirements of each operation. As a result, the Uber team was able to drive performance improvements where it mattered and more easily maintain their microservices architecture.

Visibility in a distributed system

With their app distributed across various systems, Uber needed a way to see if their systems were working together as designed. So, they invested in monitoring, logging, and tracing capabilities to gain visibility into their distributed system.

They used:

  • Jaeger for distributed tracing
  • Apache Kafka for event streaming
  • Apache Cassandra for high-performance data storage

These tools helped their engineering team identify and troubleshoot data consistency issues, ensure reliable event delivery, and maintain the overall health of their microservices ecosystem.
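As a rough illustration of the event-streaming piece, here is a hedged Python sketch that publishes a domain event to Kafka with a correlation id a tracer could later stitch together. It uses the kafka-python client; the topic name, event shape, and broker address are assumptions for illustration, not Uber's actual setup.

```python
import json
import uuid

from kafka import KafkaProducer  # pip install kafka-python; needs a running broker

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed local broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "type": "RideCompleted",            # hypothetical domain event
    "ride_id": "r-1",
    "trace_id": str(uuid.uuid4()),      # lets a tracer like Jaeger link spans
}
producer.send("ride-events", value=event)  # consumers update their own state
producer.flush()
```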

Two-pronged plan

Decentralized data management challenges do pose a risk to those who aren’t aware of them. But with proper planning and tools, they can be handled, which is exactly what Uber did. They applied successful patterns and techniques and backed them up with robust monitoring and data management technologies.

With those techniques in place, Uber maintained data consistency and integrity while scaling their microservices architecture, enabling them to deliver a reliable ride-hailing experience.

Looking ahead

Next week we’ll publish the next post of the series, on establishing inter-service communication patterns when migrating to microservices.

Or, you can download the complete 10-part series in one e-book now: "Monolith to microservices migration: 10 critical challenges to consider"

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team