
In this Microsoft article, they say that one microservice should never call another microservice to join data. That's understandable - it preserves each service's autonomy. Instead, the origin data should be propagated through a pub/sub message broker - Kafka, for example - so that the data is kept, at least partially, ahead of time in both microservices.

Let's say that, in the scope of this question, we are talking about an Orders microservice that relies on data coming from a Users microservice.

This architectural approach leads to big issues that I don't understand how to solve:

  1. How do you sync changes to a user entity that is partially replicated into the Orders microservice's DB? Even more complicated: not all modifications are relevant, while others are. If a user changes his first name, that update is not relevant to the Orders service, since it makes no use of the user's name. However, if a user upgrades his permissions, that is relevant to the Orders service and its logic.

  2. Say our product wants a new feature: when creating an order, we must ensure that the given user has some specific privileges. This array of privilege objects was never replicated in the first place to the users collection residing in the Orders service - we would have to do a huge collection scan to backfill this specific part of the data. On a huge data set that can be impossible.

  3. A new microservice has launched. How is it going to copy all of the users into its own DB? (Related to question 2.)

Raz Buchnik

1 Answer


To my understanding (which makes this answer perhaps a bit opinionated), an event-driven architecture with event sourcing could be a solution to your questions.

  1. How do you sync changes to a user entity that is partially replicated into the Orders microservice's DB? Even more complicated: not all modifications are relevant, while others are. If a user changes his first name, that update is not relevant to the Orders service, since it makes no use of the user's name. However, if a user upgrades his permissions, that is relevant to the Orders service and its logic.

What you could do is let your Users service fire an update event for every change to a user's attributes (grouping attributes that belong together into one event). Every other service interested in such events registers itself as a listener and responds to each event accordingly. I assume your Orders µ-service keeps a (limited) set of users with only the properties it needs to fulfil its tasks, so it can simply ignore events it doesn't care about; see the sketch below.
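
A minimal sketch of such a listener, using kafkajs and assuming a topic named `user-events`, JSON-encoded events, and the event type names shown (all of which are my own illustrative choices, not a given API):

```typescript
import { Kafka } from "kafkajs";

// Illustrative names: the broker address, topic and event types are assumptions.
const kafka = new Kafka({ clientId: "orders-service", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "orders-service" });

async function listen() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["user-events"] });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const event = JSON.parse(message.value.toString());

      switch (event.type) {
        case "UserPermissionsUpgraded":
          // Relevant: update the local user replica the order logic reads from.
          await upsertLocalUserPermissions(event.userId, event.permissions);
          break;
        case "UserFirstNameChanged":
          // Irrelevant to the Orders service: deliberately ignored.
          break;
      }
    },
  });
}

// Placeholder for whatever persistence the Orders service uses.
async function upsertLocalUserPermissions(userId: string, permissions: string[]) {
  // e.g. UPDATE users_replica SET permissions = ... WHERE id = userId
}

listen().catch(console.error);
```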

  2. Say our product wants a new feature: when creating an order, we must ensure that the given user has some specific privileges. This array of privilege objects was never replicated in the first place to the users collection residing in the Orders service - we would have to do a huge collection scan to backfill this specific part of the data. On a huge data set that can be impossible.

The key in this situation is to let your Orders service rebuild its own datastore with the user details that are relevant to the new use case, by replaying the entire collection of user events since the beginning of time (of your application), or since a certain snapshot (to save some rebuilding time, depending on the amount of events to process). Keep in mind that one needs to carefully consider the amount of time/resources this will take and whether rebuilding the Orders service's datastore is possible at all. It will not be feasible if the datastore of the Orders service is shared with other services (which is something one should carefully consider before designing an application/service that way).
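
A minimal sketch of such a rebuild, again with kafkajs and the same assumed `user-events` topic: a consumer with a fresh group id replays the topic from the beginning, applies every event to the local projection, and persists the last applied offset as a rudimentary snapshot marker:

```typescript
import { Kafka } from "kafkajs";

// Illustrative names again: broker, topic and the snapshot store are assumptions.
const kafka = new Kafka({ clientId: "orders-rebuild", brokers: ["localhost:9092"] });
// A fresh groupId has no committed offsets, so fromBeginning really starts
// at the earliest offset Kafka still retains.
const consumer = kafka.consumer({ groupId: "orders-rebuild-v2" });

async function rebuildProjection() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["user-events"], fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      if (!message.value) return;
      const event = JSON.parse(message.value.toString());

      // Apply every event in order; later events overwrite earlier ones,
      // so the projection converges to the current state.
      await applyToProjection(event);

      // Persisting the last applied offset next to the projection is a
      // rudimentary snapshot: a later rebuild can resume from here instead
      // of replaying from offset 0.
      await saveSnapshotOffset(partition, message.offset);
    },
  });
}

// Placeholders for the Orders service's own persistence layer.
async function applyToProjection(event: unknown) { /* upsert into users replica */ }
async function saveSnapshotOffset(partition: number, offset: string) { /* store it */ }

rebuildProjection().catch(console.error);
```

The same pattern covers question 3: a brand-new service runs such a replay consumer to fill its own DB before it goes live.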

  3. A new microservice has launched. How is it going to copy all of the users into its own DB? (Related to question 2.)

As mentioned in the previous answer: by replaying all user events, assuming you have an event-driven architecture in place.

If you are looking for some additional background information, check this article or this article.

KDW
  • Thanks. But how many events can Kafka save? If I have 100m users, multiplied by every change each one made over 10 years, this history would weigh terabytes. That sounds simply unmanageable to me, doesn't it? – Raz Buchnik Mar 09 '23 at 20:07
  • Kafka is actually not intended to store messages forever, although it can if configured correctly. Kafka is better suited to delivering messages between producers and consumers; that is what event streaming is about. Event sourcing is different, and better solutions than Kafka exist for that purpose. – KDW Mar 09 '23 at 20:52
  • Also check this interesting article on SO for a more in-depth discussion: https://stackoverflow.com/questions/17708489/using-kafka-as-a-cqrs-eventstore-good-idea – KDW Mar 09 '23 at 21:01
  • Yes, but still: even if a new service consumes ALL of the events ever stored in the topic, it ends up handling past events that are probably no longer relevant. For example: a user is created with the name bob, then changes the name to Bob, then changes it back to bob. The new service would be doing wasted work, wouldn't it? – Raz Buchnik Mar 10 '23 at 22:22
  • I agree, replaying all events could waste a (significant) amount of resources. What could more or less solve this is replaying events starting from a certain snapshot. By doing so, you need to be sure you don't miss any relevant information from before that snapshot. This is of course use-case dependent. – KDW Mar 12 '23 at 06:43
  • How do you practically implement a snapshot? – Raz Buchnik Mar 12 '23 at 08:21
  • @RazBuchnik It depends on the solution you use to implement event sourcing. You could find some inspiration [in this article](https://www.eventstore.com/blog/snapshots-in-event-sourcing), or check the documentation of the solution you use... – KDW Mar 13 '23 at 08:39