5

I have a monolithic application that currently uses a PostgreSQL DB and the schemas are set up as you would expect for most relational databases with various table data being linked back to the user via FKs on the user_id.

I'm trying to learn more about microservices am trying to migrate my python API to a microservice architecture. I have a reasonable understanding of how I'm going to break up the larger app into smaller parts, however, I'm not entirely clear on how I'm supposed to deal with the data side of things.

I understand that one single large DB is against general design principles of microservices but I'm not clear on what the alternative would be.

My biggest concern is cascading across individual databases that would hold microservice data. In a simple rdb, I can just cascade on delete and the DB will handle the work across the various tables. In the case of microservices, how would that work? Would I need to have a separate service that handles deleting user data across the other service DBs?

I don't really understand how I would migrate a traditional application with a relational DB to a microservice architecture?

EDIT:

To clarify - a specific architectural/design problem I'm facing is as follows:

I have split up my application into a few microservices. The ones that are in my mind still relational are:

Geolocation - A service that checks geometry data, records in PostGIS, and returns certain information. A primary purpose is to record the location of a particular user for referencing later

Image - A simple upload service to upload images and store meta data in the db.

Load-Image - A simple service that returns a random set of images based on parameters such as location, and user profile data such as Age, Gender, etc

Profile - A service that simply manages user data such as Age, Gender, etc

Normally, these three items would have a table each in a larger db rather than their own individual dbs. Filtering images by say location and age is a very simple JOIN and filter.

How would something like that work in a microservice architecture? If the data is held in different dbs entirely how would I setup the logic to filter the data? I could duplicate data that doesn't change often like profile info and add it to a MongoDB document that would contain image data including user_id and profile data - however, location data can change regularly and constant updates doesn't sound practical.

What would be the best approach? Or should I stick with a shared RDBMS for just those few services?

maestro416
  • 904
  • 2
  • 17
  • 31
  • How is this not addressed by any good presentation/definition of "microservice" or many SO questions that are just variants of this? Any time you have multiple microservices that must meet a constraint they are collectively part of an implementation of a larger non-microservice & you need non-microservice architecture. This question is a faq, although mostly asked using particular architectures or frameworks. But if you don't specify a framework & other details then it's not much different than asking how to program a software system or write a DBMS. Such a question is "too broad". – philipxy Apr 12 '20 at 22:38
  • Possible duplicate of [Microservices with shared database? using multiple ORM's?](https://stackoverflow.com/q/43612866/3404097) – philipxy Apr 12 '20 at 22:48
  • Possible duplicate of [DB design for microservice architecture](https://stackoverflow.com/q/43426699/3404097) – philipxy Apr 12 '20 at 22:50
  • Please do not insert "EDIT"s/"UPDATE"s, edit your post to be the best presentation possible. Also you added about 19 question marks. Ask 1 specific question. Adding question marks in presenting your question just makes it unclear what you actually want answered. This is still generic in terms of technology, you just added your application topic, and it is still both too broad & a duplicate. Adding all those questions just reflects that the question was & is too broad. [ask] [help] What form do you even expect an acceptable answer to take? (Rhetorical.) – philipxy Apr 13 '20 at 09:27

2 Answers2

7

It comes down to the duplication of data, why we want it, and how we manage it.

Early in our careers we were taught about the duplication of data to make it redundant, for example in database replication or backups. We were also taught that data can be modelled in a relational manner, with constraints enforcing the integrity of the model. In fact, the integrity of the model is sacrosanct. Without integrity, how can you have consistency? The answer is that you can't. Kinda.

When you work with distributed systems and service orientation, you do so because you want to minimise interactions thereby reducing coupling between components. However, there is a cost to this. The more distributed your architecture, the less coupling it has, and the more duplication of data will be necessary. This is taken to an extreme with microservices, where effectively the same data may be present in many different places, in varying degrees of consistency.

Instead of being bad, however, in this context data duplication is an essential feature of your system. It is an enabler of an architectural style with many great benefits. Put another way, without duplication of data, you get less distribution, you get more coupling, which makes your system more expensive to build, own, and change.

So, now we understand duplication of data and why we want it, let's move onto how we manage having lots of duplication. Let's try an example:

In a relational database, let's say we have a table called Customers, which contains a customer ID, and customer details, and another table called Orders which contains the order ID, customer ID, and the order details. Let's say we also have an ordering application, which needs to delete all the customer's orders if the customer is deleted for GDPR.

Because we are migrating our system to microservices, we decide to create a service called Customers.

So we create a service with the following operation:

  • DELETE /customers/{customerId} - deletes a customer

We create another service called Orders with the following operations:

  • GET /orders/customers/{customerId} - gets all the orders for a customer
  • DELETE /orders/{orderId} - deletes an order

We build a UX screen for deleting a customer. The UX first calls the orders service to get all the orders for the customer. Then it iterates over the list of orders, calling the orders service to delete the order. Then it calls the customers service to delete the user.

This example is very simplistic, but as you can see, there is no option but to orchestrate the "Delete Customer" operation from the caller, which in this case is the user interface. Of course, what would be a single atomic transaction in a database does not translate to multiple HTTP/s calls, so it is possible that some of the calls may not succeed, leaving the system as a whole in an inconsistent state. In this instance the inconsistency would need to be resolved via some recovery mechanism.

tom redfern
  • 30,562
  • 14
  • 91
  • 126
  • 2
    @maestro416 I'm not sure I agree about the user interface being responsible for orchestrating data consistency across contexts. That's very fragile and not very effective. You'd usually integrate contexts through messaging infrastructure, which enables reliable eventual consistency patterns. E.g. the customers context dispatches a CustomerDeactivated event/message which the Orders context pickup and in turn archives orders. – plalx Sep 18 '20 at 02:50
  • Agree with @plalx and also you could implement a BFF (Backend for Frontend) but that is another complexity layer and will increase the cost, but if your application is big enough it could be worth the investment. – Mazen Elkashef Oct 29 '21 at 03:11
-1

In a microservice architecture, we have both the option, either use database per service or a shared database. There are advantages and disadvantages to both the pattern. Database per service architecture is the best practice but when the monolithic application has lots of function, procedure or database-specific feature on database level then we can use the Shared database approach, I know this is not the best practice if you have time and bandwidth then you should go for database per service. As your concern is cascading over individual databases, you need to remove cascading from the database and implement global transaction handling in your application and execute all cascading related queries from that transaction.

  • 2
    I have to disagree. A shared database will turn your "microservices" into a single system integrated via the database. The use of distributed transactions is absurd and costly in this context. There are a very few absolute rules in microservice design, but "do not use distrbuted transactions" is one of them. – tom redfern Apr 12 '20 at 19:26