
My question is similar to this and this, but neither actually answers my question. The former is about data consistency between different services, and the latter is about receiving a message exactly once.


When designing a microservice architecture, it's important that each service manages its own data and that each service is independently scalable from every other service.

However, how should we approach scaling the data persistence that backs each of these services? The way I see it, there are two options:

Option A:

You scale the data persistence layer independently of the service, using sharding or something.

[Option A diagram]

This seems sensible at first blush... but a lot of databases can't (to my knowledge at least) be effectively scaled horizontally at runtime without significant performance degradation while it's happening.
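To make that concrete, here's a minimal sketch (the `node_for` router and key names are hypothetical) of why naive `hash(key) % N` sharding is painful to scale at runtime: adding a single shard remaps the vast majority of keys, all of which would then have to be migrated:

```python
import hashlib

def node_for(key: str, num_shards: int) -> int:
    # Stable hash so the mapping is consistent across processes
    # (Python's built-in hash() is salted per process).
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user:{i}" for i in range(10_000)]

# Map every key with 4 shards, then again after scaling to 5.
before = {k: node_for(k, 4) for k in keys}
after = {k: node_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed shard")  # ~80% move
```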

Option B:

Each instance of the service gets its own copy of the persistence DB.

[Option B diagram]

If we ignore the increased data replication (since storage is cheap these days), the primary issue I see with this is ensuring that the data stays consistent across the different instances of the service.
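A toy sketch of the consistency problem (the dicts stand in for each instance's local database; the names are illustrative): a write lands on one instance's copy, and a read routed to another instance sees stale data until some replication mechanism reconciles the copies:

```python
# Two service instances, each with its own local copy of the data
# (plain dicts stand in for the per-instance databases).
instance_a_db: dict[str, str] = {}
instance_b_db: dict[str, str] = {}

# A request routed to instance A updates only A's copy...
instance_a_db["order:42"] = "shipped"

# ...so a follow-up read routed to instance B sees stale data
# until some replication mechanism reconciles the two copies.
print(instance_a_db.get("order:42"))  # 'shipped'
print(instance_b_db.get("order:42"))  # None
```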

How do people generally approach this problem?

ScottishTapWater

2 Answers


Things like eventual consistency can help with this, but yes, the centralized storage of data can absolutely become the bottleneck. Many cloud solutions solve this by using large distributed data stores that replicate at the block level instead of the database level, allowing just-in-time replication (DynamoDB, Firestore, Cosmos DB).
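As a concrete illustration, DynamoDB surfaces this trade-off on every read: reads are eventually consistent by default, and you opt into strongly consistent reads (at twice the read-capacity cost) when you need them. A sketch using boto3, assuming a hypothetical `orders` table:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Default read: eventually consistent, cheaper, may return stale data.
eventual = dynamodb.get_item(
    TableName="orders",
    Key={"order_id": {"S": "42"}},
)

# Strongly consistent read: reflects all prior successful writes,
# at twice the read-capacity cost.
strong = dynamodb.get_item(
    TableName="orders",
    Key={"order_id": {"S": "42"}},
    ConsistentRead=True,
)
```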

These sorts of solutions are difficult to replicate on-prem. Cassandra and MongoDB have decent replication options, but yes, adding a new server definitely impacts existing capacity, and it requires careful engineering to ensure you have sufficient headroom for your scaling event.
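For example, Cassandra makes the replication factor an explicit property of each keyspace and handles replica placement itself; adding a node later streams that node's share of existing data from the current owners, which is where the capacity impact comes from. A sketch using the DataStax Python driver; the contact point, datacenter, and keyspace names are placeholders:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])  # placeholder contact point
session = cluster.connect()

# Replication is declared per keyspace; Cassandra places the replicas.
# Adding a node later streams its share of existing data from the
# current owners, which consumes cluster capacity while it runs.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS orders
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3
    }
""")
```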

You generally do NOT want to try to set up your own eventual-consistency replication. It is possible, but if your current database doesn't support it and you need it, switch databases. I've done this a time or two in my career (added replication myself), and each time I have dearly regretted it. There are very good out-of-the-box solutions that you most definitely should be using.

TL;DR: ultimately, go with the first option: use a database that scales gracefully, and lean on its scaling functions.

Rob Conklin

Option A: You scale the data persistence layer independently of the service, using sharding or something.

But a lot of databases can't (to my knowledge at least) be effectively scaled horizontally at runtime without significant performance degradation while it's happening.

There is a reason for that: writing a system that can be rebalanced during repartitioning/scaling at runtime is not easy. Different databases use different strategies to perform rebalancing (see Chapter 6, "Partitioning", of the amazing Designing Data-Intensive Applications book by Martin Kleppmann), but either way it requires moving significant amounts of data while maintaining the guarantees the database has committed to.
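One such strategy Kleppmann describes is to create far more partitions than nodes up front: keys hash to a partition exactly once, and rebalancing moves whole partitions between nodes rather than rehashing individual keys. A minimal sketch (the partition and node counts are arbitrary):

```python
import hashlib

NUM_PARTITIONS = 256  # fixed at cluster creation, many more than nodes

def partition_for(key: str) -> int:
    # Keys hash to a partition once and never move between partitions.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def add_node(assignment: dict[int, int], new_node: int) -> dict[int, int]:
    # The new node steals a fair share of whole partitions from the
    # existing nodes; only those partitions need their data streamed.
    rebalanced = dict(assignment)
    share = NUM_PARTITIONS // (new_node + 1)
    for p in list(rebalanced)[:share]:
        rebalanced[p] = new_node
    return rebalanced

assignment = {p: p % 4 for p in range(NUM_PARTITIONS)}  # 4 nodes
assignment = add_node(assignment, 4)                    # scale to 5

moved = sum(1 for p in range(NUM_PARTITIONS) if assignment[p] == 4)
print(f"only {moved}/{NUM_PARTITIONS} partitions moved")  # ~51
```

Scaling from 4 to 5 nodes then moves only about a fifth of the partitions, instead of remapping nearly every key as naive `hash(key) % N` would.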

Option B: Each instance of the service gets its own copy of the persistence DB.

This is called replication, and again it can be handled by the database's own replication mechanism, since many databases support leader/follower setups (see Chapter 5, "Replication", of the aforementioned book for a somewhat more detailed overview of the options and trade-offs). As with the first option, you should very rarely reinvent the wheel here: research the existing tools that can give you the performance and consistency guarantees your application requires, and use them.
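In application code this often shows up as simple read/write splitting: writes always go to the leader, reads can be spread across followers, and you accept some replication lag. A hypothetical sketch with psycopg2 against PostgreSQL streaming replication (the hostnames and schema are placeholders):

```python
import psycopg2

# Writes always go to the leader; PostgreSQL streaming replication
# (or your database's equivalent) ships the changes to followers.
leader = psycopg2.connect(host="db-leader", dbname="orders", user="app")

# Reads can be spread across followers, accepting replication lag.
follower = psycopg2.connect(host="db-follower-1", dbname="orders", user="app")

with leader, leader.cursor() as cur:
    cur.execute(
        "UPDATE orders SET status = %s WHERE id = %s",
        ("shipped", 42),
    )

with follower.cursor() as cur:
    # May briefly return the old status until the follower catches up.
    cur.execute("SELECT status FROM orders WHERE id = %s", (42,))
    print(cur.fetchone())
```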

And in the end, replication can only take you so far (though again, it may be enough for your app), because even with a database replicated across multiple nodes, size still matters. It is not only a question of storage price: the more data a node holds, the heavier the processing it requires (indexes are larger, more data has to be scanned during queries, and so on).

Guru Stron