Some questions about MongoDB joins across databases

Question

I have a bunch of services, each with its own MongoDB database. All of them are essentially independent, since each one has its own database. However, I'm now building another service which uses some of the data from these services. In each Mongo document I store the ID of the related document in the other database, so I can fetch its data from that database. This is a visualization of what I have now:

This way, when something changes in a document from Service A, I see the same updated values when I fetch the document from Service C. My question is: is it fine to have such relations, or should I bring all collections from the databases into one database? Or should I embed the document schemas from Services A and B in the document schema for Service C, removing the ID reference?

Sounds like you're bordering on [micro-services](https://microservices.io/) territory. It might pay to look into a 'message broker' system like [RabbitMQ](https://www.rabbitmq.com/). My personal belief is that you shouldn't prematurely build micro-services, so you avoid the complexity; when your app starts getting millions of users/requests, look into building v2.0 of the app with micro-services, at which point you should be able to afford a team of developers to do it properly. — Dĵ ΝιΓΞΗΛψΚ, Nov 29 '19 at 15:45
Yes, I'm stepping into the micro-services world. The problem is that at my company we are doing this transition from a monolithic app to microservices gradually. What we've done so far is take some parts of the old application and split them into smaller services. Now we are removing the rest of the old application, which unfortunately contains some aggregates of the other services' data (for building pages and so on). That's why I'm reusing the same data from other services. By the way, thank you for the tip about RabbitMQ :) — Jimi, Nov 29 '19 at 16:07

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

IMHO referencing is always better because, as you already said, if you change the data in one place, every document that refers to it will always get the latest, updated version when it is read.

This might be different if you are planning some kind of versioning and want to store every change that happens.
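If you do want that kind of history, one common alternative (not taken from the linked article) is to keep snapshots instead of relying only on a live reference. A minimal sketch, assuming a plain Python list stands in for a history collection; `record_change`, `_version` and `_recorded_at` are hypothetical names:

```python
from copy import deepcopy
from datetime import datetime, timezone


def record_change(history: list, doc: dict) -> None:
    """Append an independent snapshot of doc to history, numbered from 1."""
    snapshot = deepcopy(doc)  # later edits to doc must not alter old snapshots
    snapshot["_version"] = len(history) + 1
    snapshot["_recorded_at"] = datetime.now(timezone.utc)
    history.append(snapshot)


history = []
product = {"_id": 1, "price": 10}
record_change(history, product)
product["price"] = 12
record_change(history, product)
# history now holds both prices: version 1 with 10, version 2 with 12
```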

Read MongoDB relationships: embed or reference? for a clearer understanding. In case it's TL;DR, let me copy a point from there:

Separate data that can be referred to from multiple places into its own collection.

This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.
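To make the consistency point concrete, here is a minimal sketch in plain Python, assuming dicts stand in for collections (`users`, `order_ref`, `order_emb` and `user_of_ref` are hypothetical). The referencing order sees a single update immediately, while the embedded copy goes stale until something rewrites it:

```python
# Hypothetical in-memory stand-in for a collection: maps _id -> document.
users = {"u1": {"_id": "u1", "name": "Ada"}}

# Referencing: the order stores only the user's _id.
order_ref = {"_id": "o1", "user_id": "u1"}
# Embedding: the order stores its own copy of the user data.
order_emb = {"_id": "o2", "user": dict(users["u1"])}


def user_of_ref(order: dict) -> dict:
    # Dereference at read time, so the result reflects the current user document.
    return users[order["user_id"]]


# One update in one place...
users["u1"]["name"] = "Ada Lovelace"

print(user_of_ref(order_ref)["name"])  # Ada Lovelace
print(order_emb["user"]["name"])       # Ada (stale copy)
```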

answered Nov 29 '19 at 15:48

Shivam

Thank you for the answer and the reference link :) I'm already splitting some documents into separate collections, but what I still haven't figured out is whether I should merge existing collections that live in separate databases into one database. They would still be in their own collections and still be accessed via references, but they would be in the same database rather than in multiple databases as they are now – Jimi Nov 29 '19 at 16:10

The link, the quote and the whole answer are about different collections within the same database. Having multiple databases is a different matter. There is no way to $lookup references across databases, so all dereferencing must be done at the application level. On top of that, there are no cross-database transactions, the client will have to keep a connection pool to each database, etc – Alex Blex Nov 29 '19 at 17:05
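Since $lookup cannot follow a reference into another database, the join has to happen in the service's own code. A minimal sketch of batched application-level dereferencing, assuming a hypothetical in-memory `FakeCollection` that mimics only the `find({"_id": {"$in": ids}})` call. With PyMongo, `Collection.find` accepts the same filter and returns an iterable cursor, so a real collection from the other database could presumably be passed in its place (untested here):

```python
from typing import Any, Iterable


class FakeCollection:
    """Hypothetical in-memory stand-in supporting only find({"_id": {"$in": [...]}})."""

    def __init__(self, docs: Iterable[dict]):
        self._docs = {d["_id"]: d for d in docs}

    def find(self, query: dict) -> list:
        ids = query["_id"]["$in"]
        return [self._docs[i] for i in ids if i in self._docs]


def resolve_refs(docs: list, field: str, other_coll: Any, into: str) -> list:
    """Attach the document referenced by docs[field] (from another database) under `into`.

    Issues one batched $in query instead of one query per document.
    """
    ids = list({d[field] for d in docs if d.get(field) is not None})
    by_id = {ref["_id"]: ref for ref in other_coll.find({"_id": {"$in": ids}})}
    resolved = []
    for d in docs:
        out = dict(d)  # leave the caller's documents untouched
        out[into] = by_id.get(d.get(field))  # None if the reference is dangling
        resolved.append(out)
    return resolved


# Usage: "users" lives in Service A's database, "orders" in Service C's.
users_db_a = FakeCollection([{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Linus"}])
orders_db_c = [{"_id": "o1", "user_id": 1}, {"_id": "o2", "user_id": 3}]
result = resolve_refs(orders_db_c, "user_id", users_db_a, "user")
# result[0]["user"] == {"_id": 1, "name": "Ada"}; result[1]["user"] is None
```

Batching the IDs into one `$in` query per referenced collection keeps the number of round trips to the other database constant, whatever the number of documents being rendered.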
I've just found out that with Stitch we can add triggers on collections: [https://docs.mongodb.com/stitch/triggers/database-triggers/](https://docs.mongodb.com/stitch/triggers/database-triggers/). This changes everything, since in theory I should be able to add a trigger that fires when some data changes and use it to propagate the change to other databases within the same Atlas cluster – Jimi Dec 02 '19 at 14:53
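Stitch trigger functions are written in JavaScript, so the following is only an illustration in Python of what such a handler would do. It assumes events shaped like MongoDB change events (`operationType`, `documentKey`, `fullDocument`, `updateDescription`); outside Atlas, PyMongo's `Collection.watch()` yields events with these fields. The `copies` dict standing in for Service C's collection and `apply_change` are hypothetical, and dotted paths in `updatedFields` are treated as plain top-level keys:

```python
def apply_change(copies: dict, event: dict) -> None:
    """Apply one MongoDB-style change event to Service C's copies (dict: _id -> doc)."""
    op = event["operationType"]
    key = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        # These events carry the whole new document.
        copies[key] = dict(event["fullDocument"])
    elif op == "update":
        # Update events carry only the delta unless fullDocument lookup is enabled.
        doc = copies.setdefault(key, {"_id": key})
        desc = event["updateDescription"]
        doc.update(desc.get("updatedFields", {}))
        for field in desc.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        copies.pop(key, None)
    # Other event types (drop, invalidate, ...) are ignored in this sketch.


copies = {}
apply_change(copies, {"operationType": "insert", "documentKey": {"_id": 1},
                      "fullDocument": {"_id": 1, "name": "Ada", "city": "London"}})
apply_change(copies, {"operationType": "update", "documentKey": {"_id": 1},
                      "updateDescription": {"updatedFields": {"name": "Ada L."},
                                            "removedFields": ["city"]}})
# copies == {1: {"_id": 1, "name": "Ada L."}}
```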

score 1 · Answer 2 · answered Nov 29 '19 at 17:58

If Service C needs only a very small subset of the data from the Service A and Service B databases, you may consider copying that subset into the Service C database. This way you will be able to run aggregations within a single database, which is not possible with cross-database references, and a single query will do the job. To keep the Service C database up to date, you may consider a message broker like Kafka: for every update to the Service A and B databases, a Kafka message is produced and then consumed by Service C.
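A minimal sketch of the consumer side, assuming a `collections.deque` stands in for a Kafka topic (a real setup would use a Kafka client library) and assuming the producers attach a per-document `version` number, which is not part of the original answer. `UpdateMessage` and `consume` are hypothetical names. Skipping messages whose version is not newer than the stored copy lets a failed delivery be retried or replayed without overwriting newer data:

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass
class UpdateMessage:
    """Hypothetical message that Service A or B would publish after each write."""
    source: str   # e.g. "service_a.users"
    doc_id: Any
    version: int  # assumed to increase monotonically per document at the producer
    fields: dict


def consume(queue: deque, copies: dict) -> None:
    """Drain the queue and upsert each message into Service C's copies.

    Messages whose version is <= the stored one are skipped, so redelivered
    or out-of-order messages cannot overwrite newer data.
    """
    while queue:
        msg = queue.popleft()
        key = (msg.source, msg.doc_id)
        current = copies.get(key)
        if current is not None and current["_version"] >= msg.version:
            continue  # stale or duplicate message
        merged = dict(current or {})
        merged.update(msg.fields)
        merged["_version"] = msg.version
        copies[key] = merged


q = deque([
    UpdateMessage("service_a.users", 1, 1, {"name": "Ada"}),
    UpdateMessage("service_a.users", 1, 2, {"name": "Ada L."}),
    UpdateMessage("service_a.users", 1, 1, {"name": "Ada"}),  # redelivered old message
])
copies = {}
consume(q, copies)
# copies[("service_a.users", 1)] == {"name": "Ada L.", "_version": 2}
```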

Shard Gupta

With this approach I don't even need aggregation, since everything will be within the same document. By the way, this could really be a solution, but I'm afraid it could lead to data inconsistency when an update fails for some reason – Jimi Dec 02 '19 at 13:26