
I am weighing the pros and cons of migrating from a relational SQL database to MongoDB for a web application, for performance reasons*.

Storing all of an object's dependencies in the object itself permits quick "reads" when displaying data to users. On the other hand, some data is present in several collections (for example, the username is in the users collection but also in the comments collection). I am wondering how to deal with updates of this duplicated data in order to also have quick "writes".

I would like to avoid updating these duplicates synchronously.

What are the options for asynchronous updates? Are there good practices?

My thought was to rely on a specific MongoDB mechanism (which I don't know of), or to stack the required updates in a memcache/Redis/ZeroMQ queue to be processed asynchronously by an independent process.

(*) My question is not about how to optimize joins, indexes, and SQL database engines; other posts on SO deal with that.

kheraud

2 Answers


Basically you have 3 options:

  1. Just store the userId and then fetch the user separately. This way you always get optimal results in terms of fresh data, but of course it is slower. That's basically what a relational database does: a SQL DB will just look at the foreign key and fetch the referenced data by id.

  2. Live with out-of-date data. Store a duplicate of the username inside the comments. Sometimes this is desired behavior, because this way you can represent the data exactly as it was when it was stored. This means: if John creates a comment and his username later gets updated to Paul, you are still able to see that it was created as John. (This is especially useful for e.g. invoices: when you reference a person there and their address changes, you don't want the address on an old invoice to be updated.)

  3. Update everything that contains a username when the username gets updated. This isn't bad either, because a username should normally almost never change. Reads will always be fast, because the name is stored inside the comment. If the name does change, you have to update everything where the user is involved. This is a slow task of course, but since it shouldn't happen every minute, it's tolerable.

    3.1 You can optimize this further: when a username changes, the change gets stored somewhere and applied at midnight. This way you can collect multiple name changes and update everything in a single pass.
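A minimal in-memory sketch of this deferred-update pattern (plain JavaScript arrays stand in for MongoDB collections here, and all names and data shapes are assumptions, not code from the answer):

```javascript
// In-memory stand-ins for the users and comments collections.
const users = [{ _id: 1, username: "John" }];
const comments = [
  { _id: 10, userId: 1, username: "John", text: "Hi" },
  { _id: 11, userId: 1, username: "John", text: "Bye" }
];
// Queue of pending renames; in MongoDB this would be its own collection.
const pendingRenames = [];

function renameUser(userId, newName) {
  // Fast write: only the user document and the queue are touched.
  users.find(u => u._id === userId).username = newName;
  pendingRenames.push({ userId, newName });
}

function applyPendingRenames() {
  // Slow batch: run e.g. at midnight, updating every duplicated copy.
  for (const { userId, newName } of pendingRenames) {
    comments
      .filter(c => c.userId === userId)
      .forEach(c => { c.username = newName; });
  }
  pendingRenames.length = 0; // clear the queue
}

renameUser(1, "Paul");   // comments still say "John" until the batch runs
applyPendingRenames();   // now every comment says "Paul"
```

Between the rename and the batch run, reads see the old name in comments, which is exactly the temporary inconsistency option 3 accepts.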

As you can see: NoSQL is about choice. You can do the things that fit your data best. Of course it's always a tradeoff: Slower/faster, more/less code to write, easier/harder to maintain.

Summarized it's:

  1. Fast writes, consistent data, slow reads
  2. Fast writes, inconsistent data, fast reads
  3. Fast writes, fast reads; data becomes consistent after the update process, which may take some time (and the update process itself is of course slow).
Benjamin M
  • Thanks, this gives a complete picture of how to translate foreign keys into the NoSQL paradigm. I have already decided to go for option 3, but I am still in the fog regarding how you do this "applied at midnight" part. Which components / architecture do you use to cron these updates? – kheraud Mar 30 '15 at 09:01
  • There's nothing special involved. Create a new collection, and just put in there the operations you'd like to do later. Something like `{ entity: 'USER', entityId: 42, operation: 'UPDATE_USERNAME', newValue: 'Mike', createdAt: Date(2015-04-01T00:00:00.000Z) }`. Then at midnight, you fetch all entries with `createdAt < now()` and run the operations. Running the operation is of course a task that is driven by your application logic. – Benjamin M Mar 30 '15 at 09:17
  • Thanks. I wished a MongoDB-specific tool permitted doing that based on a specific configuration – kheraud Apr 02 '15 at 16:06
  • There are tools that can handle parts of it automatically. For example using Spring Data MongoDB. There you can have your document in MongoDB with a corresponding Java Class. Within that class you can have some `DBRef` references to other documents. Spring Data MongoDB will fetch your document, and then see that there are some references to other documents and then automatically fetch them. (This would be `Option 1.` from my answer). I think other MongoDB libs may have such features, too. – Benjamin M Apr 02 '15 at 16:25
  • One of the best answers I ever read about NoSQL – M Fuat Jan 24 '21 at 18:35

You can use MongoDB's aggregation framework to overcome the issue of having duplicate data in a collection. It provides a $lookup stage that lets you bind data from another collection by referring to a unique id.

Let's consider an example with two collections, employees and users.

Employee collection schema
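For illustration, an employee document with the user duplicated inside it might look like this (field names and values are assumptions, since the answer's original snippet is not shown):

```javascript
// Hypothetical employee document: the user is embedded, i.e. duplicated.
const employee = {
  _id: "emp1",
  role: "Developer",
  user: {
    _id: "user1",
    username: "John",
    email: "john@example.com"
  }
};
```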

User collection schema
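And a corresponding user document in the users collection (again, assumed field names):

```javascript
// Hypothetical user document; its _id is what other collections reference.
const user = {
  _id: "user1",
  username: "John",
  email: "john@example.com"
};
```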

You can replace the embedded user in the employee schema with the id of the user, and use the $lookup stage from the MongoDB aggregation framework to map the user data onto the employee.

Updated employee collection schema
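After the change, the employee only stores a reference to the user (a hypothetical userId field in place of the embedded document):

```javascript
// Employee now references the user by id instead of embedding it.
const employee = {
  _id: "emp1",
  role: "Developer",
  userId: "user1"
};
```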

Lookup for the user from employee
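A sketch of the aggregation pipeline, built as a plain JavaScript array (collection and field names follow the assumptions above):

```javascript
// $lookup joins users into employees on userId; $unwind flattens the
// resulting one-element array into a single embedded user object.
const pipeline = [
  {
    $lookup: {
      from: "users",          // collection to join
      localField: "userId",   // field in the employees documents
      foreignField: "_id",    // field in the users documents
      as: "user"              // name of the output array field
    }
  },
  { $unwind: "$user" }
];
// In the mongo shell: db.employees.aggregate(pipeline)
```

Note the tradeoff: this removes the duplication (and thus the write problem from the question), but every read now pays the cost of the join, much like option 1 in the other answer.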