So the question is long but pretty self-explanatory. I have an app that runs on multiple servers and uses parallel looping to handle objects coming out of a MongoDB collection. Since MongoDB allows concurrent reads, I cannot stop multiple processes and/or servers from grabbing the same document from the collection and duplicating work.

The program is such that the app waits for information to appear, does some work to figure out what to do with it, then deletes it once it's done. What I hope to achieve is to keep documents from being accessed at the same time; knowing that once a document has been read it will eventually be deleted, I can improve overall throughput by reducing duplicate work and letting the apps grab documents that aren't already being worked on.

I don't think pessimistic locking is quite what I'm looking for, but maybe I've misunderstood the concept. Also, if alternative setups are being used to solve the same problem, I would love to hear what they are.

Thanks!

  • https://softwareengineering.stackexchange.com/questions/127065/looking-for-a-distributed-locking-pattern / https://redis.io/topics/distlock – mjwills Jan 07 '19 at 04:27
  • Take a look at FindOneAndUpdate: https://docs.mongodb.com/manual/reference/method/db.collection.findOneAndUpdate/ Maybe it will suit your needs. – Artyom Jan 07 '19 at 11:41
  • Redis is a neat alternative, but if I were moving away from Mongo I would probably push for a relational database rather than another document-based one. And FindOneAndUpdate does not work for what I described: it does not support locking at the level I need, nor do the independent find and update operations. The problem with them is that they don't tell the database that the pulled document is in use. In essence, I need something more transactional. –  Jan 11 '19 at 17:10
  • https://stackoverflow.com/questions/11076272/its-not-possible-to-lock-a-mongodb-document-what-if-i-need-to has an answer explaining how to manually implement "locking" using application code -- that might work for you – klhr Feb 15 '19 at 15:00
  • I appreciate the response. One difference between that post and this one is that I have multiple servers running the service, and they can't know what the other servers are working on. I have implemented a date and even a 'working' flag, but the update doesn't seem to complete fast enough to stop the other servers from finding the document first. Since you linked this and I took another look, I will have to look into the difference between findOneAndUpdate and findAndModify. I'm not sure there is an equivalent in the C# driver, but I will have to look and see and have some hope. –  Feb 15 '19 at 15:20
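For reference, the C# driver does expose `FindOneAndUpdate`, which runs the filter match and the update as a single atomic server-side operation, so only one worker can ever "win" a given document. Below is a minimal sketch of that claim pattern; the collection, the field names (`lockedBy`, `lockedAt`), and the worker ID are assumptions for illustration, not anything from the original post.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

class TaskClaimer
{
    // Hypothetical collection of pending work items.
    private readonly IMongoCollection<BsonDocument> _tasks;

    public TaskClaimer(IMongoCollection<BsonDocument> tasks) => _tasks = tasks;

    // Atomically claim one unclaimed document. The filter match and the update
    // happen as one server-side operation, so two workers can never both claim
    // the same document, regardless of how many servers are polling.
    public BsonDocument ClaimNext(string workerId)
    {
        var filter = Builders<BsonDocument>.Filter.Exists("lockedBy", false);
        var update = Builders<BsonDocument>.Update
            .Set("lockedBy", workerId)
            .Set("lockedAt", DateTime.UtcNow);
        var options = new FindOneAndUpdateOptions<BsonDocument>
        {
            // Return the document as it looks after the update was applied.
            ReturnDocument = ReturnDocument.After
        };

        // Returns null when no unclaimed document matched the filter.
        return _tasks.FindOneAndUpdate(filter, update, options);
    }
}
```

The `lockedAt` timestamp also gives you a way to reclaim documents whose worker died mid-task, by widening the filter to include locks older than some timeout.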

1 Answer

What I hope to achieve is that if I could keep documents from being accessed at the same time

The simplest way to achieve this is by introducing a dispatch process architecture. Add a dedicated process that just watches for changes and then delegates or dispatches the tasks out to multiple workers.

The process could utilise MongoDB Change Streams to access real-time data changes on a single collection, a database, or an entire deployment. Once it receives a change event, it simply sends the document to a worker for processing.

This should also stop multiple workers from trying to access the same task and needing logic to back off.
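A rough sketch of such a dispatcher with the C# driver, assuming a replica set (change streams require one) and a tasks collection of `BsonDocument`s; the worker hand-off is left as a plain callback, and all names here are illustrative rather than taken from the question:

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

class Dispatcher
{
    private readonly IMongoCollection<BsonDocument> _tasks;

    public Dispatcher(IMongoCollection<BsonDocument> tasks) => _tasks = tasks;

    // Single dispatcher process: watch the collection for inserts and hand
    // each new document to a worker. Because only this process reads the
    // change stream, no two workers are ever given the same task.
    public void Run(Action<BsonDocument> dispatchToWorker)
    {
        // Only react to newly inserted task documents.
        var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<BsonDocument>>()
            .Match(change => change.OperationType == ChangeStreamOperationType.Insert);

        using (var cursor = _tasks.Watch(pipeline))
        {
            foreach (var change in cursor.ToEnumerable())
            {
                // FullDocument holds the newly inserted task document.
                dispatchToWorker(change.FullDocument);
            }
        }
    }
}
```

Because a single dispatcher owns the stream, the workers never compete for the same document; combining it with an atomic claim (as in the sketch under the comments) also covers the case where the dispatcher restarts and has to pick up tasks inserted while it was down.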

Wan B.