48

I know that I can't lock a single MongoDB document; in fact, there is no way to lock a collection either.

However, I've got a scenario where I think I need some way to prevent more than one thread (or process, it's not important) from modifying a document. Here it is.

I have a collection that contains objects of type A. I have some code that retrieves a document of type A, adds an element to an array that is a property of the document (a.arr.add(new Thing())), and then saves the document back to MongoDB. This code runs in parallel: multiple threads in my application can perform these operations, and for now there is nothing preventing two threads from doing so on the same document at the same time. This is bad because one of the threads could overwrite the work of the other.

I do use the repository pattern to abstract access to the MongoDB collection, so I only have CRUD operations at my disposal.

Now that I think about it, maybe it's a limitation of the repository pattern and not a limitation of MongoDB that is causing me trouble. Anyway, how can I make this code "thread safe"? I guess there's a well-known solution to this problem, but being new to MongoDB and the repository pattern, I don't immediately see it.

Thanks

Mathieu Pagé
  • 10,764
  • 13
  • 48
  • 71

13 Answers

24

The only way I can think of right now is to add a status field and use the findAndModify() operation, which lets you atomically modify a document. It's a bit slower, but it should do the trick.

So let's say you add a status attribute, and when you retrieve the document you change the status from "IDLE" to "RUNNING". Then you update the document and save it back to the collection, setting the status to "IDLE" again.

Code example:

var doc = db.runCommand({
              "findAndModify" : "COLLECTION_NAME",
              "query" : {"_id": "ID_DOCUMENT", "status" : "IDLE"},
              "update" : {"$set" : {"status" : "RUNNING"} }
}).value

Change COLLECTION_NAME and ID_DOCUMENT to proper values. By default findAndModify() returns the old document, which means the status value will still be "IDLE" on the client side. So when you are done updating, just save/update everything again.
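For instance, the release step might look something like this (just a sketch; "someField" stands in for whatever fields you actually changed):

// Hypothetical release step: persist the changes and flip the status back
// to "IDLE" in one atomic update, so other clients can acquire the document again.
db.runCommand({
              "findAndModify" : "COLLECTION_NAME",
              "query" : {"_id" : "ID_DOCUMENT", "status" : "RUNNING"},
              "update" : {"$set" : {"status" : "IDLE", "someField" : "newValue"} }
})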

The only thing you need to be aware of is that you can only modify one document at a time.

Hope it helps.

golja
  • 1,063
  • 7
  • 11
  • You can use a simple update() for the same purpose, which is the official solution offered on the MongoDB site: http://docs.mongodb.org/manual/tutorial/isolate-sequence-of-operations/ The main complication of this solution, though, is the code you have to write for the case when the update fails, i.e. retrying the update. Depending on your code, you may run into further complications to avoid side effects when retrying, etc. – Yaroslav Stavnichiy Nov 23 '13 at 10:28
  • How does another client wait for the lock to be released? i.e. how can you get notified when `status` changes? – salezica Dec 01 '14 at 21:00
  • What if I want to lock during creation of document object? – Chinni Jun 03 '16 at 16:13
  • 1
    @slezica is right. Did you find a solution? How does another client learn that the locked document has been released? – akinKaplanoglu Jun 04 '16 at 23:31
  • It's a shame that they haven't expanded findAndModify() to work with multiple documents. – Anatoly Alekseev Apr 02 '18 at 11:06
17

Stumbled onto this question while working on MongoDB upgrades. Unlike at the time this question was asked, MongoDB now supports document-level locking out of the box.

From: http://docs.mongodb.org/manual/faq/concurrency/

"How granular are locks in MongoDB?

Changed in version 3.0.

Beginning with version 3.0, MongoDB ships with the WiredTiger storage engine, which uses optimistic concurrency control for most read and write operations. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation."

Mahesh
  • 611
  • 9
  • 16
8

"Doctor, it hurts when I do this"

"Then don't do that!"

Basically, what you're describing sounds like you've got a serial dependency there -- MongoDB or whatever, your algorithm has a point at which the operation has to be serialized. That will be an inherent bottleneck, and if you absolutely must do it, you'll have to arrange some kind of semaphore to protect it.

So, the place to look is at your algorithm. Can you eliminate that? Could you, for example, handle it with some kind of conflict resolution, like "get record into local; update; store record", so that after the store the new record would be the one retrieved on that key?

Charlie Martin
  • 110,348
  • 25
  • 193
  • 263
  • 2
    Hi Charlie, thanks for answering. I don't understand the conflict resolution you propose. I agree that I do need to change my algorithm and I can imagine some solutions, but I feel there must be some agreed-upon solution to this problem. It seems to me that it's a classic problem lots of people using MongoDB (or probably any database) have run into. If it were an in-memory update, I would know how to use a mutex to "lock" the variable I want to update so only one thread updates it at a time. I guess my question is: how do other programmers usually handle this situation? – Mathieu Pagé Jun 18 '12 at 02:28
  • 2
    Great comment. Don't do it even if it's a job that you MUST do, just because some tool is not good enough. – Anatoly Alekseev Apr 02 '18 at 11:07
  • 2
    **MongoDB finally supports Transactions** :D https://stackoverflow.com/a/53800048/2757916 – Govind Rai Dec 16 '18 at 06:50
8

The classic solution when you want to make something thread-safe is to use locks (mutexes). This is also called pessimistic locking, as opposed to the optimistic locking described here.

There are scenarios where pessimistic locking is more efficient (more details here). It is also far easier to implement (the major difficulty of optimistic locking is recovery from a collision).

MongoDB does not provide a locking mechanism, but this can easily be implemented at the application level (i.e. in your code):

  1. Acquire lock
  2. Read document
  3. Modify document
  4. Write document
  5. Release lock

The granularity of the lock can vary: global, collection-specific, or record/document-specific. The more specific the lock, the smaller its performance penalty.
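One way to keep such a lock in MongoDB itself (just a sketch, not the only option; the lock could equally be an in-process mutex or an external service) is a separate, hypothetical locks collection keyed by the id of the protected document, with one lock document created up front as { _id: "DOCUMENT_ID", locked: false }:

// Acquire: atomically claim the lock document; returns null if someone else holds it.
var lock = db.locks.findAndModify({
    query  : { _id: "DOCUMENT_ID", locked: false },
    update : { $set: { locked: true, lockedAt: new Date() } },
    new    : true
});

if (lock !== null) {
    // Steps 2-4: read, modify and write the protected document here.
    // ...
    // Step 5: release the lock so other threads/processes can proceed.
    db.locks.update({ _id: "DOCUMENT_ID" }, { $set: { locked: false } });
} else {
    // The lock is held by someone else: wait and retry, or give up.
}

In practice you would also want a timeout or expiry based on lockedAt, so a crashed holder does not leave the lock stuck forever.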

Yaroslav Stavnichiy
  • 20,738
  • 6
  • 52
  • 55
  • How do you wait on the lock? – salezica Dec 01 '14 at 20:59
  • 1
    Acquire lock action typically waits for a lock if it is held by other thread. – Yaroslav Stavnichiy Dec 02 '14 at 21:50
  • 25
    This doesn't work in an application with multiple instances. – rickchristie Jun 22 '15 at 05:23
  • Does anyone have a coding solution apart from theory? – Naisarg Parmar Sep 21 '21 at 04:41
  • An application with multiple instances can use `findAndModify` to set a global value for the document ID in a separate collection, setting the value to `modifying` or `editing`. Then the application must acquire a lock from the database by setting the document ID's value before proceeding, and it can only modify the document if its value is `ready`, for instance. When one instance is done with the document, it can change the value back to `ready`. You just have to be careful: you'll need an external process which queries the database and resets locks older than a certain age to prevent deadlocks. – fIwJlxSzApHEZIl Jul 25 '22 at 16:29
4

Answering my own question because I found a solution while doing research on the Internet.

I think what I need to do is use Optimistic Concurrency Control.

It consists of adding a timestamp, a hash or another unique identifier (I'll use UUIDs) to every document. The unique identifier must be modified each time the document is modified. Before updating the document, I'll do something like this (in pseudo-code):

var oldUUID = doc.uuid;
doc.uuid = new UUID();
BeginTransaction();
if (GetDocUUIDFromDatabase(doc.id) == oldUUID)
{
   SaveToDatabase(doc);
   Commit();
}
else
{
   // Document was modified in the DB since we read it. We can't save our changes.
   RollBack();
   throw new ConcurrencyException();
}
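In MongoDB the check-and-save above can also be collapsed into a single atomic update, with no transaction needed, by putting the old UUID in the query. A sketch (collection and field names are illustrative, and it assumes a shell recent enough that update() returns a WriteResult):

// Optimistic concurrency in one atomic update: the write only matches
// if nobody has changed the uuid since we read the document.
var result = db.collectionA.update(
    { _id: doc._id, uuid: oldUUID },           // match only the unmodified version
    { $set: { arr: doc.arr, uuid: doc.uuid } } // write the changes plus the new uuid
);

if (result.nMatched === 0) {
    // No document with the old uuid was found: it was modified since we read it.
    throw new Error("ConcurrencyException");
}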
Mathieu Pagé
  • 10,764
  • 13
  • 48
  • 71
  • Yup, that's one method of conflict resolution. – Charlie Martin Jun 19 '12 at 03:05
  • You can do that, but using the atomic operators some of the other answers describe is probably what you want (and is atomic like you want). Here are the docs: http://www.mongodb.org/display/DOCS/Atomic+Operations – will Oct 30 '12 at 07:48
  • We have a similar issue, we posted a similar question with a bit different approach. We are still not sure regarding the performance. You can read it up here: https://stackoverflow.com/questions/58609347/synchronize-writes-to-db-from-dynamically-scaled-microservices – Slava Shpitalny Oct 30 '19 at 10:28
4

Update: With MongoDB 3.2, which uses the WiredTiger storage engine as the default, MongoDB uses document-level locking by default. WiredTiger was introduced in version 3.0 and became the default storage engine in the 3.2 release. Therefore MongoDB now has document-level locking.

Satyam
  • 703
  • 6
  • 20
4

As of 4.0, MongoDB supports Transactions for replica sets. Support for sharded clusters will come in MongoDB 4.2. Using transactions, DB updates will be aborted if a conflicting write occurs, solving your issue.

Transactions are much more costly in terms of performance, so don't use them as an excuse for poor NoSQL schema design!
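A minimal shell sketch of a replica-set transaction in 4.0+ (the database, collection and field names below are placeholders, as is ID_DOCUMENT):

// Sketch of a multi-statement transaction on a replica set (MongoDB 4.0+).
var session = db.getMongo().startSession();
session.startTransaction();
try {
    var coll = session.getDatabase("mydb").getCollection("things");
    // All reads/writes through the session are part of the transaction;
    // a conflicting concurrent write will make the commit fail, so it can be retried.
    coll.updateOne({ _id: "ID_DOCUMENT" }, { $push: { arr: { name: "new thing" } } });
    session.commitTransaction();
} catch (e) {
    session.abortTransaction();
    throw e;
} finally {
    session.endSession();
}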

Govind Rai
  • 14,406
  • 9
  • 72
  • 83
2

An alternative is to do an in-place update.

For example:

http://www.mongodb.org/display/DOCS/Updating#comment-41821928

db.users.update( { level: "Sourcerer" }, { '$push' : { 'inventory' : 'magic wand'} }, false, true );

which will push 'magic wand' into the inventory array of all "Sourcerer" users. The update to each document/user is atomic.

Prashant Bhate
  • 10,907
  • 7
  • 47
  • 82
2

If you have a system with more than one server, then you'll need a distributed lock.

I prefer to use Hazelcast.

While saving, you can acquire a Hazelcast lock by entity id, fetch and update the data, then release the lock.

As an example: https://github.com/azee/template-api/blob/master/template-rest/src/main/java/com/mycompany/template/scheduler/SchedulerJob.java

Just use lock.lock() instead of lock.tryLock()

Here you can see how to configure Hazelcast in your spring context:

https://github.com/azee/template-api/blob/master/template-rest/src/main/resources/webContext.xml

Azee
  • 1,809
  • 17
  • 23
0

Instead of writing this as another question, I'll ask it here: I wonder if the WiredTiger storage engine will handle the problem I pointed out here: Limit inserts in mongodb

oderfla
  • 1,695
  • 4
  • 24
  • 49
0

If the order of the elements in the array is not important to you, then the $push operator should be safe enough to prevent threads from overwriting each other's changes.
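Applied to the scenario in the question, this would be a single atomic server-side push instead of a read-modify-write cycle (the collection, field and id names below are placeholders):

// Push the new element server-side; concurrent pushes to the same array
// are applied one after another instead of overwriting each other.
db.collectionA.update(
    { _id: "ID_DOCUMENT" },
    { $push: { arr: { /* fields of the new Thing */ } } }
);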

0

I had a similar problem where I had multiple instances of the same application which would pull data from the database (the order did not matter; all documents had to be updated - efficiently), work on it and write back the results. However, without any locking in place, all instances obviously pulled the same document(s) instead of intelligently distributing their workload.

I tried to solve it by implementing a lock at the application level, which would add a locked field to the corresponding document while it was being edited, so that no other instance of my application would pick the same document and waste time performing the same operation as the other instance(s).

However, when running dozens or more instances of my application, the timespan between reading the document (using find()) and setting the locked field to true (using update()) was too long, and the instances still pulled the same documents from the database, making my idea of speeding up the work using multiple instances pointless.

Here are 3 suggestions that might solve your problem depending on your situation:

  1. Use findAndModify() since the read and write operations are atomic with that function. Theoretically, a document requested by one instance of your application should then appear as locked to the other instances. And when the document is unlocked and visible to other instances again, it has also already been modified (see the sketch after this list).

  2. If, however, you need to do other work between the read (find()) and write (update()) operations, you could use transactions.

  3. Alternatively, if that does not solve your problem, a bit of a cheesy solution (which might suffice) is making the application pull documents in large batches and having each instance pick a random document from that batch to work on. Obviously this shaky solution relies on coincidence not punishing your application's efficiency.
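For suggestion 1, a sketch of claiming work atomically (the lock field and collection name are illustrative): each instance asks the database for one unclaimed document and marks it locked in the same operation, so two instances can never grab the same one.

// Atomically claim one unprocessed document: find it and mark it locked in a single step.
var job = db.collectionA.findAndModify({
    query  : { locked: { $ne: true }, processed: { $ne: true } },
    update : { $set: { locked: true, lockedAt: new Date() } },
    new    : true
});

if (job !== null) {
    // ... process the claimed document, then write the results and unlock it ...
    db.collectionA.update({ _id: job._id },
                          { $set: { locked: false, processed: true } });
}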

Max
  • 63
  • 5
-1

Sounds like you want to use MongoDB's atomic operators: http://www.mongodb.org/display/DOCS/Atomic+Operations

will
  • 667
  • 4
  • 6
  • The problem with the atomic operators is that they don't really help me, since I was using the repository pattern, so I only had CRUD operations at my disposal. – Mathieu Pagé Oct 30 '12 at 12:43