
In replica set mode, each write operation to any collection in any DB also writes to the oplog collection.

Now, when writing to multiple DBs in parallel, all of these write operations also write to the oplog. My question: do these write operations require locking the oplog? (I'm using the w:1 write concern.) If they do, isn't this effectively similar to having a global lock across all the write operations to all the different DBs?
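To make the scenario concrete, here is a minimal mongo shell sketch (the database and collection names are made up): two clients write to collections in two different databases with w:1, and on a replica set primary each write also produces an entry in the oplog, which lives in the local database.

```javascript
// Hypothetical setup: two writes against collections in two different databases,
// both with w:1 write concern (names "appA", "appB", "orders", "events" are made up).
var dbA = db.getSiblingDB("appA");
var dbB = db.getSiblingDB("appB");

// Issued from one client:
dbA.orders.insert({ _id: 1, status: "new" }, { writeConcern: { w: 1 } });

// Issued from another client at the same time:
dbB.events.insert({ _id: 1, type: "click" }, { writeConcern: { w: 1 } });

// On the primary, both writes are also recorded in the oplog of the "local" database:
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(2);
```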

I'd be happy to get any hints on this.

Baruch Oxman

2 Answers


According to the documentation, in replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
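One hedged way to see this on a test replica set is to look at the server's lock statistics and in-progress operations. The exact shape of the output depends on the MongoDB version and storage engine, but the per-database counters typically include the local database, because every replicated write also writes to its oplog.

```javascript
// Sketch only: inspect lock activity on the primary. Field names and layout
// vary between MongoDB versions and storage engines.
db.serverStatus().locks;          // aggregate lock statistics
db.currentOp({ active: true });   // in-progress operations and the locks they hold

// The oplog itself is a capped collection in the "local" database:
db.getSiblingDB("local").oplog.rs.stats();
```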

This means that concurrent writes to multiple databases in parallel on the primary can result in a global lock across all of the write operations. This is not applicable to secondaries, as MongoDB does not apply writes serially to secondaries, but instead collects oplog entries in batches and then applies those batches in parallel.

Alex
  • Hm, I read the docs, but I still can't believe it. Since oplog entries are idempotent and the database is locked on application, there would actually be no need for it... I really don't get it. – Markus W Mahlberg Nov 16 '15 at 22:49
  • This blog post is also worth a read. The author did encounter this `global lock` when developing a real-time metric system. – Alex Nov 17 '15 at 09:21
  • Forgot to add the link: http://daprlabs.com/blog/blog/2014/04/19/mongodb/. Please note that the blogger works for Microsoft and some of his conclusions appear to be somewhat biased. – Alex Nov 17 '15 at 09:54

Disclaimer: This is all off the top of my head, so please do not crucify me if I have made a mistake. However, please correct me.

Why should they?

  1. Premise: Databases, by definition, are not interconnected
  2. oplog entries are always idempotent
  3. The oplog is a capped collection, with a guarantee of preserving the insert order

Let's assume true parallelism of queries being applied. So we have two queries arriving at the very same time, and we'd need to decide which one to insert into the oplog first. The first one taking the lock will write first, right? Except there is a problem. Let's assume the first query is a simple one, `db.collection.update({_id:"foo"},{$set:{"bar":"baz"}})`, while the other query is more complicated and therefore takes longer to evaluate for correctness. Which of the two ends up in the oplog first would then depend on how long each takes to evaluate, not on any defined order. So in order to prevent that, a lock would have to be taken on arrival and released only after the idempotent oplog entry was written.
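As a hedged illustration of point 2 above (the collection name and values are made up): a non-idempotent operation such as `$inc` is rewritten in the oplog as an idempotent `$set` of the resulting value, so replaying the entry more than once is safe.

```javascript
// Made-up example collection "counters" in the default "test" database.
db.counters.insert({ _id: "pageviews", n: 0 });
db.counters.update({ _id: "pageviews" }, { $inc: { n: 1 } });

// On a replica set primary, the matching oplog entry looks roughly like this
// (timestamp and housekeeping fields omitted):
//   { op: "u", ns: "test.counters", o2: { _id: "pageviews" }, o: { $set: { n: 1 } } }
db.getSiblingDB("local").oplog.rs.find({ ns: "test.counters" }).sort({ $natural: -1 }).limit(1);
```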

Here is where I have to rely on my memory

However, queries aren't applied in parallel. Queries are queued and evaluated in order of arrival. The database gets locked upon the application of the queries, after they have run through the query optimizer. During that lock, the idempotent oplog entries are written to the oplog. Since databases are not interconnected and only one query can be applied to a database at any given time, the lock on the database is sufficient. No two data-changing queries can be applied to the same database concurrently anyway, so why should a lock be set on the oplog? Apparently, a lock is taken on the local database. However, since a lock is already taken on the data, I do not see the reason why. *scratchingMyHead*

Markus W Mahlberg
  • This blog post is also worth a read; it describes the `effectively global lock` that can occur in replica sets in more technical detail: http://daprlabs.com/blog/blog/2014/04/19/mongodb/ – Alex Nov 17 '15 at 09:19
  • @Jaco interesting read, although I can not second his conclusions. However, the double lock issue might be worth investigating. – Markus W Mahlberg Nov 17 '15 at 09:35