mongo save documents in monotically increasing sequence

Question

I know mongo docs provide a way to simulate auto_increment. http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

But it is not concurrency-proof as guaranteed by say MySQL.

Consider the sequence of events:

client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1

In this case, it is possible to save a doc with id less than the current max that is already saved. For MySql, this can never happen since auto increment id is assigned by the server.

How do I prevent this? One way is to do optimistic looping at each client, but for many clients, this will result in heavy contention. Any other better way?

The use case for this is to ensure id is "forward-only". This is important for say a chat room where many messages are posted, and messages are paginated, I do not want new messages to be inserted in a previous page.

Well, at `$inc`is an atomic operation. As far as I read the docs and to the best of my knowledge, as long as you create a unique index *or* don't use upserts, using a sequence is rock solid. — Markus W Mahlberg, Apr 27 '15 at 19:15
a) it is a comment. And comments are meant to `` comment. B) It is a shortcut for your requirement of generating a sequence. The sequence itself is monotonically increasing. That's the inherent meaning of atomic. The only thing which can't be guaranteed is a first come, first serve since there are no locks. But each request will get a unique sequence number provided the mentioned requirements are met. c) Since you want to smarty-pant: Da code, plz! — Markus W Mahlberg, Apr 28 '15 at 05:54

score 0 · Answer 1 · edited May 23 '17 at 12:06

But it is not concurrency-proof as guaranteed by say MySQL.

That depends on the definition of concurrency-proof, but let's see

In this case, it is possible to save a doc with id less than the current max that is already saved.

That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?

For MySql, this can never happen since auto increment id is assigned by the server.

That sounds like folklore. Most relational dbs offer fine-grained control over these features, since strict guarantees severely impact concurrency.

MySQL does neither guarantee that there are no gaps, nor that a transaction with a high AUTO_INCREMENT id isn't visible to other readers before a transaction that acquired a lower AUTO_INCREMENT value was committed, unless you keep a table-level lock, which severely impacts concurrency.

For gaplessness, consider a transaction rollback of the first of two concurrent inserts. Does the second insert now get a new id assigned while it's being committed? No - from the InnoDB documentation:

You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")

and

In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”

also, you're completely ignoring the problem of replication where sequences lead to even more trouble:

Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")

The sheer length of the documentation of the InnoDB behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because we can't even be sure of the counter value in partition mode...

I understood that, and I answered both. You get forward - only behaviour only when using table-level locks, which severely limits concurrency — mnemosyn, Apr 28 '15 at 14:17
What you want sounds more like a realtime message queue (see tailable cursors in MongoDB), not so much like a monotonic field. For a monotonic field, timestamps will do, but clock synchronization matters are non-trivial if you're worried about slow clients, i.e. something in the millisecond regime (see Lamport's paper from '78 http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf). For simple pagination, use a timestamp-based approach and assume that people won't post at unreasonable frequencies (e.g. using ObjectId in MongoDB) — mnemosyn, Apr 28 '15 at 16:58

mongo save documents in monotically increasing sequence

1 Answers1