1

I have to create a MySQL InnoDB table using a strictly sequential ID to each element in the table (row). There cannot be any gap in the IDs - each element has to have a different ID and they HAVE TO be sequentially assigned. Concurrent users create data on this table.

I have experienced MySQL "auto-increment" behaviour where if a transaction fails, the PK number is not used, leaving a gap. I have read online complicated solutions that did not convince me and some other that dont really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers)

  • I want to maximise writing concurrency. I cant afford having users writing on the table and waiting long times.
  • I might need to shard the table... but still keeping the ID count.
  • The sequence of the elements in the table is NOT important, but the IDs have to be sequential (ie, if an element is created before another does not need to have a lower ID, but gaps between IDs are not allowed).

The only solution I can think of is to use an additional COUNTER table to keep the count. Then create the element in the table with an empty "ID" (not PK) and then lock the COUNTER table, get the number, write it on the element, increase the number, unlock the table. I think this will work fine but has an obvious bottle neck: during the time of locking nobody is able to write any ID. Also, is a single point of failure if the node holding the table is not available. I could create a "master-master"? replication but I am not sure if this way I take the risk of using an out-of-date ID counter (I have never used replication).

Thanks.

Community
  • 1
  • 1
user1156544
  • 1,725
  • 2
  • 25
  • 51
  • 1
    What is the nature of needing consecutive IDs? is it a user-facing value? Perhaps because you are passing that ID on to the user. if so I would create a surrogate ID value that you pass to the users. This gives you design flexibility later. You say "maximize writing concurrency" is important, but "not waiting a long time" leaves a lot of wiggle room -- very different from "maximize writing concurrency". one is much easier, so which is it? Defining it as "not waiting a long time" seems to allow you to do some extra, but lightweight work to fulfill your requirements. – gillyspy May 03 '13 at 04:12
  • I wrote up an innodb gap answer [Over Here](http://stackoverflow.com/a/38363271) – Drew Jul 14 '16 at 00:12

1 Answers1

2

I am sorry to say this, but allowing high concurrency to achieve high performance and at the same time asking for a strictly monotone sequence are conflicting requirements.

Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor is one skipped, or you will have to accept the chance of one or both of these situations.

As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race-conditions.

Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?

Daniel Schneller
  • 13,728
  • 5
  • 43
  • 72
  • I guess you are right... I know that concurrency + high availability is known to be an impossibility. The main requirement is that IDs are sequentially assigned (no gaps, no repetitions) for the whole system. Maybe the solution I suggested is the most sensible one... give priority to correctness over speed – user1156544 May 02 '13 at 22:54
  • high-availability is not what you think it means.... at least not according to your comment. high concurrency and guaranteed uniqueness are available relatively easily, but guaranteed consecutiveness is not a common feature for *andy* rdbms. I know Oracle,DB2, Sybase, MySQL, MSSQLServer. None of these guarantee consecutiveness. High Availability is somehing else. – rolfl May 02 '13 at 23:05
  • I meant that strong consistency and high data availability cannot be achieved simultaneously... either I give priority to consistency (obtaining always the right ID but locking other accesses while doing so) or data availability (obtaining always an ID even though is not the "correct" one). In my system correctness of the ID is more important... but I was wondering if there was any other "trick" to improve writing performance – user1156544 May 03 '13 at 16:49