0

Suppose we have a table with an auto-increment primary key. I want to load all IDs greater than the last ID I have seen.

SELECT id
FROM mytable
WHERE id > 10;

With the naive approach, I risk skipping IDs:

  • Transaction 1 claims ID 11.
  • Transaction 2 claims ID 12.
  • Transaction 2 commits.
  • I read all IDs >10. I see 12, and next time I will read all IDs >12. I have skipped 11.
  • Transaction 1 commits. For all intents and purposes, ID 11 now exists.

As a solution, I propose to do a double check to ensure that no intermediate IDs are about to be committed:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT(*)
FROM mytable
WHERE id > 10
AND id <= 12; -- Where 12 was the max seen in the first query

If the count is greater than the number of IDs seen in the first query, then it is definitely possible that additional IDs will be committed in the gaps in that sequence.

The question is: does the reverse hold true? If the count is equal to (or less than) the number of IDs seen in the first query, is it guaranteed that there will be no values in between? Or am I missing some possible scenario where the IDs are being claimed, yet the READ UNCOMMITTED query does not see them yet?

For this question, please disregard:

  • Manual ID insertions.
  • Rewinding the auto-increment counter.
Timo
  • 7,992
  • 4
  • 49
  • 67

2 Answers2

1

Mysql locks the table during auto_increment.

See

https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html

So that normally that problem doesn't occur, if there is no bug in the Version.

The lock works like a semaphore/critical Section.

nbk
  • 45,398
  • 8
  • 30
  • 47
  • But there may still exist a moment in time where the ID is acquired, the table is unlocked again, yet there is no new row visible to a `READ UNCOMMITTED` query yet. Or does it? If [this question](https://stackoverflow.com/q/24257113/543814) is any indication, it does. :( – Timo Nov 29 '19 at 17:37
  • No, my reference to semaphores is quite accurate. Your autoincrement is save. at the time it has only the number of rows when you ask for it. After millisecond later it an be otherwise. the locking of tables is often debated, because large insert or long transaction can lock the table down. And when this is the case you can check that.in a select statement. besides cpu can resort the mimo queues, that's is why you can't rely on the order of inserts – nbk Nov 29 '19 at 20:56
  • Unfortunately this gives me no guarantees. It's all well and good that the table is locked while the IDs are being acquired. But that says little about what may happen _between_ (A) unlocking the table (with the auto-increment counter incremented) and (B) actually storing the inserted IDs in new rows. In fact, the question I linked in my previous comments seems to indicate that there is a certain time span between A and B where the IDs have been claimed, but no rows are written yet (and thus no such rows are visible even to `READ UNCOMMITTED`). It makes _some_ sense, unlocking before writing. – Timo Dec 09 '19 at 15:54
0

I've been developing with SQL databases since 1992, and I have never seen an occasion where using READ UNCOMMITTED was the right solution to any problem.

I guess you are using MySQL as a kind of queue. That is, you rely on the auto-increment ID as the head of the queue.

I don't think you can do this in the way you describe, because of the fact that the order in which transactions generate their auto-inc ID's is not the same order as they commit.

I suggest you need to have another column called processed or something like that. Then you can query for records you have not yet processed:

SELECT id FROM mytable WHERE processed = false ORDER BY id

This way, the query will always return any records you have not seen yet. If ID 11 is committed after you have already seen ID 12, it will show up the next time you run this query.

Once you have done whatever you are going to do with a record, then:

UPDATE mytable SET processed = true WHERE id = ?

An even better solution, without the need to have a processed column, is to use a message queue to complement the SQL database.

When a client adds a record, they should also post the ID of the record they just inserted into the message queue. It's important that this client post to the message queue after they commit the record, or else a consumer of the message queue could get notified of an ID they can't see yet.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thanks for the suggestions Bill! I'm aware of MessageQueues and of the ability to track the processing in the producer. These alternatives notwithstanding, I'm still really hoping to find a direct answer to the question. That answer can go one of two ways, really: if there exists a moment where the ID is claimed yet still not visible to `READ UNCOMMITTED`, then my idea doesn't work, and otherwise it does work. – Timo Nov 29 '19 at 17:35
  • I would not recommend using READ UNCOMMITTED at all. That breaks the isolation property of ACID, and it should not be considered as a proper solution. – Bill Karwin Nov 29 '19 at 17:37
  • Note that the `READ UNCOMMITTED` is only used as a confirmation. The only data returned is the result from the `READ COMMITTED` query, granted that there are no uncommitted IDs being skipped. – Timo Nov 29 '19 at 17:38