I'd like to pile on to the excellent discussion below as I still don't understand what is going on. I use the Google Cloud MySQL
innodb_version 5.7.25
innodb_autoinc_lock_mode 1
I upload records into a table keyed by an auto incremented primary key. There are no other unique keys. I'm the only user and I'm single threaded. I notice sporadic gaps in the key similarly to those reported by others on this thread.
To experiment with this stuff, I used 5 CSV files that I uploaded in a batch; I dropped the table and recreated it after every experimental batch run. I uploaded all 5 files in the same order every time. I tried different table structures, different data.
In the manual, https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html it says about "consecutive" lock mode:
This lock mode ensures that, in the presence of INSERT statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any “INSERT-like” statement are consecutive, and operations are safe for statement-based replication.
And I agree, records are consecutive within the same upload file. But there are gaps between the uploads. The gaps vary. One table skips 1 key every time, then stops skipping for a while, then starts skipping again. Another table skips 8 records every time, which happens to be how many records I upload in the first CSV file that the table saw.
I couldn't see any rhyme nor reason for the number of records skipped but, curiously, the exact sequence of skipping happened every time I reran my experiment batch (so it IS a computer!)
I read the proposed solutions (set the lock to the "old" 0; don't use auto increment; etc.) These are all fine but I DO want to use auto increment, yet this unpredictability makes me nervous. Does anyone know
- What calculation the engine makes to determine how many keys to "reserve"?
- When does this "reservation" happen, before the upload starts?
- Is load data more prone to "key waste" than, say, a python program?
- Is there a way to influence this reservation with info, table structure?
Many thanks!