Since you only need one field from the table, maybe you could try something simpler than the referenced issue suggests. You don't really need a set of statements; you need a single update on one table that can return a value from the row it updated. So, this could actually work:
CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, INDEX(used));
CREATE TRIGGER pick_code AFTER UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code;
--populate the table
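For example, seeding the table with test data could look like this (just an illustrative sketch -- the populate_codes name and the code format are made up here, and real codes would come from wherever you generate them):

DELIMITER //
CREATE PROCEDURE populate_codes(IN n INT)  -- hypothetical helper, not part of the stub
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- a pseudo-random 16-char hex string; INSERT IGNORE skips rare collisions
    INSERT IGNORE INTO codes (code)
      VALUES (LPAD(CONV(FLOOR(RAND() * POW(2, 48)), 10, 16), 16, '0'));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
CALL populate_codes(1000);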
Now, every user in their connection runs
UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
And then this should return the chosen code:
SELECT @your_code;
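One caveat: if no unused codes are left, the UPDATE matches nothing, the trigger never fires, and @your_code keeps whatever value it had earlier in the same connection. A minimal guard (my addition, not part of the stub above) is to reset the variable first:

SET @your_code = NULL;
UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
-- NULL now unambiguously means the pool is exhausted
SELECT @your_code;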
The UPDATE is atomic, so you don't need a transaction for that, or explicit table locking. Whether to make the table InnoDB or MyISAM should be decided empirically, based on comparative performance in your environment; it can depend on many things which would be out of scope here.
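For that empirical comparison, switching an existing table's engine is a one-liner (it rebuilds the table, so better to try it on a copy first):

ALTER TABLE codes ENGINE=InnoDB;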
Notes on integrity
Please note that it's just a stub, not a complete solution. In reality you'll need some more logic to ensure all four of your requirements:
- only one user gets a code;
- at least one user gets a code;
- only one code is given to a user;
- at least one code is given to a user.
The stub addresses the last point. The first and second ones are a matter of crash-safety (you should be able to ensure that with proper InnoDB settings, even though in other ways InnoDB will be inferior to MyISAM for this flow). Finally, for the 3rd point you also need to store the information that the user has been given a code, but the details depend on how your users are identified. E.g. it can be something like
CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, assignee VARCHAR(128), UNIQUE(assignee), INDEX(used));
CREATE TRIGGER pick_code BEFORE UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code, NEW.assignee = CURRENT_USER();
(just another stub -- it can be done in a completely different way).
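A nice side effect of the UNIQUE(assignee) index is that a second pick by the same user fails with a duplicate-key error instead of silently handing out another code, and an already-assigned code can be re-fetched at any time (assuming, as CURRENT_USER() in the trigger implies, that your users map to MySQL accounts):

-- re-fetch the code previously assigned to the current user, if any
SELECT code FROM codes WHERE assignee = CURRENT_USER();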
Update (notes on having an index on the `used` column)
Since a question about the index on `used` was raised in the comments, I've run a quick informal benchmark. It's based on the solution above, but might also be worth considering with any other solutions which use similar structures and DML.
Disclaimers:
- absolute values in the results are completely irrelevant, the tests were performed on a regular Debian desktop installation, not in any way tuned for benchmarking;
- the results are not intended to prove that the suggested solution is good, only to check some points that were discussed;
- the server was not tuned for InnoDB; one might achieve better performance for the InnoDB tables with proper configuration, so it's just a very rough comparison.
Test setup
- MySQL server 5.6.34, 64-bit binary tarball from the official website
- options: `--innodb-buffer-pool-size=4G --innodb-flush-log-at-trx-commit=2`; all other options are server defaults;
- client tool: `mysqlslap` from the same package
Four tables are created. The structure is identical, apart from the engine (MyISAM vs InnoDB) and an index on the `used` column (index vs no index).
MySQL [test]> show create table codes_innodb \G
*************************** 1. row ***************************
Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_innodb_i \G
*************************** 1. row ***************************
Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_myisam \G
*************************** 1. row ***************************
Table: codes_myisam
Create Table: CREATE TABLE `codes_myisam` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_myisam_i \G
*************************** 1. row ***************************
Table: codes_myisam_i
Create Table: CREATE TABLE `codes_myisam_i` (
`code` char(17) NOT NULL DEFAULT '',
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
Each table is populated with 50,000,000 rows of identical data (not similar, but actually identical).
Test flow
- Two tests are performed on each table.
- Each test is run with 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:
UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1
- First test is started when `used=0` for all rows (the "initial" state).
- Second test is started when `used=1` for 1,005,000 rows.
The test measures the total amount of time to execute all queries.
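For reference, a mysqlslap invocation matching this flow could look like the following (a reconstruction -- the exact command line used for the tests is not shown here):

# 20 clients, ~5000 queries total (~250 per client); --no-drop keeps the schema
mysqlslap --user=root --no-drop --create-schema=test \
  --concurrency=20 --number-of-queries=5000 \
  --query="UPDATE codes_myisam_i SET used = 1 WHERE used = 0 LIMIT 1"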
Results (in seconds)
| Table | Test 1 | Test 2 |
|--------------------|----------|----------|
| MyISAM with index | 0.459 | 0.333 |
| MyISAM, no index | 3.425 | 801.383 |
| InnoDB with index | 11.529 | 8.205 |
| InnoDB, no index | 19.646 | 2403.297 |
So, at the beginning the results with and without the index are comparable, even though with the index they are somewhat better.
However, when we have to go deeper into the data, the results change essentially: with the index they remain approximately the same (ignore the fluctuations on such low values), but without the index, the further into the data we go, the longer each update takes.
This is quite expected, and here is why.
With the index, regardless of where we are in the data, the UPDATE still performs only one key read and one rnd read:
MySQL [test]> select used, count(*) from codes_myisam_i group by used;
+------+----------+
| used | count(*) |
+------+----------+
| 0 | 48990000 |
| 1 | 1010000 |
+------+----------+
2 rows in set (12.08 sec)
MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> update codes_myisam_i set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+------------------+----------------+
| HANDLER_READ_KEY | 1 |
| HANDLER_READ_RND | 1 |
+------------------+----------------+
2 rows in set (0.00 sec)
But without the index, it performs roughly as many rnd reads as there are rows that have already been updated:
MySQL [test]> select used, count(*) from codes_myisam group by used;
+------+----------+
| used | count(*) |
+------+----------+
| 0 | 48990000 |
| 1 | 1010000 |
+------+----------+
MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> update codes_myisam set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+-----------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------------+----------------+
| HANDLER_READ_RND_NEXT | 1010001 |
+-----------------------+----------------+
1 row in set (0.00 sec)
Of course, these results are very specific to this particular flow, where we perform numerous one-row updates and have to search for a row every time, so the penalty on the look-ups obviously exceeds the penalty on updating the index. It would be totally different if we performed a bulk update:
MySQL [test]> update codes_innodb set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (7.80 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_innodb_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (56.91 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_myisam set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (1.21 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_myisam_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (14.56 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
There, naturally, updating the tables with the extra index is many times slower than updating the tables without it. I think that's where the confusion in the comments came from.
Update 2 (notes on using a natural PK vs a surrogate PK)
Another objection raised in the comments was about using a natural primary key, as opposed to a surrogate primary key; the concern was that it would hurt InnoDB performance.
Here is a similar quick benchmark regarding this.
Test setup
Same environment and server as in the previous test. Two InnoDB tables are in use.
The first is the same one as before, with the natural PK:
Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Another one has a surrogate PK (and a unique index on `code`, since we still want to ensure it's unique -- in the first table, the PK itself makes sure of that):
Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
`pk` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`pk`),
UNIQUE KEY `code` (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB AUTO_INCREMENT=50000001 DEFAULT CHARSET=latin1
50,000,000 rows in each table, identical data.
Test flow
- The test starts when `used=0` for all rows;
- 10 consecutive test runs are performed on each table.
- Each run is 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:
UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1
- The table is not updated between the runs; that is, the first run starts with all rows at `used=0`, the 2nd one starts with 5000 rows at `used=1`, etc.
Each test measures the total amount of time to execute all queries.
Results (in seconds)
| | Individual results | Avg |
|--------------|--------------------------------------------------------------|--------|
| natural PK | 8.061,6.782,5.712, 5.524,7.854,6.166,6.095,4.911,4.435,4.784 | 6.0324 |
| surrogate PK | 9.659,8.981,8.080,11.257,9.621,6.722,6.457,5.937,6.308,6.624 | 7.9646 |
Even though the natural PK has shown somewhat better results, the environment is not tuned, so I wouldn't go as far as saying that the natural PK is superior here; it's quite possible that with proper tuning of the server and a better environment it would change. But we can see that there is no performance drop upon using a natural PK vs a surrogate PK for this workflow. So, it's rather a question of personal preference.