Since you only need one field from the table, maybe you could try something simpler than the referenced issue suggests. You don't really need a set of statements; you need a single update on one table that can return a value from the row it updated. So, this could actually work:
CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, INDEX(used));
CREATE TRIGGER pick_code AFTER UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code;
--populate the table
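For example, seeding the table with test data could look like this (just an illustrative sketch -- the populate_codes name and the code format are made up here, and real codes would come from wherever you generate them):

DELIMITER //
CREATE PROCEDURE populate_codes(IN n INT)  -- hypothetical helper, not part of the stub
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- a pseudo-random 16-char hex string; INSERT IGNORE skips rare collisions
    INSERT IGNORE INTO codes (code)
      VALUES (LPAD(CONV(FLOOR(RAND() * POW(2, 48)), 10, 16), 16, '0'));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;
CALL populate_codes(1000);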
Now, every user in their connection runs
UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
And then this should return the chosen code:
SELECT @your_code;
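One caveat: if no unused codes are left, the UPDATE matches nothing, the trigger never fires, and @your_code keeps whatever value it had earlier in the same connection. A minimal guard (my addition, not part of the stub above) is to reset the variable first:

SET @your_code = NULL;
UPDATE codes SET used = 1 WHERE used = 0 LIMIT 1;
-- NULL now unambiguously means the pool is exhausted
SELECT @your_code;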
The UPDATE is atomic, so you don't need a transaction for that, or explicit table locking. Whether to make the table InnoDB or MyISAM should be decided empirically, based on comparative performance in your environment; it can depend on many things which would be out of scope here.
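For that empirical comparison, switching an existing table's engine is a one-liner (it rebuilds the table, so better to try it on a copy first):

ALTER TABLE codes ENGINE=InnoDB;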
Notes on integrity
Please note that it's just a stub, not a complete solution. In reality you'll need some more logic to ensure all four of your requirements:
- only one user gets a code;
- at least one user gets a code;
- only one code is given to a user;
- at least one code is given to a user.
The stub addresses the last point. The first and second ones are a matter of crash-safety (you should be able to ensure that with proper InnoDB settings, even though in other ways InnoDB will be inferior to MyISAM for this flow). Finally, for the 3rd point you also need to store the information that the user has been given a code, but the details depend on how your users are identified. E.g. it can be something like
CREATE TABLE codes (code CHAR(16) PRIMARY KEY, used BOOL DEFAULT 0, assignee VARCHAR(128), UNIQUE(assignee), INDEX(used));
CREATE TRIGGER pick_code BEFORE UPDATE ON codes FOR EACH ROW SET @your_code = OLD.code, NEW.assignee = CURRENT_USER();
(just another stub -- it can be done in a completely different way).
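A nice side effect of the UNIQUE(assignee) index is that a second pick by the same user fails with a duplicate-key error instead of silently handing out another code, and an already-assigned code can be re-fetched at any time (assuming, as CURRENT_USER() in the trigger implies, that your users map to MySQL accounts):

-- re-fetch the code previously assigned to the current user, if any
SELECT code FROM codes WHERE assignee = CURRENT_USER();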
Update (notes on having an index on the `used` column)
Since a question about the index on `used` was raised in the comments, I've run a quick informal benchmark. It's based on the solution above, but might also be worth considering with any other solutions which use similar structures and DML.
Disclaimers:
- absolute values in the results are completely irrelevant, the tests were performed on a regular Debian desktop installation, not in any way tuned for benchmarking;
- the results are not intended to prove that the suggested solution is good, only to check some points that were discussed;
- the server was not tuned for InnoDB; one might achieve better performance for the InnoDB tables with proper configuration, so it's just a very rough comparison.
Test setup
- MySQL server 5.6.34, 64-bit binary tarball from the official website
- options: `--innodb-buffer-pool-size=4G --innodb-flush-log-at-trx-commit=2`; all other options are server defaults;
- client tool: `mysqlslap` from the same package
Four tables are created. The structure is identical, apart from the engine (MyISAM vs InnoDB) and an index on the `used` column (index vs no index).
MySQL [test]> show create table codes_innodb \G
*************************** 1. row ***************************
Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_innodb_i \G
*************************** 1. row ***************************
Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_myisam \G
*************************** 1. row ***************************
Table: codes_myisam
Create Table: CREATE TABLE `codes_myisam` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
MySQL [test]> show create table codes_myisam_i \G
*************************** 1. row ***************************
Table: codes_myisam_i
Create Table: CREATE TABLE `codes_myisam_i` (
`code` char(17) NOT NULL DEFAULT '',
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
Each table is populated with 50,000,000 rows of identical data (not similar, but actually identical).
Test flow
- Two tests are performed on each table.
- Each test is run with 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:
UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1
- First test is started when `used=0` for all rows (the "initial" state).
- Second test is started when `used=1` for 1,005,000 rows.
The test measures the total amount of time to execute all queries.
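For reference, a mysqlslap invocation matching this flow could look like the following (a reconstruction -- the exact command line used for the tests is not shown here):

# 20 clients, ~5000 queries total (~250 per client); --no-drop keeps the schema
mysqlslap --user=root --no-drop --create-schema=test \
  --concurrency=20 --number-of-queries=5000 \
  --query="UPDATE codes_myisam_i SET used = 1 WHERE used = 0 LIMIT 1"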
Results (in seconds)
| Table | Test 1 | Test 2 |
|--------------------|----------|----------|
| MyISAM with index | 0.459 | 0.333 |
| MyISAM, no index | 3.425 | 801.383 |
| InnoDB with index | 11.529 | 8.205 |
| InnoDB, no index | 19.646 | 2403.297 |
So, at the beginning the results with and without the index are comparable, even though with the index they are somewhat better.
However, when we have to go deeper into the data, the results change essentially: with the index they remain approximately the same (ignore the fluctuations on such low values), but without the index, the further into the data we go, the longer each update takes.
This is quite expected, and here is why.
With the index, regardless of where we are in the data, the UPDATE still performs only one key read and one rnd read:
MySQL [test]> select used, count(*) from codes_myisam_i group by used;
+------+----------+
| used | count(*) |
+------+----------+
| 0 | 48990000 |
| 1 | 1010000 |
+------+----------+
2 rows in set (12.08 sec)
MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> update codes_myisam_i set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+------------------+----------------+
| HANDLER_READ_KEY | 1 |
| HANDLER_READ_RND | 1 |
+------------------+----------------+
2 rows in set (0.00 sec)
But without the index, it performs roughly as many rnd reads as there are rows that have already been updated:
MySQL [test]> select used, count(*) from codes_myisam group by used;
+------+----------+
| used | count(*) |
+------+----------+
| 0 | 48990000 |
| 1 | 1010000 |
+------+----------+
MySQL [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> update codes_myisam set used=1 where used=0 limit 1;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MySQL [test]> select * from information_schema.session_status where variable_name like 'Handler_read%' and variable_value > 0;
+-----------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------------+----------------+
| HANDLER_READ_RND_NEXT | 1010001 |
+-----------------------+----------------+
1 row in set (0.00 sec)
Of course, these results are very specific to this particular flow, where we perform numerous one-row updates and have to search for a row every time, so the penalty on the look-ups obviously exceeds the penalty on updating the index. It would be totally different if we performed a bulk update:
MySQL [test]> update codes_innodb set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (7.80 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_innodb_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (56.91 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_myisam set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (1.21 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
MySQL [test]> update codes_myisam_i set used = 1 where used = 0 limit 1000000;
Query OK, 1000000 rows affected (14.56 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
There, naturally, updating the tables with the extra index is many times slower than updating the tables without it. I think that's where the confusion in the comments came from.
Update 2 (notes on using a natural PK vs a surrogate PK)
Another objection raised in the comments was about using a natural primary key, as opposed to a surrogate primary key; the concern was that it would hurt InnoDB performance.
Here is a similar quick benchmark regarding this.
Test setup
Same environment and server as in the previous test. Two InnoDB tables are in use.
The first is the same one as before, with the natural PK:
Table: codes_innodb_i
Create Table: CREATE TABLE `codes_innodb_i` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
PRIMARY KEY (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Another one has a surrogate PK (and a unique index on `code`, since we still want to ensure it's unique -- in the first table, the PK itself makes sure of that):
Table: codes_innodb
Create Table: CREATE TABLE `codes_innodb` (
`code` char(17) NOT NULL,
`used` tinyint(1) DEFAULT '0',
`pk` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`pk`),
UNIQUE KEY `code` (`code`),
KEY `used` (`used`)
) ENGINE=InnoDB AUTO_INCREMENT=50000001 DEFAULT CHARSET=latin1
50,000,000 rows in each table, identical data.
Test flow
- The test starts when `used=0` for all rows;
- 10 consecutive test runs are performed on each table.
- Each run is 20 concurrent threads, all performing the same update, 250 times in each thread, 5000 times total:
UPDATE codes_innodb SET used = 1 WHERE used = 0 LIMIT 1
- The table is not updated between the runs; that is, the first run starts with all rows at `used=0`, the 2nd one starts with 5000 rows at `used=1`, etc.
Each test measures the total amount of time to execute all queries.
Results (in seconds)
| | Individual results | Avg |
|--------------|--------------------------------------------------------------|--------|
| natural PK | 8.061,6.782,5.712, 5.524,7.854,6.166,6.095,4.911,4.435,4.784 | 6.0324 |
| surrogate PK | 9.659,8.981,8.080,11.257,9.621,6.722,6.457,5.937,6.308,6.624 | 7.9646 |
Even though the natural PK has shown somewhat better results, the environment is not tuned, so I wouldn't go as far as saying that the natural PK is superior here; it's quite possible that with proper tuning of the server and a better environment it would change. But we can see that there is no performance drop upon using a natural PK vs a surrogate PK for this workflow. So, it's rather a question of personal preference.