names is a table that maps every unique name to an id:

CREATE TABLE names (
  name       VARCHAR(20)    NOT NULL,
  name_id    INT            NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (name),
  UNIQUE (name_id)
) ENGINE=InnoDB;

If 'foobar' is already stored in names, I want to get its name_id. Otherwise I want to store it and get the name_id that was generated for it.

How does one do this, dealing with the possibility of two threads racing to add 'foobar'? (Note: we never delete from this table.)

With MyISAM I wrote:

SELECT GET_LOCK('foobar', 10);  -- a timeout in seconds is required; 10 is an arbitrary choice
SELECT name_id FROM names WHERE name = 'foobar';
-- if the name wasn't found:
    INSERT INTO names SET name = 'foobar';
    SELECT LAST_INSERT_ID();  -- that is, name_id
SELECT RELEASE_LOCK('foobar');

However, I'm unclear about how to achieve this using transactions and row-level locking in InnoDB.

UPDATE: MySQL requires that AUTO_INCREMENT only be used for the PK. I'm using MariaDB, which only requires that the AUTO_INCREMENT column be defined as a key. I've updated the example accordingly.

Chap
  • Replace `GET_LOCK` and `RELEASE_LOCK` with `START TRANSACTION` and `COMMIT`. – Barmar Aug 04 '17 at 22:17
  • @Barmar What will happen when the second thread executes INSERT and gets a DUPLICATE KEY violation? Or is the second thread blocked at START TRANSACTION until the first thread has ended the transaction? – Chap Aug 04 '17 at 22:38
  • I think the second thread should be blocked. But try it and see. – Barmar Aug 04 '17 at 22:39
  • @Barmar second thread was not blocked - got a duplicate key error. – Chap Aug 04 '17 at 23:41
  • Related: https://stackoverflow.com/questions/25335137/mysql-transaction-select-insert – Barmar Aug 04 '17 at 23:46
  • The existing question that this duplicates is indeed the same problem, but LOCK TABLE is not the answer: we have 20 threads and the names table has 100s of millions of strings in it, and 5% of the time we're adding another. INSERT IGNORE may be the answer but it's not yet clear to me how to determine that it "Ignored" the insert due to a race condition. – Chap Aug 05 '17 at 00:16
  • Use `SELECT ROW_COUNT();`. This will be the number of rows that were inserted, so if the insert was ignored it will be `0`. – Barmar Aug 05 '17 at 00:20
  • So perform the `INSERT` first. If it succeeds, return `LAST_INSERT_ID()`, otherwise do the `SELECT`. – Barmar Aug 05 '17 at 00:22

1 Answer

Try the following:

INSERT IGNORE INTO names (name) VALUES ('foobar');
IF ROW_COUNT() = 1 THEN
    SELECT LAST_INSERT_ID() AS name_id;
ELSE
    SELECT name_id FROM names WHERE name = 'foobar';
END IF;

You don't need an explicit transaction if auto-commit is enabled: there will be an implicit transaction around the INSERT IGNORE query, making it atomic.

ROW_COUNT() will tell you whether the new row was inserted: it will be 1 if the name was available, and 0 if a duplicate was found and the insert was ignored.
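
In MySQL, IF ... END IF is only accepted inside a stored program, so one way to run the logic above as written is to wrap it in a stored procedure. The following is a minimal sketch, not a definitive implementation: the procedure name `get_or_create_name_id` and its parameters are hypothetical, it assumes the `names` table from the question with auto-commit enabled, and `DELIMITER` is a command of the `mysql` command-line client rather than SQL.

```sql
-- Hypothetical helper: the procedure and parameter names are illustrative only.
DELIMITER //

CREATE PROCEDURE get_or_create_name_id(IN p_name VARCHAR(20), OUT p_name_id INT)
BEGIN
  -- Try the insert first; a duplicate name is skipped instead of raising an error.
  INSERT IGNORE INTO names (name) VALUES (p_name);

  IF ROW_COUNT() = 1 THEN
    -- This connection inserted the row, so LAST_INSERT_ID() is its name_id.
    SET p_name_id = LAST_INSERT_ID();
  ELSE
    -- The name was already present, possibly added by a racing thread.
    SELECT name_id INTO p_name_id FROM names WHERE name = p_name;
  END IF;
END //

DELIMITER ;

-- Usage: fetch the id for 'foobar', creating the row if needed.
CALL get_or_create_name_id('foobar', @id);
SELECT @id AS name_id;
```

The same branch can instead be made in application code by running the INSERT IGNORE, reading `ROW_COUNT()`, and then issuing one of the two SELECT statements.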

Barmar
  • This works fine. I'm going to implement using SELECT before INSERT, however, because at this point, 95% of the time the name is already there and INSERT usually fails. – Chap Aug 05 '17 at 00:48
  • So `SELECT`, if not found `INSERT IGNORE`, then test result and `SELECT` again? Reasonable based on the preconditions. – Barmar Aug 05 '17 at 06:42
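
The select-first variant discussed in these comments could be sketched as a stored procedure along the following lines. The procedure name `lookup_or_add_name` and its parameters are hypothetical, and the sketch assumes the `names` table from the question, auto-commit enabled, and the `mysql` command-line client (for `DELIMITER`).

```sql
-- Hypothetical helper: the procedure and parameter names are illustrative only.
DELIMITER //

CREATE PROCEDURE lookup_or_add_name(IN p_name VARCHAR(20), OUT p_name_id INT)
BEGIN
  -- Common case: the name already exists. A scalar subquery yields NULL when no row matches.
  SET p_name_id = (SELECT name_id FROM names WHERE name = p_name);

  IF p_name_id IS NULL THEN
    -- Rare case: try to add the name; a racing duplicate is ignored rather than raising an error.
    INSERT IGNORE INTO names (name) VALUES (p_name);

    IF ROW_COUNT() = 1 THEN
      -- This connection inserted the row.
      SET p_name_id = LAST_INSERT_ID();
    ELSE
      -- Another thread inserted it between the first lookup and the insert.
      SET p_name_id = (SELECT name_id FROM names WHERE name = p_name);
    END IF;
  END IF;
END //

DELIMITER ;

CALL lookup_or_add_name('foobar', @id);
SELECT @id AS name_id;
```

With the name already present about 95% of the time, most calls finish after the first lookup, and only the remaining calls pay for the INSERT IGNORE and the possible second lookup.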