13

I want to write a program add new item to table. This item has an unique key name and it can be created by one of 100 threads, so I need to make sure that it is inserted only once.

I have two ideas:

  1. Use insert ignore

  2. Fetch it from database via select then insert it to table if no returned row.

Which option is better? Is there an even more superior idea?

Southpaw Hare
  • 1,535
  • 3
  • 24
  • 50
user3245050
  • 131
  • 1
  • 4
  • If you’re using the second option, then you’ll have to use transactions. – CBroe Jan 28 '14 at 15:15
  • possible duplicate of [On duplicate key ignore?](http://stackoverflow.com/questions/2366813/on-duplicate-key-ignore) – Gordon Linoff Jan 28 '14 at 15:30
  • If you need the id of the field I would suggest using INSERT / ON DUPLICATE KEY and resetting the id field (see here - http://stackoverflow.com/questions/778534/mysql-on-duplicate-key-last-insert-id ), triggering mysql to return the id as the last insert id. – Kickstart Jan 28 '14 at 16:44
  • Are the 100 threads all doing an `INSERT` in the _same_ second? Or are the inserts spread out. – Rick James Apr 28 '17 at 19:16

3 Answers3

15

Late to the party, but I'm pondering something similar.

I created the following table to track active users on a license per day:

CREATE TABLE `license_active_users` (
  `license_active_user_id` int(11) NOT NULL AUTO_INCREMENT,
  `license_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `date` date NOT NULL,
  PRIMARY KEY (`license_active_user_id`),
  UNIQUE KEY `license_id` (`license_id`,`user_id`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

In other words, 1 primary key and 1 unique index across the remaining 3 columns.

I then inserted 1 million unique rows into the table.

Attempting to re-insert a subset (10,000 rows) of the same data yielded the following results:

  • INSERT IGNORE: 38 seconds
  • INSERT ... ON DUPLICATE KEY UPDATE: 40 seconds
  • if (!rowExists("SELECT ...")) INSERT: <2 seconds

If those 10,000 rows aren't already present in the table:

  • INSERT IGNORE: 34 seconds
  • INSERT ... ON DUPLICATE KEY UPDATE: 41 seconds
  • if (!rowExists("SELECT ...")) INSERT: 21 seconds

So the conclusion must be if (!rowExists("SELECT ...")) INSERT is fastest by far - at least for this particular table configuration.

The missing test is if (rowExists("SELECT ...")){ UPDATE } else { INSERT }, but I'll assume INSERT ... ON DUPLICATE KEY UPDATE is faster for this operation.

For your particular case, however, I would go with INSERT IGNORE because (as far as I'm aware) it's an atomic operation and that'll save you a lot of trouble when working with threads.

Woodgnome
  • 2,281
  • 5
  • 28
  • 52
10

SELECT + INSERT -- two round trips to the server, hence slower.

INSERT IGNORE -- requires a PRIMARY or UNIQUE key to decide whether to toss the new INSERT. If this works for you, it is probably the best.

REPLACE -- is a DELETE + an INSERT. This is rarely the best.

INSERT ... ON DUPLICATE KEY UPDATE -- This lets you either INSERT (if the PRIMARY/UNIQUE key(s) are not found) or UPDATE. This is the one to use if you have things you need to update in existing rows.

"Burning ids" -- Only the "select+insert" avoids a potential problem: running out of AUTO_INCREMENT ids (I call it "burning ids"). All the other techniques will allocate the next id before deciding whether it is needed.

If you have several names to conditionally insert into a normalization, then a 2-query technique can batch them quite efficiently, and not burn ids: http://mysql.rjweb.org/doc.php/staging_table#normalization

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • If you need the inserted id after IODKU, see the manual about a trick for getting id via `... UPDATE id = LAST_INSERT_ID(id) ...`. – Rick James Oct 19 '15 at 13:08
  • What if this was inside a stored procedure? So it would one round trip anyway. – Andreas Wederbrand Dec 08 '15 at 08:35
  • Yes, a SP would save a roundtrip. The client to server roundtrip depends heavily on physical distance. The cost of initiating each statement is small, but non-zero. – Rick James Dec 08 '15 at 16:48
  • 1
    Boiling it down to 2 round trips > 1 round trip is extremely over simplified. It highly depends on what you are doing. – Woodgnome Dec 20 '16 at 09:06
0

Best: SELECT + INSERT IGNORE.

Because it is use SELECT for check it do not need lock table or row in table. Any INSERT need lock. So this can reduce performance on concurrent INSERT's.

Enyby
  • 4,162
  • 2
  • 33
  • 42