Upserting to MySQL, but with multiple columns and unique index as duplicate check?

Question

I saw a lot of people have asked about upserting (this, this, this, this, this, this, and more and even the official doc).

However, something that is not explained well enough for newbies to understand is how to create the duplicate key using primary key or unique indexes.

What I need:
If a table1's unique combination of 3 columns (attributeId, entityId, carId) has a duplicate in table2, then update the value column. Else take table1's row and insert it into table2.

The attributeId, entityId, carId combination will be unique for every row.
ie: If a row has columns as 1,2,5, then no other row will have 1,2,5. But another row might have 5,1,2 or 3,4,2 etc.

The dilemma here is about creating the unique index. Is it sufficient to just do it like this:

CREATE INDEX PIndex ON table1 (attributeId, entityId, carId);

or is it necessary to delete all other indexes and then create this index and then run a query like this? (pseudocode below):

    INSERT INTO table1 (attributeId, entityId, carId, value, name) 
    VALUES (table2.attributeId,table2.entityId,table2.carId,table2.value,table2.name) 
ON DUPLICATE KEY UPDATE value=VALUES(value);

The basic logic being:
If for a row in table2, there is a corresponding row in table1 with exactly the same values for attributeId, entityId and carId, then update the value column in table1 with the value of the value column in table2. If there is no corresponding row, then take the row of table2 and append it to table1.

spencer7593 · Accepted Answer · 2016-10-05T22:44:21.890

Seems like the specification is for two different operations: 1) an UPDATE of existing rows in table1, and 2) an INSERT of new rows into table2.

The specification says "update the value column"... we take that to mean update the value column in the row of table1.

The specification also says "insert ... into table2.

Confusingly, the specification also shows an example pseudo-code INSERT INTO table1.

To perform an UPDATE of table1 based on values in table2, assuming we are going to ignore rows that have a NULL value in any of the three columns...

 UPDATE table1 t
   JOIN table2 s
     ON t.attributeid = s.attributeid
    AND t.entityid    = s.entityid
    AND t.carid       = s.carid
    SET t.value  = s.value

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values of the three columns attributeid, entityid and carid, it is indeterminate which of those rows value will be taken from.

To insert a row that is found in table2 but "missing" from table1 (again assuming those three columns may not be unique in table2), we can use an anti-join pattern to eliminate rows which already have a "match" in table1.

For example:

 INSERT INTO table1 (attributeid, entityid, carid, value)
 SELECT v.*
   FROM ( SELECT s.attribute_id
               , s.entity_id
               , s.carid
               , s.value
            FROM table2 s
            LEFT
            JOIN table1 r
              ON r.attributeid = s.attributeid
             AND r.entityid    = s.entityid
             AND r.carid       = s.carid
           WHERE r.attributeid IS NULL
             AND s.attributeid IS NOT NULL
             AND s.entityid    IS NOT NULL
             AND s.carid       IS NOT NULL
           GROUP
              BY s.attributeid
               , s.entityid
               , s.carid
        ) v

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values of the three columns attributeid, entityid and carid, it is indeterminate which row value will be taken from.

If there are other UNIQUE constraints defined on other columns, or combinations of columns, the statement has a potential to throw a "duplicate key" error. (Without knowing the key definitions, we're kinda flying blind.) We could add the IGNORE keyword if we want the statement to succeed, just ignoring rows that fail to insert due to "unique key" violations.)

Again, if there are rows in table2 with the same values in the the three columns (no indication is given that this combination of columns is unique in table2), it's indeterminate which of those rows value will be taken from.

The same operations can be performed in the opposite direction, swapping all occurrences of the table references table1 and table2 in the queries.

It's not necessary to add a UNIQUE KEY to either of the tables to perform these operations. There would (likely) be a performance benefit to having a suitable index defined, with those three columns as the leading (first) columns in the index. (That doesn't necessarily need to be a UNIQUE index for this operation.)

If that combination of columns should be unique, then by all means add a UNIQUE KEY on that combination of columns. But the specified operations can be performed without a UNIQUE KEY defined.

The MySQL INSERT ... ON DUPLICATE KEY syntax does require at least one PRIMARY KEY or UNIQUE KEY to operate. If there are multiple UNIQUE KEY constraints on the target table, and an INSERT would violate two or more of the unique key constraints, I believe it's indeterminate which of those keys will be used in the UPDATE action. Personally, I'd tend to steer clear of using that syntax on a table with more than one UNIQUE KEY defined.

My goodness. I'm surprised it was such a complex operation. Works perfectly! Thank you very much for taking the time to help, spencer :) — Nav, Oct 06 '16 at 05:06

Cavaz · Answer 2 · 2016-10-05T20:04:03.157

You can use the syntax

ALTER IGNORE TABLE table1 ADD UNIQUE INDEX PIndex (attributeId, entityId, carId);

According to the documentation:

If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Unfortunately it does not specify which value will be kept. Doing some tests it seems like it keeps the first occurrence, but you can never be sure.

If which entry will be dropped does not bother you this is the easiest solution, otherwise if you want more control it would be better to go through a temporary table.

The command CREATE UNIQUE PIndex ON table1 (attributeId, entityId, carId); (note the added UNIQUE) will simply fail on the first duplicate key, and no option to manage duplicates is available.

Upserting to MySQL, but with multiple columns and unique index as duplicate check?

2 Answers2