Seems like the specification is for two different operations: 1) an UPDATE of existing rows in table1, and 2) an INSERT of new rows into table2.
The specification says "update the value column"... we take that to mean update the value column in the row of table1
.
The specification also says "insert ... into table2
.
Confusingly, the specification also shows an example pseudo-code INSERT INTO table1
.
To perform an UPDATE of table1
based on values in table2
, assuming we are going to ignore rows that have a NULL value in any of the three columns...
UPDATE table1 t
JOIN table2 s
ON t.attributeid = s.attributeid
AND t.entityid = s.entityid
AND t.carid = s.carid
SET t.value = s.value
If there are "duplicates" in table2
(i.e. multiple rows in table2
with the same values of the three columns attributeid, entityid and carid, it is indeterminate which of those rows value
will be taken from.
To insert a row that is found in table2 but "missing" from table1 (again assuming those three columns may not be unique in table2), we can use an anti-join pattern to eliminate rows which already have a "match" in table1.
For example:
INSERT INTO table1 (attributeid, entityid, carid, value)
SELECT v.*
FROM ( SELECT s.attribute_id
, s.entity_id
, s.carid
, s.value
FROM table2 s
LEFT
JOIN table1 r
ON r.attributeid = s.attributeid
AND r.entityid = s.entityid
AND r.carid = s.carid
WHERE r.attributeid IS NULL
AND s.attributeid IS NOT NULL
AND s.entityid IS NOT NULL
AND s.carid IS NOT NULL
GROUP
BY s.attributeid
, s.entityid
, s.carid
) v
If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values of the three columns attributeid, entityid and carid, it is indeterminate which row value
will be taken from.
If there are other UNIQUE constraints defined on other columns, or combinations of columns, the statement has a potential to throw a "duplicate key" error. (Without knowing the key definitions, we're kinda flying blind.) We could add the IGNORE
keyword if we want the statement to succeed, just ignoring rows that fail to insert due to "unique key" violations.)
Again, if there are rows in table2
with the same values in the the three columns (no indication is given that this combination of columns is unique in table2), it's indeterminate which of those rows value
will be taken from.
The same operations can be performed in the opposite direction, swapping all occurrences of the table references table1
and table2
in the queries.
It's not necessary to add a UNIQUE KEY to either of the tables to perform these operations. There would (likely) be a performance benefit to having a suitable index defined, with those three columns as the leading (first) columns in the index. (That doesn't necessarily need to be a UNIQUE index for this operation.)
If that combination of columns should be unique, then by all means add a UNIQUE KEY on that combination of columns. But the specified operations can be performed without a UNIQUE KEY defined.
The MySQL INSERT ... ON DUPLICATE KEY
syntax does require at least one PRIMARY KEY or UNIQUE KEY to operate. If there are multiple UNIQUE KEY constraints on the target table, and an INSERT would violate two or more of the unique key constraints, I believe it's indeterminate which of those keys will be used in the UPDATE action. Personally, I'd tend to steer clear of using that syntax on a table with more than one UNIQUE KEY defined.