Trying to improve upon an import process which currently relies on a user manipulating data in an excel spreadsheet so that it can be imported into a permanent database table, my current line of thinking was to import the spreadsheet data into a temp table and then use that to update the permanent table - the columns contained in both tables would be identical in terms of data types.
The permanent table
ID | col1 | col2 | col3 |
---|---|---|---|
29384756 | WWWWW | "2021-01-01 00:00:00" | current value |
29384757 | XXXXX | "2021-01-02 00:00:00" | current value |
29384758 | KLMNO | "2022-02-24 00:00:00" | current value |
29384759 | YYYYY | "2021-01-03 00:00:00" | current value |
29384760 | ZZZZZ | "2021-01-04 00:00:00" | current value |
The temporary table
ID | col1 | col2 | col3 |
---|---|---|---|
1 | ABCDE | "2022-02-22 00:00:00" | new value |
2 | FGHIJ | "2022-02-23 00:00:00" | new value |
3 | KLMNO | "2022-02-24 00:00:00" | new value |
4 | PQWRS | "2022-02-25 00:00:00" | new value |
5 | TUVWX | "2022-02-26 00:00:00" | new value |
I want to iterate through each row of the temporary table and perform a duplicate check based on the combination of col1 and col2 from the temp table against the same columns in the perm table.
If the duplicate check returns a result from the perm table, then an UPDATE should be performed
If the duplicate check returns no result from the perm table, then an INSERT should be performed
The permanent table after update from temp table
ID | col1 | col2 | col3 |
---|---|---|---|
29384756 | WWWWW | "2022-02-22 00:00:00" | current value |
29384757 | XXXXX | "2022-02-23 00:00:00" | current value |
29384758 | KLMNO | "2022-02-24 00:00:00" | new value |
29384759 | YYYYY | "2022-02-25 00:00:00" | current value |
29384760 | ZZZZZ | "2022-02-26 00:00:00" | current value |
29384761 | ABCDE | "2022-02-22 00:00:00" | new value |
29384762 | FGHIJ | "2022-02-23 00:00:00" | new value |
29384763 | PQWRS | "2022-02-25 00:00:00" | new value |
29384764 | TUVWX | "2022-02-26 00:00:00" | new value |
The 3rd row from the temp table matched an existing record in the permanent table ("KLMNO2022-02-24 00:00:00" so the col3 value for that row was updated however all other records were inserted as new rows.
The excel solution builds an SQL query for each row in the spreadsheet, using a variable to determine whether or not the row is a duplicate.
For the UPDATE example above the query would look like:
@rowId := IFNULL( (SELECT ID FROM perm_table WHERE CONCAT(col1, col2) = 'KLMNO2022-02-24 00:00:00'),'');insert into perm_table values(@rowId, col1, col2, col3) ON DUPLICATE KEY UPDATE col3 = 'new value';
In that query we get the values 'KLMNO2022-02-24 00:00:00' and 'new value' from cells in a spreadsheet, I'm trying to do the same thing but instead of a spreadsheet, the temporary table is an SQL table.