If I was faced with that task, I'd use an anti-join pattern. That's an outer join, to return all rows from the current table, along with "matching" rows from the backup table. Then in the WHERE clause, we exclude all rows that had an exact match. Returning rows that don't match.
SELECT t.*
FROM mytable t
LEFT
JOIN backup_mytable s
ON s.id <=> t.id
AND s.col_two <=> t.col_two
AND s.col_three <=> t.col_three
AND ...
WHERE s.id IS NULL
This assumes that the column id
is guaranteed to be non-NULL. The PRIMARY KEY column (or any column that is part of the PRIMARY KEY of the table, or any column that has a NOT NULL constraint would serve.)
This query only returns the rows that don't match a row in backup table. It doesn't indicate whether its row that doesn't exist, or whether a value of a column was changed.
And to get rows in the original table that don't match rows in the backup table, just swap the table names.
For the special case of a table with all columns defined as NOT NULL, we could take a shortcut on the join predicates.
FROM mytable t
NATURAL
LEFT
JOIN backup_mytable s
WHERE s.id IS NULL
That's equivalent to a LEFT JOIN with a USING clause of all columns that are named the same in both tables.
FROM mytable t
LEFT
JOIN backup_mytable s
USING (id, col_two, col_three, ...)
WHERE s.id IS NULL
That's equivalent to specifying an equality comparison on every column (if both tables have the same columns)
FROM mytable t
LEFT
JOIN backup_mytable s
ON s.id = t.id
AND s.col_two = t.col_two
AND s.col_three = t.col_three
Any occurrences of NULL values in any of the columns are going to screw with the equality comparison, and return NULL.
And that's why the first query uses the null-safe comparison <=>
(spaceship) operator. NULL <=> NULL
will return TRUE, where NULL = NULL
will return NULL.
For that first query pattern, rather than tediously typing out all of those comparisons of every column, I would use SQL to help me generate the SQL I need.
SELECT CONCAT(' AND s.`',c.column_name,'` <=> t.`',c.column_name,'`') AS `-- stmt`
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
AND c.table_name = 'mytable'
ORDER BY c.ordinal_position
I'd take the rows returned by that query, and paste that in
SELECT t.*
FROM ... t
JOIN ... s
ON 1=1
-- paste here --
WHERE s.id IS NULL
ORDER BY t.id
If I needed query that matched on just the id
column, and needed to identify which columns had changed, I'd use expressions in the SELECT list. For example:
SELECT s.`id` <=> t.`id` AS `match_id`
, s.`col_one` <=> t.`col_one` AS `match_col_one`
, s.`col_three` <=> t.`col_three` AS `match_col_three`
FROM mytable t
JOIN backup_mytable s
ON s.id = t.id
HAVING NOT match_col_one
Here referencing the column alias in the SELECT list in a HAVING
clause, to exclude rows that have the same value of col_one
; returning rows where col_one
is different.
Again, I would use SQL against information_schema.columns to help speed up the query writing process.