UNION
filters out duplicate rows by default. UNION ALL
does not remove duplicates.
But the duplicates are based on all columns being identical, not just the id
column. If a given id
value occurs in both queries, but any of the other two columns are different, then it counts as a distinct row.
If you want to reduce the result to a single row per id
, the use a GROUP BY:
SELECT id, ...aggregate expressions...
FROM (
SELECT my_table.id, a, b ...
UNION
SELECT my_table.id, a, b ...
) AS t
GROUP BY id;
When you GROUP BY id
, then any other expressions of the outer select-list must be in aggregate functions like MAX() or SUM(), etc.
The reason it is important to use an aggregate function is that when there are multiple rows with the same id
value which you want to reduce to one row, what value should be displayed for a
and b
?
Example:
If you group by id
, you would get one row for id=4
, but what value for the other two columns?
Read https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html for more details on this. Or my answer to Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
You must use an aggregate function, which includes GROUP_CONCAT() to append all the values from that column in a comma-separated list. Or you can use ANY_VALUE() which picks one of the values from that column arbitrarily.