In the past (mid-nineties) some RDBMS engines had very poor implementations of UNION
, so a suggestion to switch to UNION ALL
and filter on the client very often paid off. The performance of UNION
has been optimized, however, so with the modern RDBMSs the decision has to be made case-by-case:
- When you do a
UNION
, the database must eliminate duplicates for you. If the number of records returned from a query is small (say, a few hundred to a thousand) then it does not matter where to eliminate the duplicates, so you might as well do it on the RDBMS side.
- When the number of records gets into tens of thousands, you may be able to do elimination of duplicates in a way that is smarter than that of RDBMS by exploiting specific properties of your data. In this case you would use
UNION ALL
.
- If the number of rows is large and the share of duplicates is very large (say, you
UNION ALL
from five tables, with 70% of rows being duplicates) it may be better to save on the network bandwidth and the client memory by having RDBMS eliminate duplicates, and reduce the size of data to be transfered back to you by 70%.
To summarize, there is no universal scenario. You need to do some calculations and profile your queries before making a decision one way or the other.