1

We have a bunch of queries where we UNION data from 3 tables at query time ( we get data from 3 sources ) .

I was wondering if query performance would be any better if we were to merge the data into one table ( with a column source so we know where it came from ).

New table would be much bigger so I’m not sure if we should expect any better performance. Is there a general guidance around this?

Due Crocks
  • 13
  • 3

1 Answers1

3

There should be no significant difference scanning 3 tables VS scanning 1 table with the merged content.

However, please make sure you're using UNION ALL and not UNION. According to the SQL standard, UNION in SQL eliminates duplicate records, and the process of doing that can be very expensive.

Using UNION where UNION ALL should be used is one of the most common mistakes I've seen in SQL, unfortunately. I blame the standard, not the users though :)

See e.g. here for more discussion.

Marcin Zukowski
  • 4,281
  • 1
  • 19
  • 28
  • If the distinct/union was wanted, then using a table would be faster as the cost is only paid when the table is written, once verse on every read. But you core message is very true. – Simeon Pilgrim May 15 '22 at 03:18
  • Yes, we do need distinct on one column. We could load all and do distinct in memory but given millions of rows that we fetch, it’s likely won’t be good idea to do in memory. Good point about paying cost once at insertion. Thanks – Due Crocks May 15 '22 at 06:13
  • @DueCrocks If you need distinct on one key column, you'd need to explain how to handle identical key values with different remaining attributes. For that, UNION nor UNION ALL might not be sufficient – Marcin Zukowski May 17 '22 at 23:15
  • The SELECT query has only one column . – Due Crocks May 19 '22 at 00:29
  • OK, so if you have one column, and do want to do distinct, then `UNION` is indeed the way. Then, indeed, doing it once on load will make the following queries much faster. – Marcin Zukowski May 30 '22 at 23:26