Improve performance of query with union in from clause subquery

Question

How might I improve the performance of this query? I have an index on c1 and there is no primary key by choice on tb1. My goal is to create a record for each instance of c3.

select 
t1.*

into tb2

from (
select 
c1,
c2,
c3
from tb1
where c1 = '1'

union

select 
c1,
c2,
c4
from tb1
where c1 = '1'
) as t1
where t1.c3 != ''

Table definition (PG v9.0):

CREATE TABLE tb1 (
  c1 text
  c2 text
  c3 text
  c4 text
);

Index:

CREATE INDEX c1_idx   ON tb1 USING btree (c1);

Please always include table definitions and your version of Postgres for performance optimization questions. — Erwin Brandstetter, Feb 20 '14 at 22:35

score 0 · Accepted Answer · edited May 23 '17 at 12:10

0

CREATE TABLE tb2 AS 
SELECT c1, c2, c3
FROM   tb1
WHERE  c1 = '1'
AND    c3 <> ''

UNION ALL
SELECT c1, c2, c4
FROM   tb1
WHERE  c1 = '1'
AND    c4 <> '';

Use UNION ALL unless you want to eliminate possible duplicates. Faster. The result is different if duplicates occur!
No need for a subquery.
Use CREATE TABLE AS instead of SELECT INTO. Per documentation:

This command is functionally similar to SELECT INTO, but it is preferred since it is less likely to be confused with other uses of the SELECT INTO syntax. Furthermore, CREATE TABLE AS offers a superset of the functionality offered by SELECT INTO.

The single quotes around '1' indicate you are storing numeric data as string type. If so, that's not good. Consider appropriate data types ...

Alternative with CTE

If the table is big, this alternative with a CTE will be probably be faster, since we only read once from it.

Also including additional requests from comment, add column, add pk:

CREATE TABLE tb2 AS 
WITH cte AS (
   SELECT c1, c2, c3, c4
   FROM   tb1
   WHERE  c1 = '1'
   AND   (c3 <> '' OR c4 <> '')
   )
SELECT c1, c2, c3, NULL::text AS new_column  -- add new column
FROM   cte
WHERE  c3 <> ''

UNION ALL
SELECT c1, c2, c4, NULL
FROM   cte
WHERE  c4 <> '';

-- add primary key:
ALTER TABLE tb2 ADD CONSTRAINT tb2_pkey PRIMARY KEY(???);

If the table is big, you might want to increase temp_buffers setting (for the session only). More info in this related answer:
How to delete duplicate entries?

If you need to run this often (and don't mind the small penalty on write opertaions), and if you can define a superset of rows that eliminates large parts of the rest, I would suggest a partial index ...

edited May 23 '17 at 12:10

Community

1
1

answered Feb 20 '14 at 22:40

Erwin Brandstetter

605,456
145
1,078
1,228

Excellent! I will report results. All columns are text as the table is denormailized. – swhit Feb 20 '14 at 22:42
@swhit: Depending on *exact* requirements, there might be more options ... I added a few more pointers. – Erwin Brandstetter Feb 20 '14 at 23:07
I like the alternative with a CTE. However, I have a further requirement that I thought would have little impact. I need to add a primary key column and two additional columns to the resulting table. Thanks so much. – swhit Feb 21 '14 at 00:11
I can probably add the additional columns to the CTE? – swhit Feb 21 '14 at 00:14
@swhit: Depends where the columns come from. Or just new, empty columns? Your information is lacking. I added yet more to my answer. For new requirements start a *new question*. You can always reference this one for context. – Erwin Brandstetter Feb 21 '14 at 00:42
The cost of the CTE query is ~15 million vs ~6 million for the select into. :( – swhit Feb 21 '14 at 14:48
Every union does a CTE scan with the full amount of records. – swhit Feb 21 '14 at 14:54
CREATE TABLE AS results in approximately the same cost as SELECT INTO. – swhit Feb 21 '14 at 15:00
I added an index for each c(n) <> '' and it reduced it to apprx. 1.5 million. – swhit Feb 21 '14 at 15:37
@swhit: 1) 2x full CTE scan is to be expected. CTE scan != seq. table scan. 2) There is no performance difference between `CREATE TABLE AS` and `SELECT INTO`. The former is just better form. 3) Partial index, horray! :) – Erwin Brandstetter Feb 21 '14 at 16:18

Improve performance of query with union in from clause subquery

1 Answers1

Alternative with CTE