How to insert many records excluding some in PostgreSQL

Question

I want to create a table with a subset of records from a master table. for example, I have:

id  name   code  ref
1   peter  73    2.5
2   carl   84    3.6
3   jack   73    1.1

I want to store peter and carl but not jack because has same peter's code. I need the max ref!

I try this:

SELECT id, name, DISTINCT(code) INTO new_tab
FROM old_tab 
WHERE (conditions)

but it doesn't work.

score 3 · Answer 1 · answered Jul 16 '13 at 10:48

You can try a sub-query like this:

SELECT ot.* FROM old_tab ot
JOIN
(
   SELECT "code", MAX("ref") AS "MaxRef"
   FROM old_tab
   GROUP BY "code"
) tbl
ON ot."code" = tbl."code"
AND ot."ref" = tbl."MaxRef"

Output:

╔════╦═══════╦══════╦═════╗
║ ID ║ NAME  ║ CODE ║ REF ║
╠════╬═══════╬══════╬═════╣
║  1 ║ peter ║   73 ║ 2.5 ║
║  2 ║ carl  ║   84 ║ 3.6 ║
╚════╩═══════╩══════╩═════╝

See this SQLFiddle

score 3 · Accepted Answer · answered Jul 16 '13 at 10:51

3

You can use window functions for this:

select t.id, t.name, t.code, t.ref
from (select t.*,
             row_number() over (partition by code order by ref desc) as seqnum
      from old_tab t
     ) t
where seqnum = 1;

The insert statement just wraps insert around this:

insert into new_tab(id, name, code)
    select t.id, t.name, t.code
    from (select t.*,
                 row_number() over (partition by code order by ref desc) as seqnum
          from old_tab t
         ) t
    where seqnum = 1;

answered Jul 16 '13 at 10:51

Gordon Linoff

1,242,037
58
646
786

Why you use row_number() function? – Emanuele Mazzoni Jul 16 '13 at 11:05
@Mrslipp3ry Does it make more sense if you think of `row_number` as if it was `rank()` or `dense_rank()`? It doesn't matter, since all of them start with `1`... – Craig Ringer Jul 16 '13 at 11:19

Ihor Romanchenko · Answer 3 · 2013-07-16T16:03:28.387

1

Try something like:

SELECT DISTINCT ON (code) id, name, code
FROM old_tab
WHERE conditions
ORDER BY code, ref DESC

edited Jul 16 '13 at 16:03

answered Jul 16 '13 at 10:50

Ihor Romanchenko

26,995
8
48
44

I think you missed an essential part of the question. – Erwin Brandstetter Jul 16 '13 at 15:58

score 0 · Answer 4 · edited May 23 '17 at 12:29

Correct query

Since you need the max ref! the correct form for DISTINCT ON would be:

SELECT DISTINCT ON (code)
       id, name, code
FROM   old_tab
WHERE  (conditions)
ORDER  BY code, ref DESC

This is generally faster, simpler and shorter than solutions with a subquery and window function or aggregate functions.
If there can be multiple rows sharing the highest ref, add more ORDER BY items as tiebrekaer to decide which row to return. Or Postgres will pick an arbitrary one, since DISTINCT ON always returns a single row per DISTINCT expression.

Explanation, links and a benchmark comparing these styles in this closely related answer:
Select first row in each GROUP BY group?

Another fast way would be:

SELECT id, name, code, ref
FROM   old_tab t
WHERE  (conditions)
AND    NOT EXISTS (
   SELECT 1
   FROM   old_tab t2
   WHERE  (conditions)
   AND    t2.code = t.code
   AND    t2.ref > t.ref
   )

The small difference: this one does not break ties. If multiple rows per code (and conditions) share the highest ref, multiple rows will be returned.

`CREATE TABLE AS`

And for creating a new table from a SELECT, the recommended form is CREATE TABLE AS. Quoting the manual here:

This command is functionally similar to SELECT INTO, but it is preferred since it is less likely to be confused with other uses of the SELECT INTO syntax. Furthermore, CREATE TABLE AS offers a superset of the functionality offered by SELECT INTO.

Bold emphasis mine.
So use:

CREATE TABLE new_tab AS
SELECT DISTINCT ON (code)
       id, name, code
FROM   old_tab
WHERE  (conditions)
ORDER  BY code, ref DESC;

How to insert many records excluding some in PostgreSQL

4 Answers4

See this SQLFiddle

Correct query

CREATE TABLE AS

`CREATE TABLE AS`