0

I want to create a table with a subset of records from a master table. for example, I have:

id  name   code  ref
1   peter  73    2.5
2   carl   84    3.6
3   jack   73    1.1

I want to store peter and carl but not jack because has same peter's code. I need the max ref!

I try this:

SELECT id, name, DISTINCT(code) INTO new_tab
FROM old_tab 
WHERE (conditions)

but it doesn't work.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Emanuele Mazzoni
  • 716
  • 3
  • 13
  • 24

4 Answers4

3

You can try a sub-query like this:

SELECT ot.* FROM old_tab ot
JOIN
(
   SELECT "code", MAX("ref") AS "MaxRef"
   FROM old_tab
   GROUP BY "code"
) tbl
ON ot."code" = tbl."code"
AND ot."ref" = tbl."MaxRef"

Output:

╔════╦═══════╦══════╦═════╗
║ ID ║ NAME  ║ CODE ║ REF ║
╠════╬═══════╬══════╬═════╣
║  1 ║ peter ║   73 ║ 2.5 ║
║  2 ║ carl  ║   84 ║ 3.6 ║
╚════╩═══════╩══════╩═════╝

See this SQLFiddle

Himanshu
  • 31,810
  • 31
  • 111
  • 133
3

You can use window functions for this:

select t.id, t.name, t.code, t.ref
from (select t.*,
             row_number() over (partition by code order by ref desc) as seqnum
      from old_tab t
     ) t
where seqnum = 1;

The insert statement just wraps insert around this:

insert into new_tab(id, name, code)
    select t.id, t.name, t.code
    from (select t.*,
                 row_number() over (partition by code order by ref desc) as seqnum
          from old_tab t
         ) t
    where seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Try something like:

SELECT DISTINCT ON (code) id, name, code
FROM old_tab
WHERE conditions
ORDER BY code, ref DESC
Ihor Romanchenko
  • 26,995
  • 8
  • 48
  • 44
0

Correct query

Since you need the max ref! the correct form for DISTINCT ON would be:

SELECT DISTINCT ON (code)
       id, name, code
FROM   old_tab
WHERE  (conditions)
ORDER  BY code, ref DESC

This is generally faster, simpler and shorter than solutions with a subquery and window function or aggregate functions.
If there can be multiple rows sharing the highest ref, add more ORDER BY items as tiebrekaer to decide which row to return. Or Postgres will pick an arbitrary one, since DISTINCT ON always returns a single row per DISTINCT expression.

Explanation, links and a benchmark comparing these styles in this closely related answer:
Select first row in each GROUP BY group?

Another fast way would be:

SELECT id, name, code, ref
FROM   old_tab t
WHERE  (conditions)
AND    NOT EXISTS (
   SELECT 1
   FROM   old_tab t2
   WHERE  (conditions)
   AND    t2.code = t.code
   AND    t2.ref > t.ref
   )

The small difference: this one does not break ties. If multiple rows per code (and conditions) share the highest ref, multiple rows will be returned.

CREATE TABLE AS

And for creating a new table from a SELECT, the recommended form is CREATE TABLE AS. Quoting the manual here:

This command is functionally similar to SELECT INTO, but it is preferred since it is less likely to be confused with other uses of the SELECT INTO syntax. Furthermore, CREATE TABLE AS offers a superset of the functionality offered by SELECT INTO.

Bold emphasis mine.
So use:

CREATE TABLE new_tab AS
SELECT DISTINCT ON (code)
       id, name, code
FROM   old_tab
WHERE  (conditions)
ORDER  BY code, ref DESC;
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228