Correct query
Since you need the max ref!
the correct form for DISTINCT ON
would be:
SELECT DISTINCT ON (code)
id, name, code
FROM old_tab
WHERE (conditions)
ORDER BY code, ref DESC
This is generally faster, simpler and shorter than solutions with a subquery and window function or aggregate functions.
If there can be multiple rows sharing the highest ref
, add more ORDER BY
items as tiebrekaer to decide which row to return. Or Postgres will pick an arbitrary one, since DISTINCT ON
always returns a single row per DISTINCT
expression.
Explanation, links and a benchmark comparing these styles in this closely related answer:
Select first row in each GROUP BY group?
Another fast way would be:
SELECT id, name, code, ref
FROM old_tab t
WHERE (conditions)
AND NOT EXISTS (
SELECT 1
FROM old_tab t2
WHERE (conditions)
AND t2.code = t.code
AND t2.ref > t.ref
)
The small difference: this one does not break ties. If multiple rows per code
(and conditions
) share the highest ref
, multiple rows will be returned.
CREATE TABLE AS
And for creating a new table from a SELECT
, the recommended form is CREATE TABLE AS
. Quoting the manual here:
This command is functionally similar to SELECT INTO
, but it is
preferred since it is less likely to be confused with other uses of
the SELECT INTO
syntax. Furthermore, CREATE TABLE AS
offers a superset
of the functionality offered by SELECT INTO
.
Bold emphasis mine.
So use:
CREATE TABLE new_tab AS
SELECT DISTINCT ON (code)
id, name, code
FROM old_tab
WHERE (conditions)
ORDER BY code, ref DESC;