Select all columns from rows distinct on one column

Question

I am using Netezza (based on PostgreSQL) and need to select all columns in a table for rows distinct on one column. A related question with answer can be found here, but it doesn't handle the case with all columns, going by that answer throws an error:

select distinct on (some_field) table1.* from table1 order by some_field;

Snippet from error with real data:

"(" (at char 77) expecting '')''

What error are you getting? The code might not do what you want but it shouldn't generate an error. — Gordon Linoff, Jan 27 '17 at 12:53
Can you add some sample table data and the expected result? (As well formatted text.) — jarlh, Jan 27 '17 at 12:53
In fact I should have been clearer on this. I'm running this against Netezza which is based on PostgreSql — user3206440, Jan 28 '17 at 03:42
I removed the `postgresql` tag as Netezza is sufficiently different to Postgres. — , Jan 28 '17 at 14:33

score 3 · Answer 1 · answered Jan 27 '17 at 12:52

3

I don't think your code should throw an error in Postgres. However, it won't do what you expect without an order by:

select distinct on (some_field) table1.*
from table1
order by some_field;

answered Jan 27 '17 at 12:52

Gordon Linoff

1,242,037
58
646
786

That's misleading. `ORDER BY` is in now way necessary here. – Erwin Brandstetter Jan 27 '17 at 14:27
@ErwinBrandstetter . . . Is that really true? I know it might seem to work without the `ORDER BY`, but when I read the documentation (https://www.postgresql.org/docs/9.6/static/sql-select.html#SQL-DISTINCT), I have always interpreted it that the `ORDER BY` is needed not only for ordering within a group but to define the groups. In particular, "The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). " – Gordon Linoff Jan 27 '17 at 14:48
Yes, that's really true. Leading `DISTINCT ON` and `ORDER BY` expressions must match - ***if*** both clauses are added. Groups for the distinct operation are defined by `DISTINCT ON` items *exclusively*. `ORDER BY` only affects the sort order of the result - and which rows are picked per group if more expressions are added to influence the order within groups. – Erwin Brandstetter Jan 27 '17 at 15:01
@ErwinBrandstetter . . . Thank you for the clarification. The documentation (which is generally very good for Postgres) could be clearer on this point -- by pointing out that `ORDER BY` is optional. – Gordon Linoff Jan 28 '17 at 04:04

score 1 · Answer 2 · edited May 23 '17 at 12:01

The syntax of your query is correct for Postgres (like you declared at first). See:

Select first row in each GROUP BY group?

You later clarified you actually work with Netezza, which is only loosely related to Postgres. Wikipedia states:

Netezza is based on PostgreSQL 7.2,^[8] but does not maintain compatibility.

Netezza does not seem to support DISTINCT ON (), only DISTINCT.

It supports row_number(), though. So this should work:

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY some_field) AS rn
   FROM   table1
   ) sub
WHERE  rn = 1;

The question remains: Which row do you want from each set with identical some_field. If any row is good, you are done here. Else, you need to add ORDER BY to the OVER clause.

SQL to get unique rows in Netezza DB

Select all columns from rows distinct on one column

2 Answers2

Linked