0

I am using Netezza (based on PostgreSQL) and need to select all columns in a table for rows distinct on one column. A related question with answer can be found here, but it doesn't handle the case with all columns, going by that answer throws an error:

select distinct on (some_field) table1.* from table1 order by some_field;

Snippet from error with real data:

"(" (at char 77) expecting '')''

Community
  • 1
  • 1
user3206440
  • 4,749
  • 15
  • 75
  • 132

2 Answers2

3

I don't think your code should throw an error in Postgres. However, it won't do what you expect without an order by:

select distinct on (some_field) table1.*
from table1
order by some_field;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • That's misleading. `ORDER BY` is in now way necessary here. – Erwin Brandstetter Jan 27 '17 at 14:27
  • @ErwinBrandstetter . . . Is that really true? I know it might seem to work without the `ORDER BY`, but when I read the documentation (https://www.postgresql.org/docs/9.6/static/sql-select.html#SQL-DISTINCT), I have always interpreted it that the `ORDER BY` is needed not only for ordering within a group but to define the groups. In particular, "The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). " – Gordon Linoff Jan 27 '17 at 14:48
  • Yes, that's really true. Leading `DISTINCT ON` and `ORDER BY` expressions must match - ***if*** both clauses are added. Groups for the distinct operation are defined by `DISTINCT ON` items *exclusively*. `ORDER BY` only affects the sort order of the result - and which rows are picked per group if more expressions are added to influence the order within groups. – Erwin Brandstetter Jan 27 '17 at 15:01
  • @ErwinBrandstetter . . . Thank you for the clarification. The documentation (which is generally very good for Postgres) could be clearer on this point -- by pointing out that `ORDER BY` is optional. – Gordon Linoff Jan 28 '17 at 04:04
1

The syntax of your query is correct for Postgres (like you declared at first). See:

You later clarified you actually work with Netezza, which is only loosely related to Postgres. Wikipedia states:

Netezza is based on PostgreSQL 7.2,[8] but does not maintain compatibility.

Netezza does not seem to support DISTINCT ON (), only DISTINCT.

It supports row_number(), though. So this should work:

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY some_field) AS rn
   FROM   table1
   ) sub
WHERE  rn = 1;

The question remains: Which row do you want from each set with identical some_field. If any row is good, you are done here. Else, you need to add ORDER BY to the OVER clause.

Related:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228