1

I'm in my absolute infancy in learning PostgreSQL (I'm working my way through the postgresqltutorial SELECT tutorial right now) and saw DISTINCT ON mentioned. Based on my attempted research, this is actually a very complicated & powerful keyword that's a bit beyond where I'm at now, but I wanted a basic understanding.

From toying around in psql, it seems like it's a way to filter data based on values in one column regardless of whether or not the data in other columns also match. E.g. if you had a table of people and their grades, you could filter so that you'd only have one person per grade without also needing them to have the same name, etc.

Am I on the right track?

iducam
  • 67
  • 1
  • 11
  • [This SO question](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by) likely comes very close to explaining your doubts. You should review a good tutorial on Postgres and `DISTINCT ON` to better understand the basics of it. – Tim Biegeleisen Jul 09 '21 at 05:51
  • @TimBiegeleisen Thanks! I read over this exact question a little bit ago and didn't understand most of it, so I guess I was right in thinking it's a bit more advanced a concept than where I'm at right now. I'll revisit it once I've progressed a bit to see if it makes more sense. – iducam Jul 09 '21 at 05:54
  • Explanation in the [documentation](https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT) is pretty straightforward. – Tomas Greif Jul 09 '21 at 08:12
  • Most questions around `DISTINCT ON` are answered here: https://stackoverflow.com/a/7630564/939860 – Erwin Brandstetter Jul 09 '21 at 13:15

1 Answers1

1

DISTINCT ON is a strange beast and might be poorly understood without examples. My answer heavily borrows from this tutorial.

Create table:

CREATE TABLE table1(
    id serial,
    fruit_1 TEXT,
    fruit_2 TEXT
)

Insert data:

INSERT INTO table1 (fruit_1, fruit_2)
VALUES ('apple', 'apple'),
       ('apple', 'apple'),
       ('apple', NULL),
       (NULL, 'apple'),
       ('apple', 'mango'),
       ('apple', 'blueberry'),
       ('mango', 'apple'),
       ('mango', 'blueberry'),
       ('mango', 'mango'),
       ('blueberry', 'apple'),
       ('blueberry', 'mango'),
       ('blueberry', 'blueberry')

SELECT * from table1 gives

id  fruit_1    fruit_2
1   apple      apple
2   apple      apple
3   apple      <NULL>
4   <NULL>     apple
5   apple      mango
6   apple      blueberry
7   mango      apple
8   mango      blueberry
9   mango      mango
10  blueberry  apple
11  blueberry  mango
12  blueberry  blueberry

DISTINCT ON keeps only the first row from the chosen column. However, the first row is unpredictable, so you'll want to use ORDER BY to sort (and get a predictable first row):

SELECT DISTINCT ON (fruit_1)
  id, fruit_1, fruit_2
FROM
    table1
ORDER BY fruit_1, fruit_2

id  fruit_1    fruit_2
2   apple      apple
10  blueberry  apple
7   mango      apple
4   <NULL>     apple

You can also use a column number in the parentheses:

SELECT DISTINCT ON (2)
  id, fruit_1, fruit_2
FROM
    table1
ORDER BY fruit_1, fruit_2
bfris
  • 5,272
  • 1
  • 20
  • 37