44

Related to - PostgreSQL DISTINCT ON with different ORDER BY

I have table purchases (product_id, purchased_at, address_id)

Sample data:

| id | product_id |   purchased_at    | address_id |
| 1  |     2      | 20 Mar 2012 21:01 |     1      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 3  |     2      | 20 Mar 2012 21:39 |     2      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

The result I expect is the most recent purchased product (full row) for each address_id and that result must be sorted in descendant order by the purchased_at field:

| id | product_id |   purchased_at    | address_id |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |

Using query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 2
ORDER BY purchases.address_id ASC, purchases.purchased_at DESC

I'm getting:

| id | product_id |   purchased_at    | address_id |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

So the rows is same, but order is wrong. Any way to fix it?

Mosty Mostacho
  • 42,742
  • 16
  • 96
  • 123
sl_bug
  • 5,066
  • 5
  • 21
  • 22

4 Answers4

30

Quite a clear question :)

SELECT t1.* FROM purchases t1
LEFT JOIN purchases t2
ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC

And most likely a faster approach:

SELECT t1.* FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at
    FROM purchases
    GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
Mosty Mostacho
  • 42,742
  • 16
  • 96
  • 123
  • 1
    In very few words: The only purchased_at value that has no bigger value is the max, now, return only those rows :) For further reading take a look at [this](http://dev.mysql.com/doc/refman/5.6/en/example-maximum-column-group-row.html) – Mosty Mostacho Mar 20 '12 at 22:51
  • How is the efficiency of this query approach? – Joachim Breitner Nov 03 '15 at 16:28
  • qq regarding your 1st approach: What if the two last purchases were made at the same time? Looks like I'm getting two rows for the same address. – Arkanoid Jun 24 '18 at 23:10
  • 2
    This should not be the accepted answer. Does not cater for duplicate purchased_at. – barnacle.m May 08 '20 at 19:31
  • @barnacle.m That is not really for you to decide. You don't know if for the OP, seeing all the matching column IDs is relevant or not. The obvious question would be: what ID would you show if more than one row matches? You will have to select one and discard the rest. And maybe, the query was used to analyze information that would DELETE ALL the matching rows. So hiding information unilaterally without understanding the business case is not the right way to go. Rather than downvoting, probably you should ask the OP – Mosty Mostacho May 09 '20 at 02:42
  • 2
    No, he's looking for a single record returned according to the question. If the purchased_at is identical to another record, then this solution does _not_ in fact solve it, and anybody else such as myself who came here looking for a solution to the same issue, will usually look at the accepted answer as the "ideal" method, but in this case it is simply incorrect. – barnacle.m May 10 '20 at 19:42
  • 2
    @bernacle.m Nowhere in the question states the answer should return a "single record" or that the OP is a "he". You should read the question carefully rather than skipping to the selected answer and assuming your issue matches the OP's one. Not to mention *purchased_at* could be part of a unique key, but you don't know that either. – Mosty Mostacho May 11 '20 at 14:41
13

Your ORDER BY is used by DISTINCT ON for picking which row for each distinct address_id to produce. If you then want to order the resulting records, make the DISTINCT ON a subselect and order its results:

SELECT * FROM
(
  SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
  FROM "purchases"
  WHERE "purchases"."product_id" = 2
  ORDER BY purchases.address_id ASC, purchases.purchased_at DESC
) distinct_addrs
order by distinct_addrs.purchased_at DESC
dbenhur
  • 20,008
  • 4
  • 48
  • 45
  • Thanks for answer, but I think that join will be better than subquery, especially when you want to use this inside rails (not just query execute, but construct it in some rails way) – sl_bug Dec 21 '12 at 23:45
  • Best answer, helped me to understand :) – user878812 Feb 01 '18 at 22:17
  • 2
    this is actually the only good working answer. also you can just use `SELECT DISTINCT ON (address_id) *` in the second select row – user151496 Sep 05 '20 at 00:15
  • If I want to limit this for the purposes of paginating on the `purchased_at` this ends up being a problem, the limit has to be applied on the outside. But this means we are always fetching all the rows before applying a limit. – CMCDragonkai Dec 16 '20 at 03:31
4

This query is trickier to rephrase properly than it looks.

The currently accepted, join-based answer doesn’t correctly handle the case where two candidate rows have the same given purchased_at value: it will return both rows.

You can get the right behaviour this way:

SELECT * FROM purchases AS given
WHERE product_id = 2
AND NOT EXISTS (
    SELECT NULL FROM purchases AS other
    WHERE given.address_id = other.address_id
    AND (given.purchased_at < other.purchased_at OR given.id < other.id)
)
ORDER BY purchased_at DESC

Note how it has a fallback of comparing id values to disambiguate the case in which the purchased_at values match. This ensures that the condition can only ever be true for a single row among those that have the same address_id value.

The original query using DISTINCT ON handles this case automatically!

Also note the way that you are forced to encode the fact that you want “the latest for each address_id” twice, both in the given.purchased_at < other.purchased_at condition and the ORDER BY purchased_at DESC clause, and you have to make sure they match. I had to spend a few extra minutes to convince myself that this query is really positively correct.

It’s much easier to write this query correctly and understandbly by using DISTINCT ON together with an outer subquery, as suggested by dbenhur.

Aristotle Pagaltzis
  • 112,955
  • 23
  • 98
  • 97
-1

Try this !

SELECT DISTINCT ON (address_id) *
FROM   purchases
WHERE  product_id = 2
ORDER  BY address_id, purchased_at DESC