5

I am using Rails 4.2 with PostgreSQL. I have a Product model and a Purchase model with Product has many Purchases. I want to find the distinct recently purchased products. Initially I tried:

Product.joins(:purchases)
.select("DISTINCT products.*, purchases.updated_at") #postgresql requires order column in select
.order("purchases.updated_at DESC")

This however results in duplicates because it tries to find all tuples where the pair (product.id and purchases.updated_at) has a unique value. However I just want to select the products with distinct id after the join. If a product id appears multiple times in the join, only select the first one. So I also tried:

Product.joins(:purchases)
.select("DISTINCT ON (product.id) purchases.updated_at, products.*")
.order("product.id, purchases.updated_at") #postgres requires that DISTINCT ON must match the leftmost order by clause

This doesn't work because I need to specify product.id in the order clause because of this constraint which outputs unexpected order.

What is the rails way to achieve this?

Community
  • 1
  • 1
aandis
  • 4,084
  • 4
  • 29
  • 40

5 Answers5

5

So building on @ErwinBrandstetter answer, I finally found the right way of doing this. The query to find distinct recent purchases is

SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at, pr.*
   FROM   Product pr
   JOIN   Purchases pu ON pu.product_id = pr.id
   ) sub
ORDER  BY updated_at DESC NULLS LAST;

The order_by isn't needed inside the subquery, since we are anyway ordering in the outer query.

The rails way of doing this is -

inner_query = Product.joins(:purchases)
  .select("DISTINCT ON (products.id) products.*, purchases.updated_at as date") #This selects all the unique purchased products.

result = Product.from("(#{inner_query.to_sql}) as unique_purchases")
  .select("unique_purchases.*").order("unique_purchases.date DESC")

The second (and better) way to do this as suggested by @ErwinBrandstetter is

SELECT *
FROM   Product pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   Purchases 
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id
ORDER  BY pu.updated_at DESC NULLS LAST;

which can written in rails as

join_query = Purchase.select("product_id, max(updated_at) as date")
  .group(1) #This selects most recent date for all purchased products

result = Product.joins("INNER JOIN (#{join_query.to_sql}) as unique_purchases ON products.id = unique_purchases.product_id")
  .order("unique_purchases.date")
aandis
  • 4,084
  • 4
  • 29
  • 40
  • 2
    As for the 1st query: `The order_by isn't needed inside the subquery`. While this typically works, it's just an implementation detail that can break any time. You *need* `ORDER BY` in the subquery to guarantee the latest row. Postgres is free to return *any* row for each `pr.id` otherwise. – Erwin Brandstetter Sep 28 '15 at 22:10
3

Use a subquery and add a different ORDER BY clause in the outer SELECT:

SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at, pr.*
   FROM   Product pr
   JOIN   Purchases pu ON pu.product_id = pr.id  -- guessing
   ORDER  BY pr.id, pu.updated_at DESC NULLS LAST
   ) sub
ORDER  BY updated_at DESC NULLS LAST;

Details for DISTINCT ON:

Or some other query technique:

But if all you need from Purchases is updated_at, you can get this cheaper with a simple aggregate in a subquery before you join:

SELECT *
FROM   Product pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   Purchases 
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id  -- guessing
ORDER  BY pu.updated_at DESC NULLS LAST;

About NULLS LAST:

Or even simpler, but not as fast while retrieving all rows:

SELECT pr.*, max(updated_at) AS updated_at
FROM   Product pr
JOIN   Purchases pu ON pu.product_id = pr.id
GROUP  BY pr.id  -- must be primary key
ORDER  BY 2 DESC NULLS LAST;

Product.id needs to be defined as primary key for this to work. Details:

If you fetch only a small selection (with a WHERE clause restricting to just one or a few pr.id for instance), this will be faster.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Is there a `rails` way of doing this? – aandis Sep 25 '15 at 05:32
  • @zack: There sure is. Should be easy to translate, but I am no Rails expert. Ultimately, the code has to be translated to SQL before being sent to the DB server. – Erwin Brandstetter Sep 25 '15 at 05:35
  • @zack, well, you'll likely have to use SQL strings. The SQL above uses features that, I believe, Arel doesn't support. So you'll have to use things like `.select('DISTINCT ON (...` and `.order('whatever DESC NULLS FIRST')`. Or perhaps you could hack on Arel to support these modifiers, but that would take quite a bit of time. Arel is under the hood of ActiveRecord in Rails. – D-side Sep 25 '15 at 15:52
  • @D-side I figured out the way to do it. Please check my answer. – aandis Sep 25 '15 at 17:22
  • @zack: I added another alternative. – Erwin Brandstetter Sep 28 '15 at 22:35
2

To build on erwin-brandstetter's answer, this is how you could do this with ActiveRecord (should be close at least):

Product
  .select('*')
  .joins('INNER JOIN (SELECT product_id, max(updated_at) AS updated_at FROM Purchases GROUP  BY 1) pu ON pu.product_id = pr.id')
  .order('pu.updated_at DESC NULLS LAST')
Community
  • 1
  • 1
Tom Fast
  • 1,138
  • 9
  • 15
0

Try to do this:

Product.joins(:purchases)
.select("DISTINCT ON (products_id) purchases.product_id, purchases.updated_at, products.*")
.order("product_id, purchases.updated_at") #postgres requires that DISTINCT ON must match the leftmost order by clause
akbarbin
  • 4,985
  • 1
  • 28
  • 31
  • This would result in duplicates. A product id might appear multiple times in purchases table each with different updated_at value. I want to select the product just once, – aandis Sep 25 '15 at 07:42
  • How is grouping by `products.id` different from `purchases.product_id`? – aandis Sep 25 '15 at 07:51
  • Sorry about that. Both of the are equal. I have updated again based on your code. – akbarbin Sep 25 '15 at 09:57
  • This is same as what I have in my question. – aandis Sep 25 '15 at 12:03
  • Product.joins(:purchases) .select("DISTINCT ON (products.id) purchases.product_id, purchases.updated_at, products.*") .order("product_id, purchases.updated_at") – Kiry Meas Jan 14 '21 at 05:04
0

I ended up with this -

Product.joins(:purchases)
.select("DISTINCT ON (products.id) products.*, purchases.updated_at as date")
.sort_by(&:date)
.reverse

Still looking for a better way to do this.

aandis
  • 4,084
  • 4
  • 29
  • 40