0

I'm having trouble generating a query that I'm sure is possible. I have a products table and a product_changes table. I would like to select the average of the product_changes.rank field for the top 30 lowest values for each associated product.

Here are the relevant table definitions:

CREATE TABLE products (
  id integer NOT NULL,
  created_at timestamp without time zone
);

CREATE TABLE product_changes (
  id integer NOT NULL,
  product_id integer,
  rank integer,
  created_at timestamp without time zone
);

Here's what I'm trying:

SELECT products.id, avg_rank 
FROM "products" 
JOIN (
  SELECT product_id, AVG(rank) avg_rank
  FROM product_changes 
  GROUP BY product_id, rank
  ORDER BY rank ASC NULLS LAST
  LIMIT 10) pc ON pc.product_id = products.id
WHERE avg_rank IS NOT NULL
LIMIT 10

However, this is giving me the same average of the lowest 30 rank values for each row of the results. That would seem like the ON clause of the JOIN isn't working, but I'm sure it's just my misunderstanding of something.

dbwinger
  • 103
  • 2
  • 6
  • As always, table definitions and Postgres version would help. You mention a `rank field for each product`, but I don't see that column in your query ... – Erwin Brandstetter Oct 01 '16 at 02:08
  • Updated question to include table definitions and clarification about rank field. Sorry it took me so long. This is a side project and got just got time to get back to it. Reading through the answers now. – dbwinger Oct 08 '16 at 16:32
  • You might want to clarify the objective as well. `the average of the product_changes.rank field for the top 30 lowest values for each associated product` is ambiguous. I tried to answer *some* possible interpretations. Is one of them what you want? Please clarify. – Erwin Brandstetter Oct 20 '16 at 03:06
  • FYI, I edited my question for clarity about what I was looking for you. – dbwinger Oct 27 '16 at 23:06

2 Answers2

1

There is some room for interpretation in your question ...

You probably want the average of the 30 lowest rank values in product_changes for each product.

Unless you want to include products without related rows in product_changes, you can get fast results by looking at only product_changes - using the window function row_number() in a subquery:

SELECT id, avg(rank ) AS avg_rank
FROM  (
   SELECT product_id AS id, rank 
        , row_number() OVER (PARTITION BY product_id ORDER BY rank) AS rn
   FROM   product_changes 
   ) sub
WHERE  rn <= 30
GROUP  BY id;

Or maybe you meant 30 products with the lowest value in products.rank and the average rank in related rows in product_changes:

SELECT p.id, pc.avg_rank 
FROM  (
   SELECT id
   FROM   products
   ORDER  BY rank
   LIMIT  30
   ) p
LEFT   JOIN LATERAL (
   SELECT avg(rank) avg_rank
   FROM   product_changes 
   WHERE  product_id = p.id
   ) pc ON true;

Why LEFT JOIN LATERAL ... ON true?

If there is no column products.rank, and you actually meant 30 products with the lowest value in product_changes.rank:

SELECT p.id, pc.avg_rank 
FROM  (
   SELECT product_id AS id
   FROM   product_changes
   ORDER  BY rank
   LIMIT  30
   ) p
LEFT   JOIN LATERAL (
   SELECT avg(rank) avg_rank
   FROM   product_changes 
   WHERE  product_id = p.id
   ) pc ON true;
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

I would use a LATERAL subquery or a correlated subquery to ensure that the subquery is executed for each product. Here's an example:

SELECT products.id, avg_rank 
FROM "products",
     LATERAL (
         SELECT AVG(rank) avg_rank
         FROM (SELECT rank
               FROM product_changes
               WHERE product_id=products.id
               ORDER BY rank ASC NULLS LAST
               LIMIT 30) t1
     ) t2
WHERE avg_rank IS NOT NULL
LIMIT 10
redneb
  • 21,794
  • 6
  • 42
  • 54