1

I have table products:

+----------+-----------+----------+---------+
|family_id |shopper_id |product_id|quantity |
+----------+-----------+----------+---------+
|A         |1          |Kit Kat   |10       |
|A         |1          |Kit Kat   |5        |
|A         |1          |Snickers  |9        |
|A         |2          |Kit Kat   |7        |
|B         |3          |Kit Kat   |2        |
+----------+---------- +----------+---------+

For each product, I want to calculate 2 totals:

  • total quantity per shopper
  • total quantity per family. Sum of total quantities for all shoppers in the same family.

The final table should look like:

+----------+----------+-------------------------+-----------------------+
|shopper_id|product_id|total_quantity_shopper   |total_quantity_family  |
+----------+----------+-------------------------+-----------------------+
|1         |Kit Kat   | 15                      | 22                    |
|1         |Snickers  | 9                       | 9                     |
|2         |Kit Kat   | 7                       | 22                    |
|3         |Kit Kat   | 2                       | 2                     |
+----------+----------+-------------------------|-----------------------|

This is my query:

SELECT
    distinct shopper_id,
    product_id,
    sum(quantity) OVER (PARTITION BY shopper_id, product_id) as total_quantity_shopper,
    sum(quantity) OVER (PARTITION BY family_id, product_id) as total_quantity_family
FROM
    products;

But looking at the query plan, it looks very inefficient (I think). How can I improve the query above?

alhazen
  • 1,907
  • 3
  • 22
  • 43

1 Answers1

3

I think the family is a hierarchy for the shopper. So, I would suggest group by and window functions:

select family_id, shopper_id, product_id,
       sum(quantity) as total_quantity_shopper,
       sum(sum(quantity)) over (partition by family_id, product_id) as total_quantity_family
from products
group by family_id, shopper_id, product_id
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Getting this error: `Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: ...Expression not in GROUP BY key 'quantity' ` – alhazen Nov 04 '20 at 22:29
  • @alhazen . . . Oops. I missed a `sum()` in that syntax. – Gordon Linoff Nov 04 '20 at 23:39
  • beautiful, @GordonLinoff . I just found out that, at least in PostgreSQL and Vertica (I have no access to a hive database) , you can even omit `family_id` from the column list, while still grouping by it .... – marcothesane Nov 05 '20 at 00:09
  • @marcothesane . . . You can actually remove `family_id` from `select` in any database. But I think the results make more sense if it is there. – Gordon Linoff Nov 05 '20 at 01:35