0

Just started learning sql query in Impala and I still cannot figure out how to get the desired results.

So I have a table:

enter image description here

And I would like to find out how to get the number of distinct product sold over time (from 1 Dec to 3 Dec 2020).

So the 1st day we sold 2 types of product (product_id 1 and 2), the second day we sold the same type of product comparing day before, therefore the cumulative remains as 2 and the last day we sold a product that was not sold two days before therefore the cumulative will be 3 types of products sold on the 3rd day

enter image description here

Thanks in advance!

jonprasetyo
  • 3,356
  • 3
  • 32
  • 48
  • Seems like you expected the query to return [running totals](https://stackoverflow.com/questions/10039431/how-can-i-use-sum-over). – 劉鎮瑲 Aug 14 '20 at 07:21

1 Answers1

1

Simply aggregate to get the first time something is sold and then do a simple aggregation and cumulative sum:

select min_sold_date, sum(count(*)) over (order by min_sold_date)
from (select product_id, min(sold_date) as min_sold_date
      from t
      where sold_date >= ? and sold_date <= ?
      group by product_id
     ) t
group by min_sold_date;

For your sample data, this will not include Dec 2nd. I am guessing that is not a problem in your real data. If it is, you can use a left join to bring in all the dates.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786