0

Lets say I have following data

 id                         some_date    days      weeks     
 1111111111111111111111111  2021-03-01    2         1               
 1111111111111111111111111  2021-03-01    8         2                
 1111111111111111111111111  2021-03-01    9         2               
 1111111111111111111111111  2021-03-01    22        4               
 1111111111111111111111111  2021-03-01    24        4               

and I want to calculate lifetime week totals for every row. For instance, the result for data above would be as following:

 id                         some_date    days      weeks     lifetime_weeks
 1111111111111111111111111  2021-03-01    2         1               1
 1111111111111111111111111  2021-03-01    8         2               2
 1111111111111111111111111  2021-03-01    9         2               2
 1111111111111111111111111  2021-03-01    22        4               3
 1111111111111111111111111  2021-03-01    24        4               3

I tried to implement it with window functions, but as it is not allowed to have distinct inside window functions I ended up with errors

COUNT(distinct id) OVER(PARTITION BY id order by days rows unbounded preceding) as lifetime_weeks

How can I do the same thing without window function? Any help would be much appreciated

fallen
  • 31
  • 11
  • 1
    Does this answer your question? [pyspark: count distinct over a window](https://stackoverflow.com/questions/45869186/pyspark-count-distinct-over-a-window) – mck Mar 30 '21 at 14:28
  • Thanks for your help. I think the answer below is better suits to my problems than the llink you given – fallen Mar 30 '21 at 17:29

1 Answers1

1

The simplest method is to use row_number() to identify the first occurrence of each week, and then use a cumulative sum:

select t.*,
       sum(case when seqnum = 1 then 1 else 0 end) over (partition by id order by days) as num_unique_weeks
from (select t.*,
             row_number() over (partition by id, weeks order by days) as seqnum
      from t
     ) t
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786