0

Let's say today is 20221031. Data is like follows:

create_time id type amount
20221031 1 A 10
20221030 1 A 10
20221031 2 A 10
20221030 2 B 10
20220928 1 A 10

What I want:

create_time id type amount sum_amount_within_one_month sum_amount_within_three_months
20221031 1 A 10 20(there are two records where id=1 and type=A within one month) 30(there are three records where id=1 and type=A within 3 months)
20221030 1 A 10 20 30
20221031 2 A 10 10 10
20221030 2 B 10 10 10
20220928 1 A 10 20 30

What I have tried:

df.withColumn("sum_amount_within_one_month", F.sum("amount").over(Window.partitionBy("id","type").where(df.create_date > "20220930")))

This fails with an error: WindowSpec object has no attribute 'where'

Shengxin Huang
  • 647
  • 1
  • 11
  • 25
  • you might be able to use `rangeBetween` similar to [this](https://stackoverflow.com/q/73865423/8279585) – samkart Oct 31 '22 at 11:25
  • lets say there are 4 records, of the same id/type, of 4 consecutive months (e.g. 1, 2, 3, 4) - how does the sum look like for the months? – samkart Oct 31 '22 at 11:51

0 Answers0