Let's say today is 20221031. Data is like follows:
create_time | id | type | amount |
---|---|---|---|
20221031 | 1 | A | 10 |
20221030 | 1 | A | 10 |
20221031 | 2 | A | 10 |
20221030 | 2 | B | 10 |
20220928 | 1 | A | 10 |
What I want:
create_time | id | type | amount | sum_amount_within_one_month | sum_amount_within_three_months |
---|---|---|---|---|---|
20221031 | 1 | A | 10 | 20(there are two records where id=1 and type=A within one month) | 30(there are three records where id=1 and type=A within 3 months) |
20221030 | 1 | A | 10 | 20 | 30 |
20221031 | 2 | A | 10 | 10 | 10 |
20221030 | 2 | B | 10 | 10 | 10 |
20220928 | 1 | A | 10 | 20 | 30 |
What I have tried:
df.withColumn("sum_amount_within_one_month", F.sum("amount").over(Window.partitionBy("id","type").where(df.create_date > "20220930")))
This fails with an error: WindowSpec object has no attribute 'where'