I have a Dataframe which has following data(simulation).
Price | Date | InvoiceNumber | Product_Type |
---|---|---|---|
12.65 | 12/30/2021 | INV_19984 | AXN UN1234 |
18.78 | 1/23/2022 | INV_200174 | AXN UN1234 |
11.78 | 1/25/2022 | INV_200173 | AXN UN1234 |
11.1 | 3/2/2022 | INV_9912 | AXN UN1234 |
I am trying to find minimum price of the product in the last 30days, 60days and 90days without filtering the data. This data is huge and hence avoiding join operation so using a window function. But It returns null.
wi = Window.partitionBy("Product_Type").orderBy("Date").rowsBetween(30 ,Window.currentRow)
f1 = df.withColumn("minPrice30",F.min("price").over(wi))
Expected output is :
minPrice60 | minPrice30 | Price | Date | InvoiceNumber | Product_Type |
---|---|---|---|---|---|
12.65 | 12.65 | 12.65 | 12/30/2021 | INV_19984 | AXN UN1234 |
12.65 | 12.65 | 18.78 | 1/23/2022 | INV_200174 | AXN UN1234 |
11.78 | 11.78 | 11.78 | 1/25/2022 | INV_200173 | AXN UN1234 |
11.1 | 11.1 | 11.1 | 3/2/2022 | INV_9912 | AXN UN1234 |
Is there any way we can achieve this within the date range? Kindly Suggest