I need to calculate col(current_month) / previous(month) partitioned by id
the issue is the data is not continuous so I can't do a lag(price) partition by month
as the previous row to 2018-04-01
is 2018-02-01
.
Instead of my below example_1
using join or example_2
using nested when calls I was hoping for a more elegant solution, something like lag('price').over(partitionBy('id').rangeBetween('1 month',0)
<- pseudo code
is this possible? or is there an 3rd alternative I haven't thought of?
Input:
+----------+-----+--------+-------+
| month|price|quantity| id|
+----------+-----+--------+-------+
|2018-01-01| 3.96| 53.0|abc##10|
|2018-02-01| 3.96| 49.0|abc##10|
|2018-04-01| 3.81| 150.0|abc##10|
|2018-05-01| 3.81| 14.0|abc##10|
|2018-06-01| 3.73| 13.0|abc##10|
|2018-08-01| 2.97| 27.0|abc##10|
|2018-09-01| 2.97| 22.0|abc##10|
|2018-10-01| 2.97| 10.0|abc##10|
|2018-11-01| 2.97| 35.0|abc##10|
|2018-12-01| 2.97| 99.0|abc##10|
+----------+-----+--------+-------+
output:
# I need the previous month's column to calculate col(current_month) / previous(month) partitioned by id
+----------+-----+--------+-------+----------------------+
| month|price|quantity| id| previous_months_price|
+----------+-----+--------+-------+----------------------+
|2018-01-01| 3.96| 53.0|abc##10| null|
|2018-02-01| 3.96| 49.0|abc##10| 53.0|
|2018-04-01| 3.81| 150.0|abc##10| null|
|2018-05-01| 3.81| 14.0|abc##10| 150.0|
|2018-06-01| 3.73| 13.0|abc##10| 14.0|
|2018-08-01| 2.97| 27.0|abc##10| null|
|2018-09-01| 2.97| 22.0|abc##10| 27.0|
|2018-10-01| 2.97| 10.0|abc##10| 22.0|
|2018-11-01| 2.97| 35.0|abc##10| 10.0|
|2018-12-01| 2.97| 99.0|abc##10| 35.0|
+----------+-----+--------+-------+----------------------+
example_1
lj = df_t.select(
'id',
F.add_months('month',1).alias('month'),
F.col('price').alias('previous_months_price'),
)
df_t.join(lj, ['id','month'], how='left')
example_2
nxt_dt = F.add_months('month',1)
df_t.withColumn(
'previous_month_price',
F.when(
nxt_dt == F.expr('lag(month) over (partition by id order by id,month)'),
F.expr('lag(quantity) over (partition by id order by id,month)'),
)
)