I'm trying to figure out how to translate this chunk of SQL into PySpark syntax:
SELECT MEAN(some_value) OVER (
    ORDER BY yyyy_mm_dd
    RANGE BETWEEN INTERVAL 3 MONTHS PRECEDING AND CURRENT ROW
) AS mean
FROM df
If the range were expressed in days, this could easily be done with something like

Window.orderBy(F.expr("datediff(col_name, '1000')")).rangeBetween(-7, 0)
(See also ZygD's solution here: Spark Window Functions - rangeBetween dates)
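For context, the full days-based version would look something like this (a sketch, assuming the column names from the SQL above):

from pyspark.sql import Window
from pyspark.sql import functions as F

# Order by the number of days elapsed since an arbitrary early date, so that
# rangeBetween counts days; (-7, 0) is the trailing week up to the current row.
w = Window.orderBy(F.expr("datediff(yyyy_mm_dd, '1000')")).rangeBetween(-7, 0)
df = df.withColumn("mean", F.mean("some_value").over(w))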
For a range in months, however, this doesn't work, because the number of days in a month is not constant. Any idea how to express a months-based range in PySpark?
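The closest workaround I can come up with is to drop the window function entirely and do a self-join, using add_months to get calendar-aware boundaries. A rough, untested sketch (column names as in the SQL above):

from pyspark.sql import functions as F

# Join each date to all rows within the preceding 3 calendar months, then
# average; add_months handles the varying month lengths. Note this yields
# one row per distinct date rather than one per input row.
result = (
    df.alias("a")
    .join(
        df.alias("b"),
        F.col("b.yyyy_mm_dd").between(
            F.add_months(F.col("a.yyyy_mm_dd"), -3),
            F.col("a.yyyy_mm_dd"),
        ),
    )
    .groupBy("a.yyyy_mm_dd")
    .agg(F.mean("b.some_value").alias("mean"))
)

But a self-join like this gets expensive on large data, so I'd prefer a proper window-based solution if one exists.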