Normally rangeBetween
works well if we know the number of days. For example, the following window function:
def days(i:Int):Long = i*86400
val w = Window.partitionBy(col("Store")).orderBy(col("mth").cast("timestamp").cast("long"))
.rangeBetween(days(7), 0)
How can we expand this to months, as number of days in a month is not constant? Can I use any month function in rangeBetween
to solve the problem?
Solution for this Problen
df.createOrReplaceTempView("df")
spark.sql(
"""SELECT *, mean(some_value) OVER (
PARTITION BY id
ORDER BY CAST(start AS timestamp)
RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW
) AS mean FROM df""").show()