I have a use case where I need to calculate a percentile on a column (let's call it X) over a sliding window. The window is chronological, covering the last 120 days:
from pyspark.sql import Window
from pyspark.sql import functions as F
days = lambda i: i * 86400  # convert days to seconds
w = Window.partitionBy("entityId").orderBy(F.col("trn_time").cast("long").asc())\
.rangeBetween(-days(120),-days(1))
I thought of using approxQuantile, but it is a DataFrame method, so it cannot be applied over a window. The second option is using:
percent_rank().over(w)
However, I need to sort the window by the numeric column (X) that I want the percentile of, and the window is already sorted by time. When I try to add X to the orderBy in the window definition:
w = Window.partitionBy("entityId").orderBy(F.col("trn_time").cast("long").asc(),"X")\
.rangeBetween(-days(120),-days(1))
I get the following error: "A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions"
How can I implement this logic?
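To make the expected result concrete, here is a rough plain-Python sketch (not Spark) of the logic I am after. The function name rolling_percentile_rank and the (entity_id, trn_time, x) tuple layout are made up for illustration, and it assumes that "percentile" means the rank of the current row's X among the X values of the same entity in the preceding 120-to-1-day window, which is one possible reading:

```python
from bisect import bisect_left

DAY = 86400  # seconds per day, same unit as trn_time cast to long


def rolling_percentile_rank(rows, lookback_days=120):
    """rows: list of (entity_id, trn_time_seconds, x) tuples (hypothetical layout).

    For each row, returns the fraction of the same entity's X values in the
    window [t - lookback_days, t - 1 day] that are strictly below the row's X,
    or None when the window is empty.
    """
    result = []
    for entity, t, x in rows:
        lo, hi = t - lookback_days * DAY, t - 1 * DAY
        # Values of X in the time window, sorted by X rather than by time.
        window = sorted(v for e, s, v in rows if e == entity and lo <= s <= hi)
        if not window:
            result.append(None)
        else:
            result.append(bisect_left(window, x) / len(window))
    return result


rows = [
    ("a", 0 * DAY, 10.0),
    ("a", 2 * DAY, 20.0),
    ("a", 3 * DAY, 5.0),
    ("a", 4 * DAY, 15.0),
    ("b", 4 * DAY, 1.0),
]
ranks = rolling_percentile_rank(rows)
```

This is quadratic and only meant to pin down the semantics: the frame is chosen by time, but the ranking inside the frame is by X. I am looking for the Spark equivalent.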