I have:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
df = spark.createDataFrame(
    [(17, "2017-03-10T15:27:18+00:00", "Store 1"),
     (13, "2017-04-15T12:27:18+00:00", "Store 1"),
     (25, "2017-05-18T11:27:18+00:00", "Store 1"),
     (18, "2017-05-19T11:27:18+00:00", "Store 1"),
     (13, "2017-03-15T12:27:18+00:00", "Store 2"),
     (25, "2017-05-18T11:27:18+00:00", "Store 2"),
     (25, "2017-08-18T11:27:18+00:00", "Store 2")],
    ["dollars", "timestampGMT", "Store"])
df = df.withColumn("timestampGMT", df.timestampGMT.cast("timestamp"))
which gives:
dollars  timestampGMT         Store
17       2017-03-10 15:27:18  Store 1
13       2017-04-15 12:27:18  Store 1
25       2017-05-18 11:27:18  Store 1
18       2017-05-19 11:27:18  Store 1
13       2017-03-15 12:27:18  Store 2
25       2017-05-18 11:27:18  Store 2
25       2017-08-18 11:27:18  Store 2
I want the average of dollars over the last 3 calendar months (the current month and the two before it), per Store; if any of those 3 months has no data, the value should be 0. The desired result is:
dollars  timestampGMT         Store    Last_3_months_Average
17       2017-03-10 15:27:18  Store 1  0
13       2017-04-15 12:27:18  Store 1  0
25       2017-05-18 11:27:18  Store 1  18.25
18       2017-05-19 11:27:18  Store 1  18.25
13       2017-03-15 12:27:18  Store 2  0
25       2017-05-18 11:27:18  Store 2  0
25       2017-08-18 11:27:18  Store 2  0
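(For the Store 1 May rows, 18.25 is (17 + 13 + 25 + 18) / 4: the mean of every Store 1 row falling in the March-May window.)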
I'm not sure how to approach this problem. Should I group by month first?
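One idea I've sketched (untested beyond the sample above; the helper column names month_idx, avg_3m, and n_months are just placeholders I picked): index each row by an absolute month number, use a range-based window covering the current month and the two before it, and zero out the average whenever fewer than 3 distinct months land in the frame:
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Number each row's month on an absolute scale so a range frame
# counts calendar months rather than rows.
df = df.withColumn('month_idx',
                   F.year('timestampGMT') * 12 + F.month('timestampGMT'))

# Frame: the current month plus the two preceding months, per Store.
w = Window.partitionBy('Store').orderBy('month_idx').rangeBetween(-2, 0)

df = (df
      .withColumn('avg_3m', F.avg('dollars').over(w))
      # Count how many distinct months actually have data in the frame.
      .withColumn('n_months', F.size(F.collect_set('month_idx').over(w)))
      # Keep the average only when all 3 months are present, else 0.
      .withColumn('Last_3_months_Average',
                  F.when(F.col('n_months') == 3, F.col('avg_3m'))
                   .otherwise(F.lit(0.0)))
      .drop('month_idx', 'avg_3m', 'n_months'))
This appears to reproduce the table above on the sample data, but I don't know whether collect_set over a range frame is the right way to check month coverage, or whether grouping by month and joining back would be cleaner.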