I have a DataFrame of orders (contactidid, orderdate, orderamount) and I want a new column that contains, for each order, the sum of all order amounts for the contact for the 12 months prior to this order. I am thinking the best way is to use the Windowing functions and the new INTERVAL ability in Spark >1.5.
But I'm having difficulty making this work or finding documentation. My best guess is:
val dfOrdersPlus = dfOrders
.withColumn("ORDERAMOUNT12MONTH",
expr("sum(ORDERAMOUNT) OVER (PARTITION BY CONTACTID ORDER BY ORDERDATE RANGE BETWEEN INTERVAL 12 months preceding and INTERVAL 1 day preceding)"));
But I get a RuntimeException: 'end of input expected'. Any ideas of what I am doing wrong with this 'expr' and where I could find documentation on the new INTERVAL literals?