I need to implement the lag function in Spark, which I was able to do as shown below (with some data from a Hive/temp Spark table).
Say the DataFrame has these rows:
lagno, value
0, 100
0, 200
2, null
3, null
where the first column is the lag offset you actually want to use, and the second column is the actual value.
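For reference, here is a minimal sketch that builds an equivalent DataFrame by hand (my real data comes from a Hive/temp table; the sqlContext name and the integer column types are just assumptions for the example):

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// schema: lagno (the per-row lag offset) and value (which may be null)
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("lagno", DataTypes.IntegerType, false),
    DataTypes.createStructField("value", DataTypes.IntegerType, true)));

DataFrame df = sqlContext.createDataFrame(Arrays.asList(
    RowFactory.create(0, 100),
    RowFactory.create(0, 200),
    RowFactory.create(2, null),
    RowFactory.create(3, null)), schema);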
When I run this query it works:
DataFrame df; // loaded from the Hive/temp table
DataFrame dfnew = df.select(
    org.apache.spark.sql.functions.lag(df.col("value"), 1)
        .over(org.apache.spark.sql.expressions.Window.orderBy(df.col("value"))));
That means if I hard-code the lag offset, it works fine.
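For comparison, the same hard-coded offset can also be expressed in SQL against a registered temp table (a sketch only; it assumes the table is registered as my_table and that sqlContext is a HiveContext, which window functions need on Spark 1.x):

// LAG(value, 1) with the offset written literally in the query
DataFrame dfsql = sqlContext.sql(
    "SELECT lagno, value, LAG(value, 1) OVER (ORDER BY value) AS lagged FROM my_table");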
However, if I pass the lag offset as a parameter, i.e. as a column, it does not work:
DataFrame dfnew = df.select(
    org.apache.spark.sql.functions.lag(df.col("value"), df.col("lagno"))
        .over(org.apache.spark.sql.expressions.Window.orderBy(df.col("value"))));
Do I need to cast the parameter from Column type to an integer?