I have a DataFrame with the columns CLIENT_ID, timestamp and "all", and I need to derive a column named "Required".
For the first row of each client, "Required" is equal to "all". For every later row, "Required" is built from the previous row's "Required" value and the current row's "all" list: previous entries whose first element reappears in the current row are replaced by the current entry, and all other entries are kept.
So the result computed for the current row must serve as the "previous" value when the next row is calculated. How can I access the previous row's calculated value in the next row using Spark Scala? I tried the UDF shown further down. The expected output is:
+---------+-------------------+--------------------+----------------------------------------+
|CLIENT_ID|timestamp          |all                 |Required                                |
+---------+-------------------+--------------------+----------------------------------------+
|69415092 |2002-03-15 00:00:00|[[06,718], [07,718]]|[[06,718], [07,718]]                    |
|69415092 |2002-03-19 00:00:00|[[10,718]]          |[[06,718], [07,718], [10,718]]          |
|69415092 |2002-03-22 00:00:00|[[06,223], [12,718]]|[[07,718], [10,718], [12,718], [06,223]]|
|69415092 |2002-11-16 00:00:00|[[12,386]]          |[[07,718], [10,718], [06,223], [12,386]]|
+---------+-------------------+--------------------+----------------------------------------+
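To make the rule concrete, here is how I understand it in plain Scala collections, without Spark. This is only a sketch: the names RunningMerge, Entry, merge and required are made up for illustration, and I am assuming that the first element of each inner list acts as a key and that the order of entries inside a list does not matter.

```scala
object RunningMerge {
  // One list entry, e.g. ("06", "718"); the first element is treated as the key (assumption).
  type Entry = (String, String)

  // Merge the previous "Required" value with the current row's "all":
  // drop previous entries whose key reappears, then append the current entries.
  def merge(prev: Seq[Entry], current: Seq[Entry]): Seq[Entry] = {
    val currentKeys = current.map(_._1).toSet
    prev.filterNot(e => currentKeys.contains(e._1)) ++ current
  }

  // "Required" for every row of one client, given its "all" values in timestamp order.
  // scanLeft feeds each computed result into the next step; tail drops the empty seed.
  def required(allInOrder: Seq[Seq[Entry]]): Seq[Seq[Entry]] =
    allInOrder.scanLeft(Seq.empty[Entry])(merge).tail
}
```

Applied to the four "all" values in the table, RunningMerge.required gives the same entries as the "Required" column. Only the third row differs in order: its last two entries come out as [06,223], [12,718] instead of [12,718], [06,223].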
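However, with the Spark code below the calculated value is not carried forward: each row only sees the previous row's original "all" value, not the "Required" value computed for that row.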
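import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, udf}

// Order each client's rows by time so that lag() returns the previous row.
val window = Window.partitionBy("CLIENT_ID").orderBy("timestamp")

// s1 = current row's "all", s2 = previous row's "all" (Seq("0") for the first row).
def fun1(s1: Seq[String], s2: Seq[String]): Seq[String] = {
  // Elements of the previous row that do not occur in the current row.
  val un = s2.diff(s1)
  // First row or nothing to carry over: keep s1; otherwise prepend the carried-over elements.
  if (un.contains("0") || un.isEmpty) s1 else un ++ s1
}
val funUdf = udf(fun1 _)

// lag("all", ...) yields the previous row's original "all", not its computed "Required".
val uniondf = df3
  .withColumn("Required", funUdf(col("all"), lag("all", 1, Array("0")).over(window)))
  .select("CLIENT_ID", "timestamp", "all", "Required")
uniondf.show(false)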