I have a Spark DataFrame that looks as follows; the "vec" column is filled with sparse Vectors, not dense Vectors:
+---+--------+-----+-------------+
|id |catagery|index|vec |
+---+--------+-----+-------------+
|a |ii |3.0 |(5,[3],[1.0])|
|a |ll |0.0 |(5,[0],[1.0])|
|b |dd |4.0 |(5,[4],[1.0])|
|b |kk |2.0 |(5,[2],[1.0])|
|b |gg |5.0 |(5,[],[]) |
|e |hh |1.0 |(5,[1],[1.0])|
+---+--------+-----+-------------+
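For reference, the DataFrame can be rebuilt roughly like this (just a sketch assuming Spark ML's Vectors.sparse in a spark-shell session; the values match the table above):

import org.apache.spark.ml.linalg.{Vector, Vectors}

// Rebuild the example data; every row carries a size-5 sparse vector.
// (In a compiled application you would also need `import spark.implicits._`.)
val result = Seq(
  ("a", "ii", 3.0, Vectors.sparse(5, Array(3), Array(1.0))),
  ("a", "ll", 0.0, Vectors.sparse(5, Array(0), Array(1.0))),
  ("b", "dd", 4.0, Vectors.sparse(5, Array(4), Array(1.0))),
  ("b", "kk", 2.0, Vectors.sparse(5, Array(2), Array(1.0))),
  ("b", "gg", 5.0, Vectors.sparse(5, Array.empty[Int], Array.empty[Double])),
  ("e", "hh", 1.0, Vectors.sparse(5, Array(1), Array(1.0)))
).toDF("id", "catagery", "index", "vec")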
Summing a plain numeric column such as "index" works as expected:
val rr = result.groupBy("id").agg(sum("index"))
scala> rr.show(false)
+---+----------+
|id |sum(index)|
+---+----------+
|e |1.0 |
|b |11.0 |
|a |3.0 |
+---+----------+
But how can I use "groupBy" and "agg" to sum the sparse Vectors? I want the final DataFrame to look like this:
+---+-------------------------+
|id | vecResult |
+---+-------------------------+
|a |(5,[0,3],[1.0,1.0]) |
|b |(5,[2,4,5],[1.0,1.0,1.0])|
|e |(5,[1],[1.0]) |
+---+-------------------------+
I think VectorAssembler() might solve this, but I don't know how to write the code. Should I use a UDF?
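In case it helps, this is the kind of UDF-based approach I have in mind: collect each group's vectors with collect_list, then sum them element-wise in a udf. This is only a sketch (it assumes every group is non-empty and all vectors share the same size, and "sumVecs" is just a name I made up), and I don't know whether it is the idiomatic way:

import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{collect_list, udf}

// Hypothetical helper: element-wise sum of one group's vectors,
// returned as a sparse vector.
val sumVecs = udf { (vecs: Seq[Vector]) =>
  val acc = new Array[Double](vecs.head.size)
  // foreachActive visits only the non-zero entries of each sparse vector.
  vecs.foreach(_.foreachActive((i, x) => acc(i) += x))
  Vectors.dense(acc).toSparse: Vector
}

val vecResult = result
  .groupBy("id")
  .agg(sumVecs(collect_list("vec")).as("vecResult"))

vecResult.show(false)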