I have the same problem as asked here, but I need a solution in PySpark and without Breeze.
For example, if my PySpark DataFrame looks like this:
user | weight | vec
"u1" | 0.1 | [2, 4, 6]
"u1" | 0.5 | [4, 8, 12]
"u2" | 0.5 | [20, 40, 60]
where the column weight has type double and the column vec has type Array[Double], I would like to get the weighted sum of the vectors per user, so that I get a DataFrame that looks like this:
user | wsum
"u1" | [2.2, 4.4, 6.6]
"u2" | [10, 20, 30]
To do this I have tried the following:
df.groupBy('user').agg(F.sum(df.vec * df.weight).alias("wsum"))
But this fails because the vec and weight columns have different types: the * operator is not defined between an array column and a double column, and F.sum only aggregates numeric columns.
How can I solve this in PySpark without Breeze?
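One direction I have been considering (I am not sure it is idiomatic, hence the question) is to posexplode the vector so that each component becomes its own row, compute the weighted sum per user and per position, and then collect the components back into an array in the right order. A rough sketch:

```python
from pyspark.sql import functions as F

# One row per vector component, keeping its position in the array
exploded = df.select("user", "weight", F.posexplode("vec").alias("pos", "val"))

# Weighted sum per user and position, then rebuild the array ordered by position
wsum = (exploded
        .groupBy("user", "pos")
        .agg(F.sum(F.col("val") * F.col("weight")).alias("component"))
        .groupBy("user")
        .agg(F.sort_array(F.collect_list(F.struct("pos", "component"))).alias("tmp"))
        .select("user", F.col("tmp.component").alias("wsum")))

wsum.show(truncate=False)
```

On the sample data this produces [2.2, 4.4, 6.6] for u1 and [10.0, 20.0, 30.0] for u2, but it creates one row per array element, so I am not sure it is the most efficient way, which is why I am asking.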