I have a PySpark DataFrame where one column contains one-hot encoded vectors. After a groupby on userid, I want to aggregate these vectors by element-wise (vector) addition.
e.g. df[userid, action]
Row 1: ["1234", [1, 0, 0]]
Row 2: ["1234", [0, 1, 0]]
I want the output row to be: ["1234", [1, 1, 0]]
i.e. the vector is the element-wise sum of all vectors grouped by userid.
How can I achieve this? PySpark's built-in sum aggregate does not support vector addition.
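To be precise about the semantics I'm after, here is the same aggregation written in plain Python (no Spark), using the sample rows above:

```python
from collections import defaultdict

# Sample rows: (userid, one-hot vector), matching the example above.
rows = [
    ("1234", [1, 0, 0]),
    ("1234", [0, 1, 0]),
]

# Group by userid and sum the vectors element-wise.
sums = defaultdict(lambda: None)
for uid, vec in rows:
    if sums[uid] is None:
        sums[uid] = list(vec)
    else:
        sums[uid] = [a + b for a, b in zip(sums[uid], vec)]

print(dict(sums))  # {'1234': [1, 1, 0]}
```

I want this same element-wise sum, but expressed as a PySpark groupby aggregation on the vector column.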