I have a Spark DataFrame with an array column that looks like this:
+--------------+
| x |
+--------------+
| [1, 1, 0, 1] |
| [0, 0, 0, 0] |
| [0, 0, 1, 1] |
| [0, 0, 0, 1] |
| [1, 0, 1] |
+--------------+
I want to add a new column with another array that contains the cumulative sum of x
at each index. The result should look like this:
+--------------+---------------+
| x | x_running_sum |
+--------------+---------------+
| [1, 1, 0, 1] | [1, 2, 2, 3] |
| [0, 0, 0, 0] | [0, 0, 0, 0] |
| [0, 0, 1, 1] | [0, 0, 1, 2] |
| [0, 0, 0, 1] | [0, 0, 0, 1] |
| [1, 0, 1] | [1, 1, 2] |
+--------------+---------------+
How can I create the x_running_sum column? I've tried using some of the higher-order functions like transform, aggregate, and zip_with, but I haven't found a solution yet.