I have run an ALS model, and am working with the output recommendations. They are currently in this format:
|user_id|item_ids| ratings |
| 72 | [5,2,3]| [1.2,3.6,4.1]|
| 16 | [1,9,5]| [4.4,2.0,3.9]|
Each row contains a list in the "item_ids" column and a list in the "ratings" column.
What I'd like to do is "unpivot" each list in "item_ids" and "ratings" so that it looks like this:
|user_id|item_ids|ratings|
| 72 | 5 | 1.2 |
| 72 | 2 | 3.6 |
| 72 | 3 | 4.1 |
| 16 | 1 | 4.4 |
| 16 | 9 | 2.0 |
| 16 | 5 | 3.9 |
The only way I have found to do this is doing so one user at a time by filtering for one user_id, converting the lists to columns in a Pandas dataframe, and then round-tripping that back to a Spark dataframe. As you might imagine, that would take an eternity to complete with a few million user_ids. I have also considered using the "stack" function described here, but the manual inputs required would also take an eternity. I have also considered the code found here, but it doesn't work when lists are involved.
Is there a more efficient way to do this without having to lean on Pandas? I'm running Pyspark (Spark 2.2.0) on Databricks.