Let's say I have a DataFrame like this:
+----------------+-------+
| Fruits         | Count |
+----------------+-------+
| [Pear, Orange] | [1,2] |
+----------------+-------+
| [Orange, Pear] | [2,1] |
+----------------+-------+
| [Orange, Pear] | [2,1] |
+----------------+-------+
I want to add another column containing the merged information:
+----------------+-------+---------------------------+
| Fruits         | Count | merged                    |
+----------------+-------+---------------------------+
| [Pear, Orange] | [1,2] | [('Pear',1),('Orange',2)] |
+----------------+-------+---------------------------+
| [Pear, Orange] | [2,1] | [('Pear',2),('Orange',1)] |
+----------------+-------+---------------------------+
| [Orange, Pear] | [2,1] | [('Pear',1),('Orange',2)] |
+----------------+-------+---------------------------+
I showed the 3rd row because I'm hoping the merged column can first be built as a list of tuples and then sorted.
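To make the intent concrete, here is a rough plain-Python sketch (not PySpark) of the per-row behaviour I have in mind. The name `merge_row` and its `key` parameter are made up for illustration, and sorting by count is only one possible reading of the 3rd row, so treat the sort key as an assumption:

```python
def merge_row(fruits, counts, key=None):
    """Pair each fruit with its count; optionally sort the resulting pairs.

    `key` is a hypothetical knob: the sort order I actually need is not
    pinned down yet, so it is left to the caller.
    """
    pairs = list(zip(fruits, counts))  # element-wise pairing, like dict items
    return sorted(pairs, key=key) if key is not None else pairs


# Pairing only, order preserved (1st row of the example)
print(merge_row(["Pear", "Orange"], [1, 2]))

# Pairing, then sorting by count (assumed sort key; 3rd row of the example)
print(merge_row(["Orange", "Pear"], [2, 1], key=lambda pair: pair[1]))
```
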
Is there a function in PySpark that can do this?
I know columns can be merged as described in "pyspark - merge 2 columns of sets", but I want them paired element-wise, the way a dictionary maps each fruit to its count, rather than concatenated.