I'm trying to convert values from row into different columns with its value from another column. For example -
Input dataframe is like -
+-----------+
| X | Y | Z |
+-----------+
| 1 | A | a |
| 2 | A | b |
| 3 | A | c |
| 1 | B | d |
| 3 | B | e |
| 2 | C | f |
+-----------+
And the output dataframe should be like this -
+------------------------+
| Y | 1 | 2 | 3 |
+------------------------+
| A | a | b | c |
| B | d | null | e |
| C | null | f | null |
+------------------------+
I've tried to groupBy the values based on Y and collect_list on X and Z and then zipped X & Z together to get some sort of key-value pairs. But some Xs may be missing for some values of Y so in order to fill them with null values, I cross joined all possible values of X and all possible values of Y and then joined it my original dataframe. This is approach is highly inefficient.
Is there any efficient method to approach this problem ? Thanks in advance.