I am creating a DataFrame from an XML file using Spark in Python (PySpark). I want to turn each distinct value in the comma-separated classes column into its own column holding a dummy (0/1) variable.
Here is an example.
Input:
 id  | classes
-----+-----------------------
 132 | economics,engineering
 201 | engineering
 123 | sociology,philosophy
 222 | philosophy
-----------------------------
Output:
 id  | economics | engineering | sociology | philosophy
-----+-----------+-------------+-----------+------------
 132 | 1         | 1           | 0         | 0
 201 | 0         | 1           | 0         | 0
 123 | 0         | 0           | 1         | 1
 222 | 0         | 0           | 0         | 1
--------------------------------------------------------
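
To pin down the transformation I am after, here is a minimal plain-Python sketch (not Spark code) that produces the output above from the input above. The function name to_dummies and the list-of-tuples input are only assumptions for illustration. The column order is taken to be the order in which each class first appears.

```python
def to_dummies(rows):
    """Turn (id, "a,b,c") pairs into a header plus rows of 0/1 flags.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # Collect distinct class names in order of first appearance.
    columns = []
    for _, classes in rows:
        for name in classes.split(","):
            name = name.strip()
            if name and name not in columns:
                columns.append(name)

    # Build one output row per input row: the id followed by one flag per class.
    table = []
    for row_id, classes in rows:
        present = {name.strip() for name in classes.split(",")}
        table.append([row_id] + [1 if col in present else 0 for col in columns])

    return ["id"] + columns, table


rows = [
    (132, "economics,engineering"),
    (201, "engineering"),
    (123, "sociology,philosophy"),
    (222, "philosophy"),
]
header, table = to_dummies(rows)
print(header)
for row in table:
    print(row)
```

What I am looking for is the Spark equivalent of this, working on a DataFrame rather than on Python lists.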