Assume I have a PySpark DataFrame as shown below. Each row records one item that a user bought on a specific date.
+--+-------------+-----------+
|ID| Item Bought| Date |
+--+-------------+-----------+
|1 | Laptop | 01/01/2018|
|1 | Laptop | 12/01/2017|
|1 | Car | 01/12/2018|
|2 | Cake | 02/01/2018|
|3 | TV | 11/02/2017|
+--+-------------+-----------+
Now I would like to create a new DataFrame like the one below.
+---+--------+-----+------+----+
|ID | Laptop | Car | Cake | TV |
+---+--------+-----+------+----+
|1 | 2 | 1 | 0 | 0 |
|2 | 0 | 0 | 1 | 0 |
|3 | 0 | 0 | 0 | 1 |
+---+--------+-----+------+----+
There is one column per item. For each user, the value in an item's column is the number of times that user bought that item.