I have a dataset like this one:
+----+-------+------+
| id | event | next |
+----+-------+------+
| 1  | A     | X    |
| 2  | B     | Y    |
| 3  | C     | Z    |
| 4  | C     | X    |
| 5  | A     | X    |
| 6  | D     | Y    |
| 7  | B     | Y    |
+----+-------+------+
I would like to count how many rows share the same values in both the "event" and "next" columns, and add another column with that count. Then I would like to keep only one row for each ("event", "next") pair and delete the other rows. Expected output:
+----+-------+------+-------+
| id | event | next | count |
+----+-------+------+-------+
| 1  | A     | X    | 2     |
| 2  | B     | Y    | 2     |
| 3  | C     | Z    | 1     |
| 4  | C     | X    | 1     |
| 6  | D     | Y    | 1     |
+----+-------+------+-------+
How could I do this in PySpark? Thank you!