I have a DataFrame that records the rank of each item for each user, and ranks can be tied. When selecting the top k entries for each user, I want to break ties randomly among the user-item pairs that share a rank.
For example, with k = 3:
Input:
+-------+-------+----+
|   user|   item|rank|
+-------+-------+----+
|      1|      1|   1|
|      1|      2|   2|
|      2|      1|   1|
|      2|      3|   1|
|      2|      2|   2|
|      2|      4|   2|
|      3|      2|   1|
|      3|      4|   1|
|      3|      1|   2|
|      3|      3|   2|
+-------+-------+----+
One acceptable output would be:
+-------+-------+----+
|   user|   item|rank|
+-------+-------+----+
|      1|      1|   1|
|      1|      2|   2|
|      2|      1|   1|
|      2|      3|   1|
|      2|      2|   2|
|      3|      2|   1|
|      3|      4|   1|
|      3|      1|   2|
+-------+-------+----+
The following is equally acceptable, as are the other two combinations (not listed here):
+-------+-------+----+
|   user|   item|rank|
+-------+-------+----+
|      1|      1|   1|
|      1|      2|   2|
|      2|      1|   1|
|      2|      3|   1|
|      2|      4|   2|
|      3|      2|   1|
|      3|      4|   1|
|      3|      1|   2|
+-------+-------+----+
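To make the requirement concrete, here is a minimal plain-Python sketch of the behavior I'm after. It is not Spark code: the function name `top_k_random_ties`, the `seed` parameter, and the (user, item, rank) tuple layout are made up for illustration. What I'm looking for is the DataFrame equivalent.

```python
import random
from collections import defaultdict


def top_k_random_ties(rows, k, seed=None):
    """Return the top-k rows per user (lowest rank first), breaking ties randomly.

    rows: iterable of (user, item, rank) tuples (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group the rows by user.
    by_user = defaultdict(list)
    for user, item, rank in rows:
        by_user[user].append((user, item, rank))

    result = []
    for user in sorted(by_user):
        # Sort by rank first, then by a random key, so rows with the same
        # rank end up in a random relative order.
        ordered = sorted(by_user[user], key=lambda r: (r[2], rng.random()))
        result.extend(ordered[:k])
    return result


# The input table from above as (user, item, rank) tuples.
rows = [
    (1, 1, 1), (1, 2, 2),
    (2, 1, 1), (2, 3, 1), (2, 2, 2), (2, 4, 2),
    (3, 2, 1), (3, 4, 1), (3, 1, 2), (3, 3, 2),
]

picked = top_k_random_ties(rows, k=3)
for row in picked:
    print(row)
```

With k = 3, user 1 keeps both rows, while users 2 and 3 each keep their two rank-1 rows plus one randomly chosen rank-2 row.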
I browsed through everything in spark.sql.functions and searched Google, but didn't find anything useful.
Any help is appreciated!