I have a table that looks like this:
HorseID RiderID Total
121 1 2
5 1 150
600 1 20
30 2 500
5 3 10
600 3 10
34 3 10
Each horse can potentially have multiple riders, and visa versa. 'Total' refers to the total number of incidents with that combination of rider and horse. What I would like is a table with the top 2 HorseIDs in terms of total for each rider, deduplicated. (ie. the top horse for rider 1 was horse 5, as they have had 150 incidents together, compared with 20 and 2 for the other horses with rider 1). So what I would like as output here is:
HorseID
5
600
30
Note when there is less than 2 horses for a rider, it should just grab the top 1 instead. When it comes to rider 3, there is a draw so it picks the top 2 in terms of order - however, these are already on the list so are not added (the deduplicate part - also happy to deduplicate afterward in a separate step if that's faster). I wouldn't mind if this instead grabbed number 34 in a draw (so we had an extra horseid on the list), any method of resolving draws is fine.
This is a toy dataset, the real dataset is much bigger and I will be grabbing the top 200 for each RiderID, so a scale-able/efficient solution would be great.
How would I go about this?