I have a pandas DataFrame that looks like this:
   record_date              userid      id       priority
1  2016-05-27 02:00:39.600  1rhNGfQjU6  2718376  3
2  2016-05-27 02:00:39.600  EveMoYR1gs  2718377  3
3  2016-05-27 02:00:39.600  iVYGQgU3bX  2718378  3
4  2016-05-27 02:00:39.600  adA9fRNIgo  2718379  3
5  2016-05-27 02:00:39.600  rCDTlqTOXB  2718380  3
6  2016-05-27 02:00:39.600  aBI6JkLyal  2718381  3
7  2016-05-27 02:00:39.600  eiEct977ua  2718382  3
8  2016-05-27 02:00:39.600  7XVMWZPcZL  2718383  3
9  2016-05-27 02:00:39.600  GHajQM9UXN  2718384  3
It's not evident here, but there can be more than one record per user per day. I am trying to find a way to identify the id that corresponds to the lowest priority value per user per day. I think I may be having a problem with tie-breaking: I tried the suggestions from another SO post (Python : Getting the Row which has the max value in groups using groupby), but that logic seems to select all records equal to the min, whereas I really need just one record with the min priority per user (chosen at random in the case of a tie). I know the code above isn't getting that for me because
len(set(df[indices]['userid'])) == len(df[indices]['userid'])
is False. What's the best way to achieve this? I understand why the code above doesn't work: it returns True for every record equal to the min, so tied records are all kept. What's a good way to break the tie?
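For reference, here is a minimal sketch of two tie-breaking approaches I'm aware of, not necessarily the best ones. The sample data, the `day` helper column, and the names `first_min` and `random_min` are made up for illustration; only the column names come from my real frame. The first approach relies on `idxmin` returning a single index label per group (the first occurrence of the minimum); the second shuffles the rows first so that the surviving row among ties is random.

```python
import pandas as pd

# Hypothetical sample data: user "a" has two records tied at the minimum priority.
df = pd.DataFrame(
    {
        "record_date": pd.to_datetime(
            [
                "2016-05-27 02:00:39.600",
                "2016-05-27 09:15:00.000",
                "2016-05-27 11:30:00.000",
                "2016-05-27 02:00:39.600",
                "2016-05-28 02:00:39.600",
            ]
        ),
        "userid": ["a", "a", "a", "b", "b"],
        "id": [1, 2, 3, 4, 5],
        "priority": [3, 1, 1, 2, 3],
    }
)

# Helper column (an assumption): the calendar day, with the time of day dropped.
df = df.assign(day=df["record_date"].dt.normalize())

# Option 1: idxmin yields one index label per (userid, day) group,
# namely the first row holding the minimum, so ties collapse to one row.
first_min = df.loc[df.groupby(["userid", "day"])["priority"].idxmin()]

# Option 2: random tie-break. Shuffle the rows, sort by priority with a
# stable sort (mergesort) so tied rows keep their shuffled order, then keep
# the first row seen for each (userid, day) pair.
random_min = (
    df.sample(frac=1, random_state=0)
    .sort_values("priority", kind="mergesort")
    .drop_duplicates(subset=["userid", "day"])
)

print(first_min[["userid", "day", "id", "priority"]])
print(random_min[["userid", "day", "id", "priority"]])
```

With either result, checking uniqueness on the pair (`userid`, `day`) rather than on `userid` alone should confirm that each user has exactly one row per day. Dropping `random_state` makes the random choice differ between runs.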