I have a df of the form (just a sample):
Name Event Score
Joe 1 0.9
Tim 8 0.4
Ben 3 0.7
Ben 9 0.1
Tim 2 0.6
Ben 5 1.1
Joe 2 1.2
Tim 6 0.9
Tim 3 0.7
Joe 7 0.6
Joe 6 0.8
My first step was to groupby and select the 2 smallest scores for each person. I was able to do this with the following code:
top_2 = df.groupby('Name')['Score'].apply(lambda x: x.nsmallest(2)).reset_index('Name')
df = df[(df.index.isin(top_2.index))].reset_index(drop=True)
Which gives the results:
Name Event Score
Tim 8 0.4
Ben 3 0.7
Ben 9 0.1
Tim 2 0.6
Joe 7 0.6
Joe 6 0.8
However, now I would like to make sure that each person does not have consecutive events. For example, Joe has events 6 and 7. Instead, I would like to choose event 7 (since it is smaller than 6) and event 1 (Joe's next smallest event, so that he still has 2 events).
I believe I will need to do this as a step prior to my current code but I'm unsure how to approach it.
A couple of notes:
- Events are not repeated for a person
- Names are unique
- The number of events for a person in the raw data could be anywhere > 1