I have a dataframe which I named parking which has multiple columns, in this case Registration State, Violation Code, and Summons Number.
For each Registration State, I want the 3 Violation Codes which the highest row count. The best I've been able to get is:
parking_state_group = parking.groupby(['Registration State', 'Violation Code'])['Summons Number'].count()
When printed (i.e. print(parking_state_group.reset_index()) looks like:
Registration State Violation Code Summons Number
0 99 0 14
1 99 6 1
2 99 10 6
3 99 13 2
4 99 14 75
... ... ... ...
1811 WY 37 3
1812 WY 38 4
1813 WY 40 4
1814 WY 46 1
1815 WY 68 1
This at least gets me the count of each Violation Code for each state (Summons Number is like an ID field for each row). I want this to return only the 3 violation codes for each state with the highest count, so something like:
Registration State Violation Code Summons Number
0 99 14 75
1 99 31 61
2 99 87 55
... ... ... ...
1812 WY 38 4
1813 WY 40 4
1811 WY 37 3
I've tried .nlargest() but this doesn't seem to get the largest .count(), only the largest values within a column, which isn't what I'm looking for.