I have a DataFrame of several accounts, each of which has one or more modes (most frequent values) in its animal category column. How can I identify the accounts that have more than one mode?
For example, account 3 has only one mode ("dog"), whereas accounts 1, 2 and 4 have more than one: in account 1 each category appears once (three modes), and in accounts 2 and 4 two categories are tied with two occurrences each.
import pandas as pd

test = pd.DataFrame({'account': [1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
                     'category': ['cat','dog','rabbit','cat','cat','dog','dog','dog','dog','dog','rabbit','rabbit','cat','cat','rabbit']})
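As a quick check of the claim above: Series.mode() returns every value tied for the highest count, in sorted order, so collecting it per account shows which accounts have ties. This is a minimal sketch; the name modes_per_account is only illustrative.

```python
import pandas as pd

test = pd.DataFrame({'account': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
                     'category': ['cat', 'dog', 'rabbit', 'cat', 'cat', 'dog', 'dog',
                                  'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'cat', 'rabbit']})

# Series.mode() returns all values tied for the highest count, sorted,
# so each entry here is the full list of modes for one account.
modes_per_account = test.groupby('account')['category'].agg(lambda x: x.mode().tolist())
print(modes_per_account)
```

Account 3 maps to a single-element list, while accounts 1, 2 and 4 map to longer lists.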
The output I'm looking for would be something like this:
pd.DataFrame({'account':[1,2,4],'modes':[3,2,2]})
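One way to produce that output, sketched under the assumption that "multiple modes" means several categories tied for the top count: the length of Series.mode() per account is the number of modes, and filtering on it gives the requested frame. A second variant built on SeriesGroupBy.value_counts avoids calling a Python lambda per group, which may help on larger frames. The names n_modes, n_modes_vec and multi_mode are illustrative.

```python
import pandas as pd

test = pd.DataFrame({'account': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
                     'category': ['cat', 'dog', 'rabbit', 'cat', 'cat', 'dog', 'dog',
                                  'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'cat', 'rabbit']})

# Option 1: Series.mode() returns every value tied for the top count,
# so its length is the number of modes in each account.
n_modes = test.groupby('account')['category'].agg(lambda x: len(x.mode()))

# Option 2: count each category per account, flag the categories whose
# count equals that account's maximum, then count the flags per account.
counts = test.groupby('account')['category'].value_counts()
is_mode = counts.eq(counts.groupby(level='account').transform('max'))
n_modes_vec = is_mode.groupby(level='account').sum()

# Keep only the accounts with more than one mode, in the requested shape.
multi_mode = n_modes[n_modes > 1].rename('modes').reset_index()
print(multi_mode)
```

Both options give the same per-account counts; the final filter is the same for either.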
Secondly, for each account with multiple modes, I want to pick one of the tied modes at random. I have come up with the following code, but it only returns the first mode for each account, which is the alphabetically first one because Series.mode returns its values sorted. My intuition is that something could go inside the iloc brackets below, perhaps a random index between 0 and the number of modes minus one, but I haven't been able to get there.
test.groupby('account')['category'].agg(lambda x: x.mode(dropna=False).iloc[0])
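A minimal sketch of the iloc idea: draw a random position from 0 to (number of modes - 1) and use it to index the modes. It assumes NumPy's Generator API (numpy.random.default_rng); the helper random_mode and the seed 0 are hypothetical choices for illustration. For a single-mode account such as account 3 it simply returns the only mode, so it can be applied to every account and filtered afterwards.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'account': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
                     'category': ['cat', 'dog', 'rabbit', 'cat', 'cat', 'dog', 'dog',
                                  'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'cat', 'cat', 'rabbit']})

# Seeded only so runs are reproducible; use np.random.default_rng() for fresh draws.
rng = np.random.default_rng(0)

def random_mode(x):
    """Hypothetical helper: return one of x's modes chosen uniformly at random."""
    modes = x.mode(dropna=False)
    # rng.integers(n) draws an integer from 0 to n - 1 inclusive.
    return modes.iloc[rng.integers(len(modes))]

picked = test.groupby('account')['category'].agg(random_mode)

# Restrict the result to accounts that actually have more than one mode.
n_modes = test.groupby('account')['category'].agg(lambda x: len(x.mode(dropna=False)))
picked_multi = picked[n_modes > 1]
print(picked_multi)
```

Calling modes.sample(n=1).iloc[0] inside the helper would be an equivalent way to make the random choice.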
Any suggestions? Thanks much.