Is there a way to use Pandas pd.Series.mode method and accept any winning value if there is a tie (and avoid using lambda functions)?

Question

Here is a small example:

import pandas as pd

category_list_1 = ['Album','Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_1 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']

df_1 = pd.DataFrame({'Category':category_list_1,'Value':value_list_1})


category_list_2 = ['Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_2 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']

df_2 = pd.DataFrame({'Category':category_list_2,'Value':value_list_2})


df_1_agg = df_1.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_1_agg)
df_2_agg = df_2.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_2_agg)

df_1_agg works fine as there is a true modal value for each category. But for df_2_agg, I would like it to return either value for the modal Album. However, I instead get the error:

Exception: Must produce aggregated value
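
For illustration, the tie can be reproduced outside of groupby. This is a minimal sketch using only the Album values from df_2 (the names `album_values` and `album_modes` are made up for the example). `Series.mode` returns every value tied for the highest count, so this group yields two values rather than a single scalar, which is presumably what the aggregation objects to:

```python
import pandas as pd

# The Album values from df_2: two values tied at two occurrences each
album_values = pd.Series(['Alligator', 'Alligator', 'Cherry Tree', 'Cherry Tree'])

# mode() returns a Series holding all tied modal values, not one scalar
album_modes = album_values.mode()
print(album_modes.tolist())
print(len(album_modes))
```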

I am able to use lambda functions as a workaround such as:

df_2_agg = df_2.groupby(['Category']).agg(lambda x:x.value_counts().index[0])
print(df_2_agg)

However, I imagine this may be significantly slower for larger datasets. Is there any way to generate this type of output within pandas without a lambda function?

You would need to return the first value, I think: `df_2_agg = df_2.groupby(['Category'])['Value'].agg(lambda x:x.mode().iat[0])` — anky, Mar 10 '20 at 17:59
Using `value_counts` takes about twice the time it takes to use `pd.Series.mode`, so if you don't expect your script to take minutes, it shouldn't be an issue. I'm not aware of a more "pandaic" approach to your problem. — Juan C, Mar 10 '20 at 18:03
@anky_91, my post is asking about how I could go about performing a mode without using lambda functions. I can't see how it's a duplicate of the question you marked it as (https://stackoverflow.com/questions/48645354/obtain-mode-from-column-in-groupby). You've in fact appeared to just repeat the last code snippet from within my question — David Jacques, Mar 10 '20 at 18:28
@DaveJay I see. However, the post I linked you to also has the answer I commented, which would be faster than value counts — anky, Mar 10 '20 at 18:30
@anky_91 but it still uses a lambda function? I've updated the title of my question to be more explicit — David Jacques, Mar 10 '20 at 18:31
How does it matter when you have no option but to use apply/agg anyway? If you still want me to reopen the question, no problem — anky, Mar 10 '20 at 18:33
As a workaround, can you try `m = df_2.groupby(['Category'])['Value'].value_counts().rename('Count')` and then `m.loc[m.groupby(level=0).idxmax()].reset_index().drop(columns='Count')` — anky, Mar 10 '20 at 19:02
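
A lambda-free variant along the same lines as the last comment might look like the sketch below. It is not from the thread: it swaps in `groupby(...).size()`, `sort_values` and `drop_duplicates`, and the names `counts` and `modal` are made up for illustration. It counts each (Category, Value) pair, sorts so the most frequent value comes first within each category, and keeps one row per category. When there is a tie, whichever tied value sorts first is kept, which matches the "any winning value" requirement.

```python
import pandas as pd

category_list_2 = ['Album', 'Album', 'Album', 'Album',
                   'Footballer', 'Footballer', 'Footballer']
value_list_2 = ['Alligator', 'Alligator', 'Cherry Tree', 'Cherry Tree',
                'Nolberto Solano', 'Nolberto Solano', 'Laurent Robert']
df_2 = pd.DataFrame({'Category': category_list_2, 'Value': value_list_2})

# Count the occurrences of every (Category, Value) pair
counts = (df_2.groupby(['Category', 'Value'])
              .size()
              .rename('Count')
              .reset_index())

# Put the highest count first within each category, then keep the first
# row per category; a tie leaves one of the tied values in place
modal = (counts.sort_values(['Category', 'Count'], ascending=[True, False])
               .drop_duplicates('Category')
               .set_index('Category')['Value'])
print(modal)
```

Whether this is actually faster than the lambda versions would need to be timed on the real data.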


0 Answers