Here is a small example:

import pandas as pd

category_list_1 = ['Album','Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_1 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']

df_1 = pd.DataFrame({'Category':category_list_1,'Value':value_list_1})


category_list_2 = ['Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_2 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']

df_2 = pd.DataFrame({'Category':category_list_2,'Value':value_list_2})


df_1_agg = df_1.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_1_agg)
df_2_agg = df_2.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_2_agg)

df_1_agg works fine because each category has a single modal value. In df_2, however, Album has two tied modes (Alligator and Cherry Tree). I would like df_2_agg to return either of them, but instead I get the error:

Exception: Must produce aggregated value
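A minimal sketch of what seems to be going on, assuming the tie is the cause: `pd.Series.mode` returns a Series containing every tied value, so for Album in df_2 it yields two values rather than one scalar. The variable names below are only for illustration.

```python
import pandas as pd

# The Album values from df_2: two values tied for most frequent
album_values = pd.Series(['Alligator', 'Alligator', 'Cherry Tree', 'Cherry Tree'])

# mode() returns a Series holding every tied mode (sorted), not a single scalar
tied_modes = album_values.mode()
print(list(tied_modes))   # ['Alligator', 'Cherry Tree']

# Taking the first element reduces the result to one value per group
first_mode = tied_modes.iat[0]
print(first_mode)         # Alligator
```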

I am able to use lambda functions as a workaround such as:

df_2_agg = df_2.groupby(['Category']).agg(lambda x:x.value_counts().index[0])
print(df_2_agg)

However, I imagine this may be significantly slower for larger datasets. Is there a way to produce this output in pandas without a lambda function?

David Jacques
  • You would need to return the first value, I think: `df_2_agg = df_2.groupby(['Category'])['Value'].agg(lambda x:x.mode().iat[0])` – anky Mar 10 '20 at 17:59
  • Using `value_counts` takes about twice the time it takes to use `pd.Series.mode`, so if you don't expect your script to take minutes, it shouldn't be an issue. I'm not aware of a more "pandaic" approach to your problem. – Juan C Mar 10 '20 at 18:03
  • @anky_91, my post is asking about how I could go about performing a mode without using lambda functions. I can't see how it's a duplicate of the question you marked it as (https://stackoverflow.com/questions/48645354/obtain-mode-from-column-in-groupby). You've in fact appeared to just repeat the last code snippet from within my question – David Jacques Mar 10 '20 at 18:28
  • @DaveJay I see. However, the post I linked you to also has the answer I commented, which would be faster than value counts – anky Mar 10 '20 at 18:30
  • @anky_91 but it still uses a lambda function? I've updated the title of my question to be more explicit – David Jacques Mar 10 '20 at 18:31
  • How does it matter when you have no option but to use apply/agg anyway? If you still want me to reopen the question, no problem – anky Mar 10 '20 at 18:33
  • As a workaround, can you try `m = df_2.groupby(['Category'])['Value'].value_counts().rename('Count')` and then `m.loc[m.groupby(level=0).idxmax()].reset_index().drop(columns='Count')` – anky Mar 10 '20 at 19:02
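A lambda-free variant in the spirit of the counting workaround from the comments might look like the sketch below. It is not claimed to be the fastest option. It counts each (Category, Value) pair, sorts so that the most frequent value comes first within each category, and keeps the first row per category. Breaking ties alphabetically is an assumption made here only to keep the result deterministic, and the names `counts` and `modes` are illustrative.

```python
import pandas as pd

category_list_2 = ['Album', 'Album', 'Album', 'Album',
                   'Footballer', 'Footballer', 'Footballer']
value_list_2 = ['Alligator', 'Alligator', 'Cherry Tree', 'Cherry Tree',
                'Nolberto Solano', 'Nolberto Solano', 'Laurent Robert']
df_2 = pd.DataFrame({'Category': category_list_2, 'Value': value_list_2})

# Count how often each value occurs within each category
counts = df_2.groupby(['Category', 'Value']).size().reset_index(name='Count')

# Highest count first within each category; ties broken alphabetically by Value
counts = counts.sort_values(['Category', 'Count', 'Value'],
                            ascending=[True, False, True])

# Keep only the top row per category, i.e. one modal value each
modes = counts.drop_duplicates('Category').set_index('Category')['Value']
print(modes)
```

For df_2 this gives Alligator for Album (the tie goes to the alphabetically first value) and Nolberto Solano for Footballer.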
