Here is a small example:
import pandas as pd
category_list_1 = ['Album','Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_1 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']
df_1 = pd.DataFrame({'Category':category_list_1,'Value':value_list_1})
category_list_2 = ['Album','Album','Album','Album','Footballer','Footballer','Footballer']
value_list_2 = ['Alligator','Alligator','Cherry Tree','Cherry Tree','Nolberto Solano','Nolberto Solano', 'Laurent Robert']
df_2 = pd.DataFrame({'Category':category_list_2,'Value':value_list_2})
df_1_agg = df_1.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_1_agg)
df_2_agg = df_2.groupby(['Category'])['Value'].agg(pd.Series.mode)
print(df_2_agg)
df_1_agg works fine, as there is a single modal value for each category. But for df_2_agg, where Album has two tied modes, I would like it to return either of the tied values. However, I instead get the error:
Exception: Must produce aggregated value
As a workaround, I can use a lambda function such as:
df_2_agg = df_2.groupby(['Category']).agg(lambda x:x.value_counts().index[0])
print(df_2_agg)
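However, I imagine this may be significantly slower on larger datasets, since the lambda is called once per group. Is there a way to get this output (one modal value per category, with ties broken arbitrarily) using built-in pandas operations?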
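For reference, one direction that might avoid the per-group lambda is to count each (Category, Value) pair once and then keep the most frequent row per category. This is only a sketch, assuming either tied value is acceptable; the names counts, modal and the column n are made up for illustration, and I have not benchmarked it against the lambda version.

```python
import pandas as pd

df_2 = pd.DataFrame({
    'Category': ['Album', 'Album', 'Album', 'Album',
                 'Footballer', 'Footballer', 'Footballer'],
    'Value': ['Alligator', 'Alligator', 'Cherry Tree', 'Cherry Tree',
              'Nolberto Solano', 'Nolberto Solano', 'Laurent Robert'],
})

# Count every (Category, Value) pair in one pass.
# 'n' is a hypothetical column name holding the counts.
counts = df_2.groupby(['Category', 'Value']).size().reset_index(name='n')

# Put the most frequent value first within each category,
# then keep only that first row per category.
modal = (
    counts.sort_values(['Category', 'n'], ascending=[True, False])
          .drop_duplicates('Category')
          .set_index('Category')['Value']
)
print(modal)
```

For Album, this returns one of the two tied values rather than raising; which one depends on the sort order, so it should not be relied on if a specific tie-break is needed.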