How do I get a new column with the majority value of each group?

sample dataframe

    who        state
0   peopleA     CA
1   peopleA     CA
2   peopleA     CA
3   peopleA     NJ
4   peopleB     FL
5   peopleB     FL
6   peopleB     CA

This is not the right way to do it:

df['new_column'] = df.groupby('who').mode()

my expected output

    who        new_column
0   peopleA      CA
1   peopleB      FL

Bonus question: is there a way to set a threshold, so that a value counts as the majority only if it makes up more than 70% of the group, and null is returned otherwise?

Learn

2 Answers

A pandas GroupBy object does not have a mode() method. As a workaround, select the column after grouping and use .apply() to compute the mode within each group.

df.groupby('who').state.apply(lambda x: x.mode()).reset_index(0)

Output:

      who   state
0   peopleA CA
0   peopleB FL
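
To get the column name from the question (new_column) and a clean 0..n-1 index, here is a minimal sketch of the same idea. It assumes that, when a group has a tie, taking the first mode is acceptable. The name `majority` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "who": ["peopleA"] * 4 + ["peopleB"] * 3,
    "state": ["CA", "CA", "CA", "NJ", "FL", "FL", "CA"],
})

# Select the column explicitly, then keep the first mode of each group.
# Series.mode() returns its values sorted, so a tie resolves to the
# smallest value. Assumption: every group has at least one non-null
# value; an all-null group would make .iat[0] raise IndexError.
majority = (
    df.groupby("who")["state"]
      .agg(lambda s: s.mode().iat[0])
      .reset_index(name="new_column")
)
print(majority)
```

Selecting ["state"] before aggregating also works when the frame has more columns: only the selected column is reduced.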
harvpan

We can group by the who column, apply the mode function to each group, and then call reset_index with drop=True so that the resulting multi-index is discarded instead of being added back as columns.

>>> df
       who state
0  peopleA    CA
1  peopleA    CA
2  peopleA    CA
3  peopleA    NJ
4  peopleB    FL
5  peopleB    FL
6  peopleB    CA
>>> 
>>> df.groupby('who').apply(pd.DataFrame.mode).reset_index(drop=True)
       who state
0  peopleA    CA
1  peopleB    FL
>>> 
Sunitha
  • Is there a way to set a threshold, so that a value counts as the majority only if its share is greater than 70%, and null is returned if it is less than 70%? – Learn Jun 18 '18 at 23:48
  • How do I specify the column name for the mode function if I have more than two columns? – Learn Jun 19 '18 at 00:04
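
Neither answer covers the threshold, so here is one possible sketch rather than a definitive solution. It computes each value's share of its group with value_counts(normalize=True) and keeps the top value only when that share is strictly greater than the threshold. The helper name `majority_or_none` and its `threshold` parameter are hypothetical, and the strict comparison is an assumption, since the question does not say what should happen at exactly 70%.

```python
import pandas as pd

df = pd.DataFrame({
    "who": ["peopleA"] * 4 + ["peopleB"] * 3,
    "state": ["CA", "CA", "CA", "NJ", "FL", "FL", "CA"],
})

def majority_or_none(s, threshold=0.7):
    """Return the most frequent value if its share exceeds threshold, else None.

    Hypothetical helper, not part of pandas.
    """
    # Shares of each value within the group, sorted in descending order.
    shares = s.value_counts(normalize=True)
    if shares.empty:  # group contains only nulls
        return None
    return shares.index[0] if shares.iloc[0] > threshold else None

# Selecting ["state"] restricts the aggregation to that one column,
# which is also how to pick a column when the frame has several.
result = (
    df.groupby("who")["state"]
      .agg(majority_or_none)
      .reset_index(name="new_column")
)
print(result)
```

On the sample data, peopleA gets CA (3 of 4 rows, 75%), while peopleB gets a null because FL covers only 2 of 3 rows (about 67%).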