Finding the most frequent strings and their counts for each group using pandas

Question

I'm trying to find, for each year in a range of years, the name of the person who submitted the most applications that year.

Each application is its own row in the dataframe. It comes with the year it was submitted, and the applicant's name.

I tried using groupby to organize the data by year and name, then applied a variety of methods such as value_counts(), count(), and max().

This is the closest I've gotten:

df3.groupby(['app_year_start'])['name'].value_counts().sort_values(ascending=False)

It produces the following output:

app_year_start        name               total_apps
2015                  John Smith         622
2013                  John Smith         614
2014                  Jane Doe           611
2016                  Jon Snow           549

My desired output:

app_year_start        name                  total_apps
2015                  top_applicant         max_num
2014                  top_applicant         max_num
2013                  top_applicant         max_num
2012                  top_applicant         max_num

Some lines of dummy data:

app_year_start        name
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  Jane Doe
2013                  Jane Doe
2012                  John Snow
2015                  John Snow
2014                  John Smith
2015                  John Snow
2012                  John Snow
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  John Smith
2012                  Jane Doe
2013                  Jane Doe
2012                  John Snow
2015                  John Snow
2014                  John Smith
2015                  John Snow
2012                  John Snow
2012                  John Smith

I've consulted the following SO posts:

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

Pandas groupby nlargest sum

Get max of count() function on pandas groupby objects

Some other attempts I've made:

df3.groupby(['app_year_start'])['name'].value_counts().sort_values(ascending=False)

df3.groupby(['app_year_start','name']).count()

Any help would be appreciated. I'm also open to entirely different solutions as well.

score 1 · Accepted Answer · answered Jul 20 '22 at 01:39

Cross-tabulate and find max values.

(
    # cross tabulate to get each applicant's number of applications
    pd.crosstab(df['app_year_start'], df['name'])
    # the applicant with most applications and their counts
    .agg(['idxmax', 'max'], axis=1)
    # change column names
    .set_axis(['name','total_apps'], axis=1)
    # flatten df
    .reset_index()
)
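
For anyone who wants to run this end to end, here is a minimal sketch built on the dummy data from the question. It spells out the two row-wise reductions (idxmax and max) separately rather than going through agg, which is my own variation, and the variable names counts and top are just illustrative.

```python
import pandas as pd

# Dummy data from the question: the same 11 rows appear twice.
rows = [
    (2012, "John Smith"), (2012, "John Smith"), (2012, "John Smith"),
    (2012, "Jane Doe"), (2013, "Jane Doe"), (2012, "John Snow"),
    (2015, "John Snow"), (2014, "John Smith"), (2015, "John Snow"),
    (2012, "John Snow"), (2012, "John Smith"),
] * 2
df = pd.DataFrame(rows, columns=["app_year_start", "name"])

# One row per year, one column per applicant; cells hold application counts.
counts = pd.crosstab(df["app_year_start"], df["name"])

# idxmax(axis=1) gives the applicant (column label) with the largest count
# in each row; max(axis=1) gives that count.
top = (
    counts.idxmax(axis=1).rename("name").to_frame()
    .assign(total_apps=counts.max(axis=1))
    .reset_index()
)

print(top)
```

On this data the top applicants come out as John Smith (8) for 2012, Jane Doe (2) for 2013, John Smith (2) for 2014 and John Snow (4) for 2015. Note that idxmax returns only the first maximum, so ties within a year are not reported.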

mozway · Answer 2 · 2022-07-20T01:52:54.403

You can use mode per group:

df.groupby('app_year_start')['name'].agg(lambda x: x.mode().iloc[0])

Or, if you want all values joined as a single string in case of a tie:

df.groupby('app_year_start')['name'].agg(lambda x: ', '.join(x.mode()))

Output:

app_year_start
2012    John Smith
2013      Jane Doe
2014    John Smith
2015     John Snow
Name: name, dtype: object
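
The dummy data has no ties, so both versions print the same thing there. To show how they differ, here is a small sketch on hypothetical data (the frame tied and its names are made up for illustration) where two applicants tie in one year:

```python
import pandas as pd

# Hypothetical data: Alice and Bob each have two applications in 2020.
tied = pd.DataFrame({
    "app_year_start": [2020, 2020, 2020, 2020, 2021],
    "name": ["Bob", "Alice", "Bob", "Alice", "Carol"],
})

grouped = tied.groupby("app_year_start")["name"]

# mode() returns every tied value in sorted order, so iloc[0] picks the
# alphabetically first name among the tied ones.
first_only = grouped.agg(lambda x: x.mode().iloc[0])

# Joining keeps every tied name in one string.
all_tied = grouped.agg(lambda x: ", ".join(x.mode()))

print(first_only)
print(all_tied)
```

For 2020 the first version yields "Alice" while the second yields "Alice, Bob"; for 2021 both yield "Carol".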

Variant of your initial code:

(df
 .groupby(['app_year_start', 'name'])['name']
 .agg(total_apps='count')
 .sort_values(by='total_apps', ascending=False)
 .reset_index()
 .groupby('app_year_start', as_index=False)
 .first()
 )

Output:

   app_year_start        name  total_apps
0            2012  John Smith           8
1            2013    Jane Doe           2
2            2014  John Smith           2
3            2015   John Snow           4
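
Another way to get the same table, offered only as an alternative sketch and not as part of the answer above: count with size(), then use idxmax per year to pick the winning (year, name) label. The names counts, top_labels and top are illustrative.

```python
import pandas as pd

# Dummy data from the question: the same 11 rows appear twice.
rows = [
    (2012, "John Smith"), (2012, "John Smith"), (2012, "John Smith"),
    (2012, "Jane Doe"), (2013, "Jane Doe"), (2012, "John Snow"),
    (2015, "John Snow"), (2014, "John Smith"), (2015, "John Snow"),
    (2012, "John Snow"), (2012, "John Smith"),
] * 2
df = pd.DataFrame(rows, columns=["app_year_start", "name"])

# Number of applications per (year, name) pair, as a MultiIndex Series.
counts = df.groupby(["app_year_start", "name"]).size().rename("total_apps")

# For each year, idxmax returns the (year, name) label with the largest count.
top_labels = counts.groupby(level="app_year_start").idxmax()

# Select those labels and flatten the MultiIndex into ordinary columns.
top = counts.loc[top_labels.tolist()].reset_index()

print(top)
```

Like first(), idxmax keeps a single winner per year, so it does not report ties.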

score 1 · Answer 3 · answered Jul 20 '22 at 05:11

With value_counts and a groupby:

dfc = (df.value_counts().reset_index(name='total_apps')
          # value_counts sorts by count (descending), so first() keeps each year's top row
          .groupby('app_year_start').first()
          .sort_index(ascending=False).reset_index()
      )

print(dfc)

Result

   app_year_start        name  total_apps
0            2015   John Snow           4
1            2014  John Smith           2
2            2013    Jane Doe           2
3            2012  John Smith           8
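
One thing worth checking in a pipeline like this is which reduction is applied after the groupby. The sketch below, assuming the dummy data from the question, compares GroupBy.max() with GroupBy.first() on the value_counts output (the names vc, col_max and top_row are illustrative):

```python
import pandas as pd

# Dummy data from the question: the same 11 rows appear twice.
rows = [
    (2012, "John Smith"), (2012, "John Smith"), (2012, "John Smith"),
    (2012, "Jane Doe"), (2013, "Jane Doe"), (2012, "John Snow"),
    (2015, "John Snow"), (2014, "John Smith"), (2015, "John Snow"),
    (2012, "John Snow"), (2012, "John Smith"),
] * 2
df = pd.DataFrame(rows, columns=["app_year_start", "name"])

# DataFrame.value_counts counts unique rows and sorts by count, descending.
vc = df.value_counts().reset_index(name="total_apps")
by_year = vc.groupby("app_year_start")

# max() reduces each column on its own: the alphabetically last name and
# the largest count, which need not come from the same row.
col_max = by_year.max()

# first() keeps the first row of each group, which is the highest count
# here because vc is already sorted by count.
top_row = by_year.first()

print(col_max.loc[2012].tolist())
print(top_row.loc[2012].tolist())
```

For 2012, max() pairs "John Snow" with 8 even though John Snow has only 4 applications that year, while first() returns "John Smith" with 8.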
