3

I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':

Name | As | Bs | Cs   |  Note
Mark   3     4     7     Good
Luke   2     1     12    Well
Mark   5     6     8     Ok
John   1     18    3     Great

So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:

temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:

Name | As | Bs | Cs   |  Note   | Count
Mark   8     10    15    Good      2
Luke   2     1     12    Well      1
John   1     18    3     Great     1

How do I modify the above line of code to do what I need?

GreenGodot
  • 6,030
  • 10
  • 37
  • 66

2 Answers2

4

Create the group and do your aggregations:

the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

then compute the size from the_group

temp_df['count'] = the_group.count()['Note']

gives:

   Name  Cs  As  Bs  count
0  John   3   1  18      1
1  Luke  12   2   1      1
2  Mark  15   8  10      2

Edit:

As suggested in the comments, it is safer to use size() in case the data include NaN:

temp_df['count'] = the_group.size().reset_index()[0] 
VinceP
  • 2,058
  • 2
  • 19
  • 29
  • I believe you meant to say 'temp_df['count'] = the_group.count()['Name']'? WHich is what I want to get. THis does not seem to be working though, I'm just getting back the string value for name for each row. – GreenGodot Jul 06 '17 at 16:02
  • Count is used for check non NaN values, need size. – jezrael Jul 06 '17 at 16:27
2

Use first + size and then is necessary rename columns by dict:

temp_df = temp_df.groupby('Name', sort=False) \
                .agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
                .rename(columns={'Name':'Count'}) \
                .reset_index() \
                .reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1) 
print (temp_df)
   Name  As  Bs  Cs   Note  Count
0  Mark   8  10  15   Good      2
1  Luke   2   1  12   Well      1
2  John   1  18   3  Great      1

Dont use count, only size or len.

What is the difference between size and count in pandas?

Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Since pandas 0.25.0+ you can use named aggregation. temp_df.groupby('Name', sort=False).agg(As=('As','sum'),Bs=('Bs','sum'),Cs=('Cs','sum'),Note=('Note','first'),Count=('Name','size')) – Matthew Son Jan 16 '20 at 21:12