3

Coming from a SQL environment, I am learning some things in Python Pandas. I have a question regarding grouping and aggregates.

Say I group a dataset by Age Category and count the different categories. In MSSQL I would write this:

SELECT AgeCategory, COUNT(*) AS Cnt
FROM TableA
GROUP BY AgeCategory
ORDER BY 1

The result set is a 'normal' table with two columns, the second column I named Count.

When I want to do the equivalent in Pandas, the groupby object is different in format. So now I have to reset the index and rename the column in a following line. My code would look like this:

grouped = df.groupby('AgeCategory')['ColA'].count().reset_index()
grouped.columns = ['AgeCategory', 'Count']
grouped

My question is if this can be accomplished in one go. Seems like I am over-doing it, but I lack experience.

Thanks for any advise.

Regards, M.

Ismael Padilla
  • 5,246
  • 4
  • 23
  • 35
SQL_M
  • 2,455
  • 2
  • 16
  • 30
  • Possible duplicate of [Pandas create new column with count from groupby](https://stackoverflow.com/questions/29836477/pandas-create-new-column-with-count-from-groupby) – AJS Apr 05 '19 at 09:02
  • Related: https://stackoverflow.com/a/55435185, titled "**Count ALL Rows per Group (Series/DataFrame): GroupBy.size**" (getting count in groupby, count v/s size... scroll down) – cs95 Apr 05 '19 at 09:44

1 Answers1

2

Use parameter name in DataFrame.reset_index:

grouped = df.groupby('AgeCategory')['ColA'].count().reset_index(name='Count')

Or:

grouped = df.groupby('AgeCategory').size().reset_index(name='Count')

Difference is GroupBy.count exclude missing values, GroupBy.size not.


More information about aggregation in pandas.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252