I have a dataframe like the one below:

import numpy as np
import pandas as pd
df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ['Smith', 'Anderson', 'Werner', 'Smith', 'Lewis'],
                   "ignore_var": [24, np.nan, 21, 99, 26]})

Based only on the first two columns (firstname and lastname), I am trying to count how many times each author appears. In my example, every author appears once (the two Mylas have different last names), except for John Smith, who appears twice. I would like to add this count as df['count'].

df.groupby(['firstname', 'lastname']).size()
# there is also count() in case it is preferred

I managed to do the aggregation and the count. How can I merge the result back into the original dataframe?

I would like to keep all rows and simply add it as an extra column.

G. Macia
  • `df['count'] = df.groupby(['firstname', 'lastname'])['firstname'].transform('size')` – Quang Hoang Feb 16 '21 at 17:42
  • Exactly what I needed! Could you elaborate on the solution? I would like to understand it and learn, and I will then accept it. – G. Macia Feb 16 '21 at 17:43
  • There should be quite a few questions on SO about `groupby().transform()`. It should also be straightforward from [the official documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html). – Quang Hoang Feb 16 '21 at 17:45
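
Since an explanation was requested, here is a minimal sketch of what the one-liner from the comments seems to be doing, using the example dataframe from the question. The names `keys`, `sizes`, and `merged` are only illustrative and do not come from the thread. As far as I understand it, `size()` returns one value per group, while `transform('size')` returns one value per original row, which is why it can be assigned directly as a new column. The merge at the end is an alternative route to the same column, not something suggested in the comments.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ["Smith", "Anderson", "Werner", "Smith", "Lewis"],
                   "ignore_var": [24, np.nan, 21, 99, 26]})

# Illustrative name: the columns that identify an author.
keys = ["firstname", "lastname"]

# size() collapses each group to a single value, giving a Series
# indexed by (firstname, lastname) with one entry per distinct author.
sizes = df.groupby(keys).size()

# transform('size') computes the same group sizes but repeats each one
# for every row of its group, so the result shares df's index.
df["count"] = df.groupby(keys)["firstname"].transform("size")

# Alternative: turn the aggregated sizes into a dataframe and
# left-merge it onto the original rows using the key columns.
merged = df.drop(columns="count").merge(
    sizes.reset_index(name="count"), on=keys, how="left"
)

print(df)
```

Both routes keep all five rows. The `transform` version avoids the intermediate dataframe, while the merge version may be easier to follow if you already have the aggregated result at hand.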
