How to count how many times a pandas row appears based only on certain columns?

Question

I have a dataframe like the one below:

import numpy as np
import pandas as pd
df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ['Smith', 'Anderson', 'Werner', 'Smith', 'Lewis'],
                   "ignore_var": [24, np.nan, 21, 99, 26]})

I am trying to count how many times each author appears, based only on the first two columns (firstname and lastname). In my example, every author appears once except John Smith, who appears twice. The two Mylas are different authors (Myla Anderson and Myla Lewis). I would like to add this count as a new column, df['count'].

df.groupby(['firstname', 'lastname']).size()
# .count() also exists, but it counts non-null values per column rather than rows
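To see why the aggregated result cannot simply be assigned as a column, it helps to look at what `size()` and `count()` return. The sketch below rebuilds the example frame and compares them. The variable names `sizes` and `counts` are illustrative only. Both results have one row per author (4 rows), not one per original row (5). Because `count()` skips NaN, Myla Anderson gets 0 in the `ignore_var` column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ["Smith", "Anderson", "Werner", "Smith", "Lewis"],
                   "ignore_var": [24, np.nan, 21, 99, 26]})

# size() counts rows per (firstname, lastname) group, NaN or not,
# and returns a Series indexed by the group keys (a MultiIndex).
sizes = df.groupby(["firstname", "lastname"]).size()

# count() counts non-null values in each remaining column, so the
# NaN in ignore_var is not counted for Myla Anderson.
counts = df.groupby(["firstname", "lastname"]).count()

print(sizes)
print(counts)

# One row per author versus one row per original record: the shapes differ,
# so the aggregate cannot be assigned directly as df['count'].
print(len(sizes), len(df))
```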

I managed to do the aggregation and the count. How can I merge the result back into the original dataframe?

I would like to keep all rows and simply add it as an extra column.
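One way to do literally what is asked is to merge the aggregate back in. This is a minimal sketch, assuming the example frame above; the names `keys`, `author_counts` and `merged` are hypothetical. The approach turns the grouped sizes into an ordinary frame with `reset_index(name="count")`. It then left-joins that frame onto the original on the two key columns. A left merge keeps every original row in its original order. Note that `merge` returns a new frame with a fresh default index, so a custom index on `df` would not be carried over.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ["Smith", "Anderson", "Werner", "Smith", "Lewis"],
                   "ignore_var": [24, np.nan, 21, 99, 26]})

keys = ["firstname", "lastname"]

# Aggregate: one row per author, with the number of rows in that group.
author_counts = df.groupby(keys).size().reset_index(name="count")

# Merge back: a left join keeps all original rows (in order) and attaches
# the matching count to each of them.
merged = df.merge(author_counts, on=keys, how="left")
print(merged)
```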

`df['count'] = df.groupby(['firstname', 'lastname'])['firstname'].transform('size')` — Quang Hoang, Feb 16 '21 at 17:42
Exactly what I needed! Can you explain the solution, i.e. elaborate a bit further, and I will accept it? I would like to understand it and learn. — G. Macia, Feb 16 '21 at 17:43
There should be quite a few questions on SO about `groupby().transform()`. It should also be straightforward from [the official documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html). — Quang Hoang, Feb 16 '21 at 17:45
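The one-liner in the first comment works because `transform` returns a result aligned with the original index, unlike an aggregation. For each group it computes one value, here the group's row count. It then broadcasts that value back to every row in the group, so the result can be assigned as a column directly. The sketch below illustrates this on the example frame. The helper names `size_via_ignore`, `count_via_ignore` and `manual` are illustrative only. The sketch also shows that the selected column does not matter for `'size'`, which counts rows. `'count'` is different because it skips NaN. The list comprehension is a conceptual equivalent that looks each row's group up in the aggregated Series, not how pandas implements it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"firstname": ["John", "Myla", "Lewis", "John", "Myla"],
                   "lastname": ["Smith", "Anderson", "Werner", "Smith", "Lewis"],
                   "ignore_var": [24, np.nan, 21, 99, 26]})
keys = ["firstname", "lastname"]

# transform('size') computes each group's row count and broadcasts it to
# every row of that group, so the result shares df's index.
df["count"] = df.groupby(keys)["firstname"].transform("size")

# Any column gives the same result with 'size' (rows are counted, NaN included);
# 'count' instead counts non-null values, so Myla Anderson would get 0.
size_via_ignore = df.groupby(keys)["ignore_var"].transform("size")
count_via_ignore = df.groupby(keys)["ignore_var"].transform("count")

# Conceptual equivalent: look up each row's (firstname, lastname) pair
# in the aggregated Series produced by size().
sizes = df.groupby(keys).size()
manual = [sizes[(f, l)] for f, l in zip(df["firstname"], df["lastname"])]

print(df)
```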
