
Say I have a Pandas DataFrame in Python

import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 2, 1, 3, 1],
        "B": ["a", "b", "c", "d", "e", "f"],
    }
)

I'd like to get a series, Count, in which the m-th element is the number of times the m-th element of df["A"] has appeared in df["A"][0:m]. Equivalently, a series with the number of times that a given term has appeared before. So, in our example, the desired result would be

0    0
1    0
2    1
3    1
4    0
5    2
Name: Count, dtype: float64  

One way to achieve this is to use a while loop that builds the series one element at a time, computing (df.A[0:i] == df.iloc[i]["A"]).sum() for an index i going from 0 to the length of df. The thing is that I don't know whether this is a feature that already comes with Pandas DataFrames. I know about df['Count'] = df.groupby('A')['A'].transform('count'), after which df["Count"] outputs

0    3
1    2
2    2
3    3
4    1
5    3
Name: Count, dtype: float64

that is, the total number of times an element appears in the whole series: a similar result to what I want.

So, my question is: are there ways of getting what I want that are simpler than the while method mentioned above and that, perhaps, resemble the latter method for counting the total number of appearances?

  • 3
    You want a [`groupby.cumcount`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html): `df.groupby('A').cumcount()` – mozway Feb 06 '23 at 13:27
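
A minimal sketch of the `cumcount` suggestion above, assuming the example frame from the question. The list comprehension is only a slow reference check that mirrors the loop described in the question; the name `reference` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 2, 2, 1, 3, 1],
        "B": ["a", "b", "c", "d", "e", "f"],
    }
)

# cumcount numbers the rows inside each group 0, 1, 2, ... in order of
# appearance, which is the number of earlier rows holding the same value.
df["Count"] = df.groupby("A").cumcount()

# Slow reference version: for each position i, count how many of the
# preceding values df["A"][0:i] are equal to df["A"][i].
reference = [
    int((df["A"].iloc[:i] == df["A"].iloc[i]).sum()) for i in range(len(df))
]

print(df["Count"].tolist())
print(reference)
```

Both lists should agree with the desired result in the question. The resulting series has an integer dtype, and `cumcount` also accepts `ascending=False` to count the occurrences that come after each row instead.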
