Assume there is a pandas Series containing values reflecting categories ('a','b','c' or 1,2,3):

pds = I.pd.Series(['a','a','b','c','c','c','a'])

I would like to generate a new Series that indicates how often each element has already occurred before its position, i.e. the expected output would be:

pds_result = I.pd.Series([0,1,0,0,1,2,2])
                     #    ^ no 'a' prior to this position in pds
                     #      ^ one 'a' prior to this position in pds
                     #        ^ no 'b' prior to this position in pds
                     #                ^ two 'a' prior to this position in pds

How can this be achieved in a concise manner?

Arco Bast
  • `I am stuck with pandas 19.2` ? Are you sure? [link](https://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.core.groupby.GroupBy.cumcount.html) – jezrael Sep 15 '20 at 10:11
  • https://pandas.pydata.org/pandas-docs/version/0.15.0/generated/pandas.core.groupby.GroupBy.cumcount.html – jezrael Sep 15 '20 at 10:12
  • you are right, ofc. So I would do `pds.groupby(pds).cumcount()` right? ... a bit weird but works. – Arco Bast Sep 15 '20 at 10:17
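
For reference, the approach from the comments can be sketched as a runnable snippet. It assumes a plain `import pandas as pd` rather than the `I.pd` namespace used in the question, which looks like the asker's own wrapper module. Grouping the Series by its own values and calling `cumcount()` numbers each element within its group starting from 0, which is the count of earlier occurrences.

```python
import pandas as pd

# Same data as in the question (plain `pd` used here as an assumption,
# in place of the asker's `I.pd` namespace).
pds = pd.Series(['a', 'a', 'b', 'c', 'c', 'c', 'a'])

# Group the Series by its own values; cumcount() numbers the rows of
# each group 0, 1, 2, ... in their original order, so every element
# gets the number of identical values that appear before it.
pds_result = pds.groupby(pds).cumcount()

print(pds_result.tolist())  # [0, 1, 0, 0, 1, 2, 2]
```

The result keeps the original index, so it lines up with `pds` and can be assigned as a new column if the data lives in a DataFrame.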