0

I would like to add an index to my DataFrame that restarts at 0 within each group and counts up to one less than the number of observations in that group. That is, from:

pd.DataFrame([["John","Car"],["John","House"],["Sam","Skate"],["Sam","Disco"],["Sam","Space"]])

I would like to get:

pd.DataFrame([["John","Car",0],["John","House",1],["Sam","Skate",0],["Sam","Disco",1],["Sam","Space",2]])

Thanks

Arli94

2 Answers

3

You're looking for the cumulative count function, which numbers the rows of each group starting from 0:

import pandas as pd

df = pd.DataFrame([["John","Car"],["John","House"],["Sam","Skate"],["Sam","Disco"],["Sam","Space"]])
df[2] = df.groupby(0).cumcount()  # 0-based counter within each group, stored as a new column
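
For readability, here is a minimal sketch of the same idea with named columns. The column names `name`, `item`, and `idx` are assumptions for illustration only; the question's frame uses the default integer labels.

```python
import pandas as pd

# Same data as in the question; the column names are hypothetical.
df = pd.DataFrame(
    [["John", "Car"], ["John", "House"],
     ["Sam", "Skate"], ["Sam", "Disco"], ["Sam", "Space"]],
    columns=["name", "item"],
)

# cumcount() numbers the rows of each group 0, 1, 2, ... in their
# original order and returns a Series aligned with df's index.
df["idx"] = df.groupby("name").cumcount()

idx_values = df["idx"].tolist()
print(df)
```

Because the result shares the frame's index, it can be assigned directly as a new column without any reordering.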
Zulfiqaar
2

Use:

df.groupby(0)[0].apply(lambda x:x.duplicated().cumsum())
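
As a rough check, the sketch below compares this expression with `cumcount()` on the question's data. It swaps `apply` for `transform`, which is my substitution rather than the answer's code: depending on the pandas version, `apply` may prepend the group key to the result's index, whereas `transform` keeps the original row index. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", "Car"], ["John", "House"],
     ["Sam", "Skate"], ["Sam", "Disco"], ["Sam", "Space"]]
)

# Within a group every key value is identical, so duplicated() is False
# for the first row and True afterwards; the running sum is 0, 1, 2, ...
via_duplicated = df.groupby(0)[0].transform(lambda x: x.duplicated().cumsum())

# Built-in per-group counter, for comparison.
via_cumcount = df.groupby(0).cumcount()

same = via_duplicated.tolist() == via_cumcount.tolist()
print(same)
```

The two approaches agree here; `cumcount()` is the more direct choice because it does not depend on the values inside the group.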
anky