0

I am trying to get the index (or running count if you will) of each individual record in a groupby object into a column. I doesn't have to be a groupby, but the order has to remain the same, so for example, I want to sort and reindex by column C:

df = pd.DataFrame([[1, 2, 'Foo'],
                   [1, 3, 'Foo'],
                   [4, 6,'Bar'],
                   [7,8,'Bar']],
                  columns=['A', 'B', 'C'])

Out[72]: 
   A  B    C
0  1  2  Foo
1  1  3  Foo
2  4  6  Bar
3  7  8  Bar

My desired output would be:

Out[75]: 
   A  B    C  sorted
0  1  2  Foo       1
1  1  3  Foo       2
2  4  6  Bar       1
3  7  8  Bar       2

It seems like this should be really easy, but nothing I've tried really comes close without looping through the entire data frame, which I would prefer to avoid. Thanks

krflol
  • 1,105
  • 7
  • 12

1 Answers1

1

Try with cumcount:

>>> df = pd.DataFrame([[1, 2, 'Foo'],
...                    [1, 3, 'Foo'],
...                    [4, 6,'Bar'],
...                    [7,8,'Bar']],
...                   columns=['A', 'B', 'C'])
>>> df["sorted"]=df.groupby("C").cumcount()+1
>>> df
   A  B    C  sorted
0  1  2  Foo       1
1  1  3  Foo       2
2  4  6  Bar       1
3  7  8  Bar       2
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34