
I have a dataframe:

import pandas as pd
df = pd.DataFrame({'item': ['A', 'A', 'B', 'B', 'B', 'B', 'C']})
df
  item
0    A
1    A
2    B
3    B
4    B
5    B
6    C

I want to add an entrynum column that shows the row number within each item, starting from 1, so that my dataframe looks like this:

  item  entrynum
0    A         1
1    A         2
2    B         1
3    B         2
4    B         3
5    B         4
6    C         1

I could iterate over the dataframe row by row: if the item value changes, set entrynum to 1; otherwise set it to the previous entrynum plus 1. Is there a better solution?
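
For reference, a minimal sketch of that row-by-row approach, assuming rows for the same item are adjacent (the names `prev_item`, `count`, and `df_loop` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'item': ['A', 'A', 'B', 'B', 'B', 'B', 'C']})

entrynum = []
prev_item = None  # hypothetical helper: item seen on the previous row
count = 0         # hypothetical helper: running counter for the current item

for item in df['item']:
    # Restart at 1 when the item differs from the previous row,
    # otherwise continue counting from the previous value.
    count = 1 if item != prev_item else count + 1
    entrynum.append(count)
    prev_item = item

# Work on a copy so the original df stays unchanged.
df_loop = df.copy()
df_loop['entrynum'] = entrynum
```

This works, but it loops in Python and only gives per-item numbering when equal items sit next to each other.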

I think what I want is similar to SQL window functions, but I don't understand how to express PARTITION BY item in pandas.

OmG
rfg
    You may want to check this thread https://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe – kasper Dec 30 '20 at 20:15

1 Answer


Group by item and take a cumulative count of the rows within each group, then store the result in the entrynum column. cumcount() starts at 0, so add 1 to start the numbering at 1.

 df['entrynum'] = df.groupby('item').cumcount() + 1
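
As a self-contained sketch using the dataframe from the question: `groupby('item')` plays roughly the role of SQL's PARTITION BY item, and `cumcount() + 1` behaves much like ROW_NUMBER() within each partition. The second dataframe, `df2`, is a made-up example added only to show what happens when equal items are not adjacent.

```python
import pandas as pd

df = pd.DataFrame({'item': ['A', 'A', 'B', 'B', 'B', 'B', 'C']})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their
# original order; adding 1 makes the numbering start at 1.
df['entrynum'] = df.groupby('item').cumcount() + 1

print(df['entrynum'].tolist())  # [1, 2, 1, 2, 3, 4, 1]

# Hypothetical extra case: equal items that are not next to each other
# still share one counter, because grouping is by value, not by position.
df2 = pd.DataFrame({'item': ['A', 'B', 'A']})
df2['entrynum'] = df2.groupby('item').cumcount() + 1

print(df2['entrynum'].tolist())  # [1, 1, 2]
```

The original row order and index are preserved, so no sorting is needed before or after the assignment.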
wwnde
    @ppwater, thanks I should have as a matter of course given an explanation. Apologies. – wwnde Dec 31 '20 at 04:23