Say I have a df as follows
df = pd.DataFrame({'val': [30, 40, 50, 60, 70, 80, 90], 'idx': [9, 8, 7, 6, 5, 4, 3],
'category': ['a', 'a', 'b', 'b', 'c', 'c', 'c']}).set_index('idx')
Ouput:
val category
idx
9 30 a
8 40 a
7 50 b
6 60 b
5 70 c
4 80 c
3 90 c
I want to add an incrementing number from 1 to the total number if rows for each 'category'. The new column should look like this:
category incrNbr val
idx
3 a 1 30
4 a 2 40
5 b 1 50
6 b 2 60
7 c 1 70
8 c 2 80
9 c 3 90
Currently I loop through each category like this:
li = []
for index, row in df.iterrows():
cat = row['category']
if cat not in li:
li.append(cat)
temp = df.loc[(df['category'] == row['category'])][['val']]
temp.insert(0, 'incrNbr', range(1, 1 + len(temp)))
del temp['val']
df = df.combine_first(temp)
It is very slow.
Is there a way to do this using vectorized operations?