0

Say I have a df as follows

df = pd.DataFrame({'val': [30, 40, 50, 60, 70, 80, 90], 'idx': [9, 8, 7, 6, 5, 4, 3],
                     'category': ['a', 'a', 'b', 'b', 'c', 'c', 'c']}).set_index('idx')

Ouput:

      val category
idx              
9     30        a
8     40        a
7     50        b
6     60        b
5     70        c
4     80        c
3     90        c

I want to add an incrementing number from 1 to the total number if rows for each 'category'. The new column should look like this:

     category  incrNbr  val
idx                       
3          a      1     30
4          a      2     40
5          b      1     50
6          b      2     60
7          c      1     70
8          c      2     80
9          c      3     90

Currently I loop through each category like this:


li = []
for index, row in df.iterrows():
    cat = row['category']
    if cat not in li:
        li.append(cat)
        temp = df.loc[(df['category'] == row['category'])][['val']]
        temp.insert(0, 'incrNbr', range(1, 1 + len(temp)))
        del temp['val']
        df = df.combine_first(temp)

It is very slow.

Is there a way to do this using vectorized operations?

user3607022
  • 480
  • 3
  • 16

1 Answers1

2

If your category column is sorted, we can use GroupBy.cumcount:

df['incrNbr'] = df.groupby('category')['category'].cumcount().add(1)

     val category  incrNbr
idx                       
9     30        a        1
8     40        a        2
7     50        b        1
6     60        b        2
5     70        c        1
4     80        c        2
3     90        c        3
Erfan
  • 40,971
  • 8
  • 66
  • 78