0

I have a pandas DF:

import pandas as pd
df = pd.DataFrame([["apple",1],["apple",20],["apple",21],["mango",31],["mango",17]])
df.columns = ["fruit", "count"]
df

OP:

    fruit   count
0   apple   1
1   apple   20
2   apple   21
3   mango   31
4   mango   17

I am trying to create a new column which creates unique_row_id, for each row within each group. For example for the group apple the unique column should have entries 0,1,2 since there are 3 rows and for group mango it should be 0,1 as there are 2 rows

df["unique_row_number_per_group"] =df.reset_index().groupby("fruit")["index"].transform(lambda x: pd.factorize(x)[0])

OP:

f   ruit    count   unique_rows_per_group
0   apple   1        0
1   apple   20       1
2   apple   20       2
3   mango   31       0
4   mango   17       1

This works but takes horribly long for big DFs, any suggestion on doing this a more pandas efficient way will be helpful

data_person
  • 4,194
  • 7
  • 40
  • 75
  • 1
    [groupby cumcount](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.cumcount.html) -> `df["unique_row_number_per_group"] = df.groupby("fruit").cumcount()` – Henry Ecker Aug 13 '21 at 00:03

1 Answers1

4

If you need count per group by row we have cumcount:

df['new'] = df.groupby('fruit').cumcount()
df
Out[346]: 
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1

Or:

df['new'] = df.assign(new=1).groupby('fruit')['new'].cumsum()-1
df
Out[352]: 
   fruit  count  new
0  apple      1    0
1  apple     20    1
2  apple     21    2
3  mango     31    0
4  mango     17    1
Jonta
  • 393
  • 5
  • 25
BENY
  • 317,841
  • 20
  • 164
  • 234