I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([["apple",1],["apple",20],["apple",21],["mango",31],["mango",17]])
df.columns = ["fruit", "count"]
df
Output:
   fruit  count
0  apple      1
1  apple     20
2  apple     21
3  mango     31
4  mango     17
I am trying to create a new column that holds a unique row id for each row within each group. For example, for the group apple the new column should have entries 0, 1, 2, since there are 3 rows, and for the group mango it should be 0, 1, as there are 2 rows.
This is what I have so far:
df["unique_row_number_per_group"] = df.reset_index().groupby("fruit")["index"].transform(lambda x: pd.factorize(x)[0])
Output:
   fruit  count  unique_row_number_per_group
0  apple      1                            0
1  apple     20                            1
2  apple     21                            2
3  mango     31                            0
4  mango     17                            1
This works, but it is very slow on large DataFrames. Any suggestions for a more efficient, idiomatic pandas way to do this would be helpful.
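For reference, here is a minimal sketch of one vectorized candidate: pandas' GroupBy.cumcount(), which numbers the rows of each group from 0 in their existing order. It assumes the numbering should follow the current row order, and it is not benchmarked here against the transform version above.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [["apple", 1], ["apple", 20], ["apple", 21], ["mango", 31], ["mango", 17]],
    columns=["fruit", "count"],
)

# cumcount() numbers the rows of each group 0..n-1 in their existing order,
# without calling a Python lambda once per group.
df["unique_row_number_per_group"] = df.groupby("fruit").cumcount()

print(df)
```

On the sample data this gives 0, 1, 2 for apple and 0, 1 for mango, the same values the transform produces.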