How to assign a group number to each ID (n=1,2,3.....)?

Question

I want to assign a number to each group. I tried to do

df['group_n'] = df.groupby('ID').ngroup()

but it gives me a warning:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

If I do

df['group_n'] = df.groupby('ID').ngroup().add(1)

I get group_n in descending order (meaning C:3, B:2, A:1). Is there a way to preserve that order but have group_n start from 0?

My current table:

ID   date   sender   
C    Jan20     3         
C    Feb20     7         
C    Mar20     12        
C    Apr20     15        
B    Mar20     1         
B    May20     10        
B    Jun20     15        
...
A    Jan21     10        
A    Feb21     12        
A    Mar21     20     
A    Apr21     5

Desired table:

ID   date   sender   group_n
C    Jan20     3         1
C    Feb20     7         1
C    Mar20     12        1
C    Apr20     15        1
B    Mar20     1         2
B    May20     10        2
B    Jun20     15        2
A    Jan21     10        3
A    Feb21     12        3
A    Mar21     20        3
A    Apr21     5         3

Thank you in advance!

Does group_n correspond to ID? – Richard K Yu Jan 23 '22 at 22:40

score 3 · Accepted Answer · edited Oct 20 '22 at 20:26

Use:

df['group_n'] = pd.factorize(df['ID'])[0] + 1

Or:

df['group_n'] = df.groupby('ID', sort=False).ngroup().add(1)

print(df)

ID   date   sender   group_n
A    Jan20     3         1
A    Feb20     7         1
A    Mar20     12        1
A    Apr20     15        1
B    Mar20     1         2
B    May20     10        2
B    Jun20     15        2
C    Jan21     10        3
C    Feb21     12        3
C    Mar21     20        3
C    Apr21     5         3
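
To make the difference between the default behaviour and the two suggestions concrete, here is a minimal sketch that rebuilds the question's table (IDs in the order C, B, A; the "..." rows are left out) and compares the three results. The variable names `sorted_n`, `by_factorize`, and `by_ngroup` are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question: IDs appear in the order C, B, A.
df = pd.DataFrame({
    "ID": ["C", "C", "C", "C", "B", "B", "B", "A", "A", "A", "A"],
    "date": ["Jan20", "Feb20", "Mar20", "Apr20", "Mar20", "May20", "Jun20",
             "Jan21", "Feb21", "Mar21", "Apr21"],
    "sender": [3, 7, 12, 15, 1, 10, 15, 10, 12, 20, 5],
})

# Default groupby sorts the keys, so A is numbered first and C last.
sorted_n = df.groupby("ID").ngroup().add(1)

# factorize assigns 0-based codes in order of first appearance, hence the + 1.
by_factorize = pd.factorize(df["ID"])[0] + 1

# sort=False keeps the groups in order of first appearance as well.
by_ngroup = df.groupby("ID", sort=False).ngroup().add(1)

df["group_n"] = by_ngroup
print(df)
```

With this data, both `by_factorize` and `by_ngroup` number C as 1, B as 2, and A as 3, which matches the desired table in the question, while `sorted_n` gives the reversed numbering the asker observed.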

edited Oct 20 '22 at 20:26

E. Zeytinci

answered Jan 23 '22 at 22:42

ansev

I have a problem where group_n is in descending order due to how my ID is sorted (see the OP). Is there such a thing as reverse ngrouping? – Olive Jan 23 '22 at 23:00
You can pass sort=False to groupby, as in `groupby('ID', sort=False)`, or use `pd.factorize()` – ansev Jan 23 '22 at 23:03
I did exactly as described: df['empMgrKey'] = df.groupby(['empId', 'mgrId'], sort=False).ngroup().add(1) but I still get the SettingWithCopyWarning message. Any idea how I can get rid of this warning without using factorize()? Thanks! – Sikander Feb 21 '22 at 21:59
`df.loc[:, 'empMgrKey']`? Is df a slice of another DataFrame? @sikander, see https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – ansev Feb 21 '22 at 22:57
@ansev I tried the df.loc[:, 'empMgrKey'] and got the same warning. I will go through the SO thread soon, but to answer your question about slicing, this is the line before ngroup(): df = df0.loc[df0['endDate'] >= pd.to_datetime('2020-01-01')] – Sikander Feb 22 '22 at 02:30
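
For the situation described in the last comment, where df is produced by filtering df0 with .loc, one common way to avoid SettingWithCopyWarning is to take an explicit copy of the filtered frame before adding a column. The sketch below assumes that is the cause here; the contents of df0 are hypothetical stand-in data, and only the column names and the filter line come from the comments.

```python
import pandas as pd

# Hypothetical stand-in for df0; only the column names come from the thread.
df0 = pd.DataFrame({
    "empId": [1, 1, 2, 3],
    "mgrId": [10, 10, 10, 20],
    "endDate": pd.to_datetime(
        ["2019-06-30", "2020-03-31", "2021-01-15", "2020-07-01"]
    ),
})

# .copy() makes df an independent DataFrame, so assigning a new column
# is no longer treated as writing to a slice of df0.
df = df0.loc[df0["endDate"] >= pd.to_datetime("2020-01-01")].copy()

# Number each (empId, mgrId) pair in order of first appearance, starting at 1.
df["empMgrKey"] = df.groupby(["empId", "mgrId"], sort=False).ngroup().add(1)
print(df)
```

The copy costs some memory, but it makes the intent explicit: df0 stays unchanged and the new column exists only on df.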
