I want to assign a number to each group. I tried to do

df['group_n'] = df.groupby('ID').ngroup()

but it gives me this warning:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

If I instead do

df['group_n'] = df.groupby('ID').ngroup().add(1)

group_n follows the sorted order of ID, which is descending in my table (C: 3, B: 2, A: 1). Is there a way to keep the row order but number the groups in order of appearance (C: 1, B: 2, A: 3)?

My current table:

ID   date   sender   
C    Jan20     3         
C    Feb20     7         
C    Mar20     12        
C    Apr20     15        
B    Mar20     1         
B    May20     10        
B    Jun20     15        
...
A    Jan21     10        
A    Feb21     12        
A    Mar21     20     
A    Apr21     5  

Desired table:

ID   date   sender   group_n
C    Jan20     3         1
C    Feb20     7         1
C    Mar20     12        1
C    Apr20     15        1
B    Mar20     1         2
B    May20     10        2
B    Jun20     15        2
A    Jan21     10        3
A    Feb21     12        3
A    Mar21     20        3
A    Apr21     5         3

Thank you in advance!

Olive

1 Answer

Use:

df['group_n'] = pd.factorize(df['ID'])[0] + 1

Or:

df['group_n'] = df.groupby('ID', sort=False).ngroup().add(1)

print(df)

ID   date   sender   group_n
C    Jan20     3         1
C    Feb20     7         1
C    Mar20     12        1
C    Apr20     15        1
B    Mar20     1         2
B    May20     10        2
B    Jun20     15        2
A    Jan21     10        3
A    Feb21     12        3
A    Mar21     20        3
A    Apr21     5         3
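
As a self-contained sketch of the difference between these options, assuming a small frame rebuilt from a subset of the question's rows (the variable names `sorted_n`, `appearance_n` and `factorized_n` are only illustrative):

```python
import pandas as pd

# Illustrative subset of the question's table; IDs appear in the order C, B, A.
df = pd.DataFrame({
    "ID": ["C", "C", "B", "B", "A", "A"],
    "sender": [3, 7, 1, 10, 10, 12],
})

# Default groupby sorts the keys, so A gets 1, B gets 2 and C gets 3.
sorted_n = df.groupby("ID").ngroup().add(1)

# sort=False numbers the groups in order of first appearance: C=1, B=2, A=3.
appearance_n = df.groupby("ID", sort=False).ngroup().add(1)

# factorize also numbers values by first appearance; [0] holds the integer codes.
factorized_n = pd.factorize(df["ID"])[0] + 1

df["group_n"] = appearance_n
print(df)
```

Both `sort=False` and `pd.factorize` leave the row order untouched; only the numbering changes.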
E. Zeytinci
ansev
  • I have a problem where group_n is in descending order because of how my ID column is sorted (see the OP). Is there such a thing as reverse ngrouping? – Olive Jan 23 '22 at 23:00
  • You can pass sort=False to groupby, `groupby('ID', sort=False)`, or use `pd.factorize()` – ansev Jan 23 '22 at 23:03
  • I did exactly as described: df['empMgrKey'] = df.groupby(['empId', 'mgrId'], sort=False).ngroup().add(1) but I still get the SettingWithCopyWarning message. Any idea how I can get rid of this warning without using factorize()? Thanks! – Sikander Feb 21 '22 at 21:59
  • `df.loc[:, 'empMgrKey']`? Is df a slice of another DataFrame? @sikander, see https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – ansev Feb 21 '22 at 22:57
  • @ansev I tried the df.loc[:, 'empMgrKey'] and got the same warning. I will go through the SO thread soon, but to answer your question about slicing, this is the line before ngroup(): df = df0.loc[df0['endDate'] >= pd.to_datetime('2020-01-01')] – Sikander Feb 22 '22 at 02:30
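
A frame produced by filtering another frame, as in the last comment, is a common source of this warning. A minimal sketch of one usual remedy, taking an explicit `.copy()` of the filtered frame before adding the column; the column names come from the comment, while the sample values in `df0` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the comment.
df0 = pd.DataFrame({
    "empId": [1, 1, 2, 3],
    "mgrId": [10, 10, 20, 10],
    "endDate": pd.to_datetime(
        ["2019-06-30", "2020-03-31", "2021-01-15", "2020-12-31"]
    ),
})

# .copy() makes df an independent frame, so the assignment below
# clearly targets df and not a slice of df0.
df = df0.loc[df0["endDate"] >= pd.to_datetime("2020-01-01")].copy()

# Number each (empId, mgrId) pair in order of first appearance, starting at 1.
df["empMgrKey"] = df.groupby(["empId", "mgrId"], sort=False).ngroup().add(1)
print(df)
```

The original `df0` is left unchanged; only the copy gains the new column.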