Pandas - groupby all columns and mark in original dataframe

Question

I have a DataFrame with columns 'Id' which is unique, and 'A', 'B', 'C', etc...

There are different rows where all values 'A', 'B', 'C' are the same. I'd like to give them a group name (a running index from 1).

For example:

df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [3, 4, 4, 4], "C": [5, 5, 5, 5]})
df
Out[127]: 
   A  B  C
0  1  3  5
1  1  4  5
2  1  4  5
3  2  4  5

Will become

   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3

I know I can group by ['A', 'B', 'C'] and get the keys, but then I have to iterate over the keys and the DataFrame in an unoptimized manner. I haven't found an efficient way to do it.

jezrael · Accepted Answer · 2018-08-27T07:32:33.683

Use GroupBy.ngroup:

df['grp'] = df.groupby(['A', 'B', 'C']).ngroup() + 1
print(df)

   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3
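
The question also mentions a unique 'Id' column, which has to stay out of the grouping keys or every row would land in its own group. A minimal sketch, assuming the identifier column is literally named "Id" and using made-up Id values (both are illustrative, not from the original data):

```python
import pandas as pd

# Hypothetical data: the question's example plus an assumed unique 'Id' column.
df = pd.DataFrame({
    "Id": [10, 11, 12, 13],
    "A": [1, 1, 1, 2],
    "B": [3, 4, 4, 4],
    "C": [5, 5, 5, 5],
})

# Group on every column except the unique identifier.
key_cols = [c for c in df.columns if c != "Id"]

# ngroup() gives a 0-based group number per row; add 1 for a running index from 1.
df["grp"] = df.groupby(key_cols).ngroup() + 1
print(df)
```

Note the parentheses: ngroup is a method, so `df.groupby(key_cols).ngroup` without `()` yields a bound method rather than the group numbers.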

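If the rows are already sorted by these columns, pd.factorize on the row tuples gives the same numbering (factorize labels groups in order of first appearance, while ngroup numbers them by sorted key):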

df['grp'] = pd.factorize([tuple(x) for x in df.values])[0] + 1
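
The two numbering schemes only agree when the rows are already in key order. The sketch below illustrates the difference on a deliberately shuffled frame. It uses groupby(..., sort=False) as a stand-in for first-appearance numbering rather than the tuple-based factorize call above, and the frame and variable names are illustrative:

```python
import pandas as pd

# Hypothetical unsorted variant of the question's data.
df_unsorted = pd.DataFrame({"A": [2, 1, 1, 1], "B": [4, 3, 4, 4], "C": [5, 5, 5, 5]})
cols = ["A", "B", "C"]

# Default sort=True: groups are numbered by sorted key order.
by_key = df_unsorted.groupby(cols).ngroup() + 1

# sort=False: groups are numbered in order of first appearance.
by_appearance = df_unsorted.groupby(cols, sort=False).ngroup() + 1

print(pd.DataFrame({"by_key": by_key, "by_appearance": by_appearance}))
```

Either variant assigns the same label to identical rows; only the label values differ, so the choice matters mainly if the group number should follow row order.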

edited Aug 27 '18 at 07:32

answered Aug 27 '18 at 07:29

ngroup() ... damnit I used ngroup without brackets and got stuck on that. Thanks jezrael! – Eran Moshe Aug 27 '18 at 07:31
Can you please be more careful before answering duplicates? This is the second time I've caught you tonight. – cs95 Aug 27 '18 at 07:49
