Numbering Groups In Pandas DataFrame

Question

Is there a way in Pandas to number groups in a DataFrame, based on column values? If my frame looks like this

  Column1 Column2  Column3
0       A       X       23
1       A       X       45
2       A       Y       32
3       A       Y       53
4       A       Y       67
5       B       X       85
6       B       Y       12
7       B       Y       94

What I'd like to be able to do is something like

df.group_numbers(['Column1', 'Column2'])

  Column1 Column2  Column3  GroupNumber
0       A       X       23            1
1       A       X       45            1
2       A       Y       32            2
3       A       Y       53            2
4       A       Y       67            2
5       B       X       85            3
6       B       Y       12            4
7       B       Y       94            4

This is a bit like a multi-column factorize: http://stackoverflow.com/questions/16453465/multi-column-factorize-in-pandas — Alex Riley, Oct 30 '15 at 19:59

JoeCondron · Accepted Answer · 2015-10-30T21:55:54.210

As suggested in ajcr's comment, pd.factorize is the way to go. In your case you can quickly create an array of keys by concatenating the two columns with a delimiter between them. The delimiter avoids confusing pairs such as ("ab", "c") and ("a", "bc"), as pointed out by DSM.

# pd.factorize returns a (codes, uniques) tuple; codes start at 0, so add 1
df['GroupNumber'] = pd.factorize(df.Column1 + ' ' + df.Column2)[0] + 1

It's still faster than using pd.lib.fast_zip.
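
For reference, here is a minimal self-contained sketch of the delimiter approach on the frame from the question. pd.factorize returns a (codes, uniques) tuple with 0-based codes, so the sketch unpacks it and adds 1 to get the numbering asked for. The groupby(...).ngroup() line is an alternative that is not part of this answer; it assumes a pandas version that provides ngroup, and the column name GroupNumberAlt is only illustrative.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Column1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "Column2": ["X", "X", "Y", "Y", "Y", "X", "Y", "Y"],
    "Column3": [23, 45, 32, 53, 67, 85, 12, 94],
})

# Join the key columns with a delimiter so ("AB", "C") and ("A", "BC") stay distinct.
keys = df["Column1"] + " " + df["Column2"]

# factorize returns (codes, uniques); codes are 0-based, in order of first appearance.
codes, uniques = pd.factorize(keys)
df["GroupNumber"] = codes + 1

# Alternative (an assumption, not from the answer): number the groups via groupby.
# sort=False keeps the numbering in order of first appearance, like factorize.
df["GroupNumberAlt"] = df.groupby(["Column1", "Column2"], sort=False).ngroup() + 1

print(df)
```

Both columns should come out identical on this data. The groupby variant works on the key columns directly, so it needs no delimiter and also handles non-string keys.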

edited Oct 30 '15 at 21:55

answered Oct 30 '15 at 20:06

JoeCondron


I think I like your earlier answer better. This one would confuse "AB","C" and "A","BC". – DSM Oct 30 '15 at 20:27
Yes, good point. I was focusing too much on the data presented. I've updated the answer. – JoeCondron Oct 30 '15 at 21:56
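
The collision DSM describes can be reproduced with a small sketch. The two-row frame below is hypothetical data chosen only to trigger it: without a delimiter both rows concatenate to the same key and get the same code, while with a delimiter they stay apart.

```python
import pandas as pd

# Hypothetical data chosen to trigger the collision described in the comments.
df = pd.DataFrame({"Column1": ["AB", "A"], "Column2": ["C", "BC"]})

# Without a delimiter both rows produce the key "ABC" and share one code.
no_delim = pd.factorize(df["Column1"] + df["Column2"])[0]

# With a delimiter the keys are "AB C" and "A BC", so the codes differ.
with_delim = pd.factorize(df["Column1"] + " " + df["Column2"])[0]

print(no_delim.tolist())    # [0, 0]
print(with_delim.tolist())  # [0, 1]
```

The delimiter only helps as long as it cannot occur inside the values themselves, so pick one that does not appear in the data.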
