2-dimensional binning with Pandas

Question

I have two features that I wish to bin (classify) and then combine to create a new feature. It is much like classifying coordinates into grid cells on a map.

The issue is that the features are not evenly distributed, so I would like to use quantiles when binning (as with pandas.qcut()) on both features/coordinates.

Is there a better way than doing qcut() on both features and then concatenating the result labels?
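
For reference, the baseline described here (qcut on each feature, then joining the two labels) might look roughly like the sketch below. The column names x and y, the seed, the exponential sample data, and the choice of 4 bins per axis are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed data; names, seed, and bin count are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.exponential(size=1000),
    "y": rng.exponential(size=1000),
})

n_bins = 4
# labels=False makes qcut return integer bin indices 0..n_bins-1.
x_bin = pd.qcut(df["x"], n_bins, labels=False)
y_bin = pd.qcut(df["y"], n_bins, labels=False)

# Join the two bin indices into one label per grid cell, e.g. "2_0".
df["cell"] = x_bin.astype(str) + "_" + y_bin.astype(str)

# Quantile edges give each marginal bin roughly the same number of rows,
# even though the raw values are skewed.
print(x_bin.value_counts().sort_index())
print(df["cell"].value_counts().head())
```

The marginal bins are balanced by construction, but the individual cells are not guaranteed to be, since that depends on how the two features relate to each other.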

I can't think of any. It seems the absolute appropriate way to do it. Only thing better would be to have a built in function for it. — piRSquared, Apr 15 '17 at 06:30
Putting it together now... don't get too excited, it's merely an extension of the same idea — piRSquared, Apr 15 '17 at 06:32

piRSquared · Accepted Answer · 2017-04-15T06:44:37.967

Create a cartesian product categorical.

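Consider the dataframe df (assuming numpy is imported as np and pandas as pd; the values are random, so yours will differ from the sample shown):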

df = pd.DataFrame(dict(A=np.random.rand(20), B=np.random.rand(20)))

           A         B
0   0.538186  0.038985
1   0.185523  0.438329
2   0.652151  0.067359
3   0.746060  0.774688
4   0.373741  0.009526
5   0.603536  0.149733
6   0.775801  0.585309
7   0.091238  0.811828
8   0.504035  0.639003
9   0.671320  0.132974
10  0.619939  0.883372
11  0.301644  0.882258
12  0.956463  0.391942
13  0.702457  0.099619
14  0.367810  0.071612
15  0.454935  0.651631
16  0.882029  0.015642
17  0.880251  0.348386
18  0.496250  0.606346
19  0.805688  0.401578

We can create new categoricals with pd.qcut, here splitting each column at its median into two labeled bins:

d1 = df.assign(
    A_cut=pd.qcut(df.A, 2, labels=[1, 2]),
    B_cut=pd.qcut(df.B, 2, labels=list('ab'))
)

           A         B A_cut B_cut
0   0.538186  0.038985     1     a
1   0.185523  0.438329     1     b
2   0.652151  0.067359     2     a
3   0.746060  0.774688     2     b
4   0.373741  0.009526     1     a
5   0.603536  0.149733     1     a
6   0.775801  0.585309     2     b
7   0.091238  0.811828     1     b
8   0.504035  0.639003     1     b
9   0.671320  0.132974     2     a
10  0.619939  0.883372     2     b
11  0.301644  0.882258     1     b
12  0.956463  0.391942     2     a
13  0.702457  0.099619     2     a
14  0.367810  0.071612     1     a
15  0.454935  0.651631     1     b
16  0.882029  0.015642     2     a
17  0.880251  0.348386     2     a
18  0.496250  0.606346     1     b
19  0.805688  0.401578     2     b

You can then create the cartesian product categorical by combining the two labels in each row into a tuple:

d2 = d1.assign(cartesian=pd.Categorical(d1.filter(regex='_cut').apply(tuple, axis=1)))
print(d2)

           A         B A_cut B_cut cartesian
0   0.538186  0.038985     1     a    (1, a)
1   0.185523  0.438329     1     b    (1, b)
2   0.652151  0.067359     2     a    (2, a)
3   0.746060  0.774688     2     b    (2, b)
4   0.373741  0.009526     1     a    (1, a)
5   0.603536  0.149733     1     a    (1, a)
6   0.775801  0.585309     2     b    (2, b)
7   0.091238  0.811828     1     b    (1, b)
8   0.504035  0.639003     1     b    (1, b)
9   0.671320  0.132974     2     a    (2, a)
10  0.619939  0.883372     2     b    (2, b)
11  0.301644  0.882258     1     b    (1, b)
12  0.956463  0.391942     2     a    (2, a)
13  0.702457  0.099619     2     a    (2, a)
14  0.367810  0.071612     1     a    (1, a)
15  0.454935  0.651631     1     b    (1, b)
16  0.882029  0.015642     2     a    (2, a)
17  0.880251  0.348386     2     a    (2, a)
18  0.496250  0.606346     1     b    (1, b)
19  0.805688  0.401578     2     b    (2, b)

If you were so inclined, you could even declare an ordering for them by passing an explicit list of categories and ordered=True to pd.Categorical.
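
One way such an ordering could be declared is sketched below. This is an illustration rather than part of the original answer: the seed, the helper names (a_cut, b_cut, cells, pairs), and the particular order chosen (all cells of A-bin 1 before A-bin 2) are assumptions.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical data mirroring the example above; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

a_cut = pd.qcut(df["A"], 2, labels=[1, 2])
b_cut = pd.qcut(df["B"], 2, labels=list("ab"))

# One possible order: every cell of A-bin 1 first, then every cell of A-bin 2.
cells = list(product(a_cut.cat.categories, b_cut.cat.categories))

# Pair up the labels row by row; each tuple is a single value in the Series.
pairs = pd.Series(list(zip(a_cut, b_cut)), index=df.index)
ordered = pd.Categorical(pairs, categories=cells, ordered=True)

print(ordered.categories.tolist())
# Sorting now follows the declared cell order instead of first appearance.
print(df.assign(cartesian=ordered).sort_values("cartesian").head())
```

Building the category list from the product of the two label sets also keeps cells that happen to be empty in the data, which can be convenient when counting rows per cell.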

Looks great. Thanks! Will accept this answer within a reasonable timeframe if no other challengers appear. — Reuben L., Apr 15 '17 at 06:42
Thanks! I'm using now: `df["bucket"] = list(zip(pd.qcut(df["A"], 10, labels=list(range(10))), pd.qcut(df["B"], 10, labels=list(range(10)))))` based on your answer. — AxelWass, Mar 25 '22 at 14:55
