How to merge two pandas DataFrames by CategoricalIndex object

Question

Suppose a dataframe x:

x = pd.DataFrame({'A':[None,None,3,4,5,6,7,8,9,10],'B':[2,4,3,5,1,6,9,0,4,4]},index = range(10,20))
bins = [0,3,6,15]
x['A_level'] = pd.cut(x['A'],bins)
print x

which looks like:

     A   B  A_level
10   NaN  2   NaN
11   NaN  4   NaN
12   3  3   (0, 3]
13   4  5   (3, 6]
14   5  1   (3, 6]
15   6  6   (3, 6]
16   7  9  (6, 15]
17   8  0  (6, 15]
18   9  4  (6, 15]
19  10  4  (6, 15]

Then the summary series y reads:

y = x[['A_level','B']].groupby('A_level').mean()
y.columns = ['B_mean']
print y

which looks like:

            B
A_level      
(0, 3]   3.00
(3, 6]   4.00
(6, 15]  4.25

My question is how to merge x and y and get the following result?

    A  B  A_level  B_mean
10   NaN  2   NaN   NaN
11   NaN  4   NaN   NaN
12   3  3   (0, 3]   3.00
13   4  5   (3, 6]   4.00
14   5  1   (3, 6]   4.00
15   6  6   (3, 6]   4.00
16   7  9  (6, 15]   4.25
17   8  0  (6, 15]   4.25
18   9  4  (6, 15]   4.25
19  10  4  (6, 15]   4.25

I've tried

x['B_mean'] = y[x['A_level']]

but it returns KeyError.

The following code could do a similar job,

z = pd.merge(x,y.reset_index(),how='left',on='A_level')

which returns:

    A  B  A_level  B_mean
0 NaN  2      NaN     NaN
1 NaN  4      NaN     NaN
2   3  3   (0, 3]    3.00
3   4  5   (3, 6]    4.00
4   5  1   (3, 6]    4.00
5   6  6   (3, 6]    4.00
6   7  9  (6, 15]    4.25
7   8  0  (6, 15]    4.25
8   9  4  (6, 15]    4.25
9  10  4  (6, 15]    4.25

but the index of x and z is different. I know it can be done by

z.index = x.index

but I'm curious about whether there is a better way to do it.

Thanks a lot in advance!

Possible duplicate of [How to keep index when using pandas merge](http://stackoverflow.com/questions/11976503/how-to-keep-index-when-using-pandas-merge) — Anton Protopopov, Nov 11 '15 at 05:50

score 0 · Accepted Answer · edited May 23 '17 at 12:15

0

You could do like in that answer, but it almost the same as your solution:

z = pd.merge(x,y.reset_index(),how='left',on='A_level').set_index(x.index)

edited May 23 '17 at 12:15

Community

1
1

answered Nov 11 '15 at 05:52

Anton Protopopov

30,354
12
88
93

Thanks Anton. But I'm still curious about if there is a solution like my first try `x['B_mean'] = y[x['A_level']]`. – Chengxi Yang Nov 11 '15 at 07:30
`x['A_level']` has more indicesthat `y` could contain so I think that's why you couldn't access `y` dataframe with that indices: `In [40]: y[x['A_level']]` `KeyError: "[nan nan '(0, 3]' '(3, 6]' '(3, 6]' '(3, 6]' '(6, 15]' '(6, 15]' '(6, 15]'\n '(6, 15]'] not in index"` – Anton Protopopov Nov 11 '15 at 07:36

How to merge two pandas DataFrames by CategoricalIndex object

1 Answers1