Join or Merge dataframes with different multi-indexes

Question

Apologies in advance for duplicate question, but I have yet to find a solution that works from previous questions.

I am unable to join two data-frames with different MultiIndexes. I want to keep all the columns from both data-frames.

Given that df1 has ~300k rows and df2 has ~50k rows the join would be many:1 between df1:df2.

df1                  B  path_id
cust_id date                   
11      2015-02-24  10       13
28      2015-02-25  16       22
23      2015-02-26  21       19
15      2015-02-27  11       28
18      2015-02-28  29       10


df2               C
cust_id path_id    
11      13       10
28      22       26
23      19       22
15      28       27
18      10       18

The goal is to assign column C to all matching combinations of index cust_id & column path_id. See df3 below as an example.

df3                     B   C  path_id
cust_id date                       
11      2015-02-24  10  10       13
28      2015-02-25  16  26       22
23      2015-02-26  21  22       19
15      2015-02-27  11  27       28
18      2015-02-28  29  18       10

Appreciate any response on this. Thank you!

It would help greatly if you created some dummy dataframe with your structure and then also, showed the expected output. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Scott Boston, Jul 10 '20 at 17:21
Since it appears that your index levels have names, you can use more on "cust_id". try `df1.merge(df2, on=['cust_id','path_id'])` — Scott Boston, Jul 10 '20 at 17:58

score 1 · Answer 1 · answered Jul 10 '20 at 18:01

1

Well I figured it out. I'm not sure if this is the best way but I just reset the indexes of both data frames and merged on the columns. See code below.

df1.reset_index()
df2.reset_index()
df3 = df1.merge(df2, on=['cust_id', 'path_id'])

I then reassigned the indexes afterwards. If there is a better way please let me know.

Thanks!

answered Jul 10 '20 at 18:01

W.Powell

23
6

Yep. This is one way to do it. I posted a solution below where you didn't have to do both index levels on both dataframes. – Scott Boston Jul 10 '20 at 18:05

score 0 · Answer 2 · answered Jul 10 '20 at 18:02

0

Try:

df1.reset_index('date').merge(df2, on=['cust_id','path_id'])

Output:

               date   b  path_id   c
cust_id                             
11       2015-02-24  10       13  10
28       2015-02-25  16       22  26
23       2015-02-26  21       19  22
15       2015-02-27  11       28  27
18       2015-02-28  29       10  18

answered Jul 10 '20 at 18:02

Scott Boston

147,308
15
139
187

Thanks. I believe my company is using an older version of pandas and I can't join on indexes. When I try to put an index label into the join array it returns that the column doesn't exist. – W.Powell Jul 10 '20 at 18:07
@W.Powell I do remember that being an issue with an older version of pandas. What version of pandas are you running out of curiosity? – Scott Boston Jul 10 '20 at 18:57
pd.__version__ = 0.25.1 – W.Powell Jul 10 '20 at 19:27

score 0 · Answer 3 · answered Jul 10 '20 at 19:01

Based on what I could understand by your question, these are two options that could help you.

The following code would perform a full (inner) merge based on two indexes i.e: cust_id and path_id and will store the rest data as columns of the dataframe.

df3 = pd.merge(df1, df2,on=["cust_id", "path_id"])

The following code would perform the above as well as will make the column (cust_id) the index for the new dataframe.

df3 = pd.merge(df1, df2,on=["cust_id", "path_id"]).set_index('cust_id')

Join or Merge dataframes with different multi-indexes

3 Answers3