merge two data frames by index and column

Question

I have two data frames, like this:

df1 = pd.DataFrame()
df1['v1'] = [5,7,2,4,9,7,2]
df1['v2'] = ["a1", 'nan', "a2", "a3", "a5", "a6", "a9"]

   v1   v2
0   5   a1
1   7  nan
2   2   a2
3   4   a3
4   9   a5
5   7   a6
6   2   a9

and

dfa = pd.DataFrame()
dfa['pc1'] = np.random.rand(5)
dfa['pc2'] = np.random.rand(5)
dfa['idx'] = ["a1", "a2", "a3", "a6", "a9"]
df2 = dfa.set_index('idx')

          pc1       pc2
idx                    
a1   0.048725  0.050773
a2   0.289110  0.302272
a3   0.720966  0.663910
a6   0.021616  0.308114
a9   0.205923  0.583591

df1 has a column v2 that contains character values that match the index of df2. But it also has nan and may contains characters where no corresponding rownames in df2 exists.

I now want to merge these data frames to one, like this:

  v1    v2       pc1       pc2
0  5   a1   0.048725  0.050773
1  7  nan        nan       nan
2  2   a2   0.289110  0.302272
3  4   a3   0.720966  0.663910
4  9   a5        nan       nan
5  7   a6   0.021616  0.308114
6  2   a9   0.205923  0.583591

In R this is very easy using the rownames_to_column(df2, "v2") and left_join(df1, .) functions.

But how can I do it in pandas ?

btw, the `set_index` call somehow complicates stuff. This is not needed and you can pass the join parameter for every frame separately. So, you can always avoid joining by index, if you use e.g. `reset_index()` — Quickbeam2k1, Sep 05 '17 at 12:57

Quickbeam2k1 · Accepted Answer · 2017-09-05T12:54:28.513

4

you could do something like

pd.merge(df1, df2, left_on = 'v2', right_index=True, how = 'left')

this yields:

   v1   v2       pc1       pc2
0   5   a1  0.048725  0.050773
1   7  NaN       NaN       NaN
2   2   a2   0.28911  0.302272
3   4   a3  0.720966   0.66391
4   9   a5       NaN       NaN
5   7   a6  0.021616  0.308114
6   2   a9  0.205923  0.583591

edited Sep 05 '17 at 12:54

answered Sep 05 '17 at 12:43

Quickbeam2k1

5,287
2
26
42

MaxU - stand with Ukraine · Answer 2 · 2017-09-05T12:49:21.977

2

UPDATE:

In [37]: df1.merge(df2, right_index=True, left_on='v2', how='outer')
Out[37]:
   v1   v2       pc1       pc2
0   5   a1  0.252062  0.602530
1   7  nan       NaN       NaN
2   2   a2  0.328666  0.988321
3   4   a3  0.704342  0.809817
4   9   a5       NaN       NaN
5   7   a6  0.001230  0.602590
6   2   a9  0.635444  0.926872

In [33]: df2.merge(df1, left_index=True, right_on='v2', how='outer')
Out[33]:
        pc1       pc2  v1   v2
0  0.252062  0.602530   5   a1
2  0.328666  0.988321   2   a2
3  0.704342  0.809817   4   a3
5  0.001230  0.602590   7   a6
6  0.635444  0.926872   2   a9
1       NaN       NaN   7  nan
4       NaN       NaN   9   a5

edited Sep 05 '17 at 12:49

answered Sep 05 '17 at 12:43

MaxU - stand with Ukraine

205,989
36
386
419

thanks, why are the indices not ordered now, and how can I order the rows again by indices? – spore234 Sep 05 '17 at 12:47
@spore234, in this case you want to put `df1` first - see updated answer... – MaxU - stand with Ukraine Sep 05 '17 at 12:50

score 0 · Answer 3 · answered Sep 05 '17 at 12:59

pd.merge(df2.reset_index(), df1, left_on='idx', right_on='v2', how='outer').drop('idx', axis=1)


Output:

            pc1            pc2      v1    v2
  0      0.760966       0.059443    5     a1
  1      0.059443       0.984703    2     a2
  2      0.214868       0.677140    4     a3
  3      0.224410       0.037784    7     a6
  4      0.297342       0.341810    2     a9
  5        NaN            NaN       7     nan
  6        NaN            NaN       9     a5

merge two data frames by index and column

3 Answers3