3

I have two data frames, like this:

df1 = pd.DataFrame()
df1['v1'] = [5,7,2,4,9,7,2]
df1['v2'] = ["a1", 'nan', "a2", "a3", "a5", "a6", "a9"]

   v1   v2
0   5   a1
1   7  nan
2   2   a2
3   4   a3
4   9   a5
5   7   a6
6   2   a9

and

dfa = pd.DataFrame()
dfa['pc1'] = np.random.rand(5)
dfa['pc2'] = np.random.rand(5)
dfa['idx'] = ["a1", "a2", "a3", "a6", "a9"]
df2 = dfa.set_index('idx')

          pc1       pc2
idx                    
a1   0.048725  0.050773
a2   0.289110  0.302272
a3   0.720966  0.663910
a6   0.021616  0.308114
a9   0.205923  0.583591

df1 has a column v2 that contains character values that match the index of df2. But it also has nan and may contains characters where no corresponding rownames in df2 exists.

I now want to merge these data frames to one, like this:

  v1    v2       pc1       pc2
0  5   a1   0.048725  0.050773
1  7  nan        nan       nan
2  2   a2   0.289110  0.302272
3  4   a3   0.720966  0.663910
4  9   a5        nan       nan
5  7   a6   0.021616  0.308114
6  2   a9   0.205923  0.583591

In R this is very easy using the rownames_to_column(df2, "v2") and left_join(df1, .) functions.

But how can I do it in pandas ?

Stefan Falk
  • 23,898
  • 50
  • 191
  • 378
spore234
  • 3,550
  • 6
  • 50
  • 76
  • btw, the `set_index` call somehow complicates stuff. This is not needed and you can pass the join parameter for every frame separately. So, you can always avoid joining by index, if you use e.g. `reset_index()` – Quickbeam2k1 Sep 05 '17 at 12:57

3 Answers3

4

you could do something like

pd.merge(df1, df2, left_on = 'v2', right_index=True, how = 'left')

this yields:

   v1   v2       pc1       pc2
0   5   a1  0.048725  0.050773
1   7  NaN       NaN       NaN
2   2   a2   0.28911  0.302272
3   4   a3  0.720966   0.66391
4   9   a5       NaN       NaN
5   7   a6  0.021616  0.308114
6   2   a9  0.205923  0.583591
Quickbeam2k1
  • 5,287
  • 2
  • 26
  • 42
2

UPDATE:

In [37]: df1.merge(df2, right_index=True, left_on='v2', how='outer')
Out[37]:
   v1   v2       pc1       pc2
0   5   a1  0.252062  0.602530
1   7  nan       NaN       NaN
2   2   a2  0.328666  0.988321
3   4   a3  0.704342  0.809817
4   9   a5       NaN       NaN
5   7   a6  0.001230  0.602590
6   2   a9  0.635444  0.926872

In [33]: df2.merge(df1, left_index=True, right_on='v2', how='outer')
Out[33]:
        pc1       pc2  v1   v2
0  0.252062  0.602530   5   a1
2  0.328666  0.988321   2   a2
3  0.704342  0.809817   4   a3
5  0.001230  0.602590   7   a6
6  0.635444  0.926872   2   a9
1       NaN       NaN   7  nan
4       NaN       NaN   9   a5
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
0
pd.merge(df2.reset_index(), df1, left_on='idx', right_on='v2', how='outer').drop('idx', axis=1)


Output:

            pc1            pc2      v1    v2
  0      0.760966       0.059443    5     a1
  1      0.059443       0.984703    2     a2
  2      0.214868       0.677140    4     a3
  3      0.224410       0.037784    7     a6
  4      0.297342       0.341810    2     a9
  5        NaN            NaN       7     nan
  6        NaN            NaN       9     a5
whateveros
  • 61
  • 4