5

I am trying to merge two dataframes in Python based on column 'X'.

The column X in left dataframe has non-unique values and the column X in right dataframe has unique values. How can I merge the values from the right dataframe into the left dataframe?

I want to merge rows from df2 into df1 to form df3

df1 = pd.DataFrame({'A': ['NA','EU','LA','ME'],
                    'B': [50, 23,21,100],
                    'X': ['IW233', 'IW455', 'IW455', 'IW100']})

df2 = pd.DataFrame({'C': [50, 12, 12, 11, 10, 16],
                    'X': ['IW455', 'IW200', 'IW233', 'IW150', 'IW175', 'IW100'],
                    'D': ['Aug', 'Sep', 'Jan', 'Feb', 'Dec', 'Nov']})

df3:1

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
tk215
  • 53
  • 1
  • 1
  • 4

2 Answers2

4

You can use merge with left join, if only X is joined column on parameter can be omit:

df = pd.merge(df1, df2, how='left')
print (df)
    A    B      X   C    D
0  NA   50  IW233  12  Jan
1  EU   23  IW455  50  Aug
2  LA   21  IW455  50  Aug
3  ME  100  IW100  16  Nov

If multiple same columns names:

df = pd.merge(df1, df2, on='X', how='left')
print (df)
    A    B      X   C    D
0  NA   50  IW233  12  Jan
1  EU   23  IW455  50  Aug
2  LA   21  IW455  50  Aug
3  ME  100  IW100  16  Nov
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks! I am able to merge but all values from the right dataframe have now become null. Any suggestions as to why this might happen? – tk215 Aug 06 '17 at 19:46
  • There can be problem different types, so need cast column `X` to same dtypes. Check [this](https://stackoverflow.com/a/42947365/2901002) – jezrael Aug 06 '17 at 19:49
  • I changed both dtypes to string however the error is still there. All right values are still NaN – tk215 Aug 06 '17 at 20:04
  • 2
    Maybe some whitespaces, then need `df1['X'] = df1['X'].astype(str).str.strip()`. Same for `df2`. – jezrael Aug 06 '17 at 20:06
1

You can use a join operator here:

>>> df1.join(df2.set_index('X'),on='X')
    A    B      X   C    D
0  NA   50  IW233  12  Jan
1  EU   23  IW455  50  Aug
2  LA   21  IW455  50  Aug
3  ME  100  IW100  16  Nov

So we first change the index of the right frame to X (since these are unique on the right frame, that is not a problem). Then we perform a join on the X column.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555