0

I want to join two data frame df1 and df2 on two columns. For example, in the following dataframes, I want to join them with column a, b and a1, b1 and build the third dataframe.

import pandas as pd
import numpy as np
df1 = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()
df1['a'] = [ 1,  2, 3 ]
df1['b'] = [ 2, 4, 6]
df1['c'] = [ 3, 5, 9]

df2['a1'] = [ 1,  2 ]
df2['b1'] = [ 4, 4]
df2['c1'] = [ 7, 5]

The output:

enter image description here

1 Answers1

1

You can use pd.merge() and multiple keys a, b and a1, b1 using left_on and right_on, as follows:

import pandas as pd
import numpy as np

df1 = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()

df1['a'] = [1, 2, 3]
df1['b'] = [2, 4, 6]
df1['c'] = [3, 5, 9]

df2['a1'] = [1, 2]
df2['b1'] = [4, 4]
df2['c1'] = [7, 5]

df3 = pd.merge(df1, df2, left_on=['a', 'b'], right_on=['a1', 'b1'], how='inner')
print(df3) # df3 has all columns for df1 and df2

#   a  b  c  a1  b1  c1
#0  2  4  5   2   4   5

df3 = df3.drop(df2.columns, axis=1) # removed columns of df2 as they're duplicated
df3.columns = ['a2', 'b2', 'c3'] # column names are changed as you want.
print(df3)

#   a2  b2  c3
#0   2   4   5

For more information about pd.merge(), please see: https://pandas.pydata.org/docs/reference/api/pandas.merge.html

Park
  • 2,446
  • 1
  • 16
  • 25