0

I have 2 large data frames with thousands of columns for each df. I need to left-join the two tables, namely df1 and df2. However, I don't think I'll be able to manually list all of the common columns/keys between the 2 data frames. Below is an example of the data frames:

df1 = pd.DataFrame({'id' : ['a', 'b', 'c'], # Define dictionary
                'test' : [0,0,0],
                'beautiful' : [0,0,0],
                'crazy' : [0,0,0],
                'word' : [0,0,0]})

  id  test  beautiful  crazy  word
0  a     0          0      0     0
1  b     0          0      0     0
2  c     0          0      0     0

df2 = pd.DataFrame({'id' : ['a', 'b', 'c'], # Define dictionary
                'test' : [1,0,0],
                'autumn' : [0,1,0],
                'fall' : [0,0,1],
                'word' : [1,1,0]})

  id  test  autumn  fall  word
0  a     1       0     0     1
1  b     0       1     0     1
2  c     0       0     1     0

df_result = pd.DataFrame({'id':['a','b','c'],
                          'test' : [1,0,0],
                        'beautiful' : [0,0,0],
                        'crazy' : [0,0,0],
                        'word' : [1,1,0]})

  id  test  beautiful  crazy  word
0  a     1          0      0     1
1  b     0          0      0     1
2  c     0          0      0     0

As you can see from the code, I need to join the two data frames based on two conditions. If id matches and column names match, then transfer the value from df2 to df1. I found this post with a similar problem to mine, but it is left unsolved. Thanks in advance!

BeRT2me
  • 12,699
  • 2
  • 13
  • 31

3 Answers3

0

The DataFrame.update() method should do it.

df1.update(df2)

After running this line, we can test to see if it matches your desired output:

print(df1 == df_result)

Here's the result:

     id  test  beautiful  crazy  word
0  True  True       True   True  True
1  True  True       True   True  True
2  True  True       True   True  True

Additionally, df1 itself (.update works in place):

    id  test    beautiful   crazy   word
0   a   1   0   0   1
1   b   0   0   0   1
2   c   0   0   0   0
  • Your answer is right, because key and index are same in both dataframes here, but technically you missed the first condition. since the update should happen based on `id`, you need to set the index to `id` on both df's first. – Rabinzel Jul 29 '22 at 04:34
0

You just need to set the index to id and use pandas.DataFrame.update No need to worry about other columns in df2 because:

join{‘left’}: default ‘left’
Only left join is implemented, keeping the index and columns of the original object.

df1 = df1.set_index('id')
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)

Output df1:

  id  test  beautiful  crazy  word
0  a     1          0      0     1
1  b     0          0      0     1
2  c     0          0      0     0
Rabinzel
  • 7,757
  • 3
  • 10
  • 30
0

try this one

# Intersections

df1_columns = set(df1.columns)
df2_columns = set(df2.columns)
intersection = list(df1_columns.intersection(df2_columns))

#Merging based on similar columns

df_merge = df1.merge(df2, how = 'left', on = intersection)

I need to clarify, based on similar columns between df1 & df2, df1 stays unchanged and values transfer from df2 to df1 (based on intersected columns)