1

I have two dataframes:

df1
     A   B     id col1 col2 col3 
0   E1  E2     0   NaN  NaN  NaN
1   E1  E3     1   NaN  NaN  NaN
2   E1  E4     2   NaN  NaN  NaN 
3   E2  E1     3   NaN  NaN  NaN
4   E1  E4     4   NaN  NaN  NaN
5   E2  E1     5   NaN  NaN  NaN

df2
     A   B     id col1 col2 col3 
0   E1  E2     3     1    0    1
1   E1  E3     5     0    1    1

I want to update the values in col1, col2, col3 in df1 by taking those values in df2 by matching on id to get:

df3
     A   B     id col1 col2 col3 
0   E1  E2     0   NaN  NaN  NaN
1   E1  E3     1   NaN  NaN  NaN
2   E1  E4     2   NaN  NaN  NaN 
3   E2  E1     3     1    0    1
4   E1  E4     4   NaN  NaN  NaN
5   E2  E1     5     0    1    1

As my actual dataframe is much larger, I want to use the list of the column names that I would like to update:

add = ['col1', 'col2', 'col3']

How can I use this column names to get the desired result?

I referred to this question and this question which directed me to use .loc but I can't figure out how to incorporate a reference to the index and the list of multiple columns a la:

df1.loc[df1['edge_id'] == df2['edge_id'], add] = df2[add]

Obviously this didn't work...

Chuck
  • 3,664
  • 7
  • 42
  • 76

3 Answers3

2

Here's one solution with combine_first -

i = df1.set_index('id')
j = df2.drop(['A', 'B'], 1).set_index('id') # df2.set_index('id')[add]

df = i.combine_first(j).reset_index()
df

   id   A   B  col1  col2  col3
0   0  E1  E2   NaN   NaN   NaN
1   1  E1  E3   NaN   NaN   NaN
2   2  E1  E4   NaN   NaN   NaN
3   3  E2  E1   1.0   0.0   1.0
4   4  E1  E4   NaN   NaN   NaN
5   5  E2  E1   0.0   1.0   1.0

If the order of columns matters, reindex -

df = i.combine_first(j).reset_index().reindex(columns=df1.columns)
df

    A   B  id  col1  col2  col3
0  E1  E2   0   NaN   NaN   NaN
1  E1  E3   1   NaN   NaN   NaN
2  E1  E4   2   NaN   NaN   NaN
3  E2  E1   3   1.0   0.0   1.0
4  E1  E4   4   NaN   NaN   NaN
5  E2  E1   5   0.0   1.0   1.0
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Let me know about views on concat option – Bharath M Shetty Dec 15 '17 at 12:21
  • Thanks for solution Coldspeed. I ended using Jez's, only because I could not guarantee knowing in advance which columns to drop. Though I think his is like your second one too. They all work well. Really appreciate your answering :) – Chuck Dec 15 '17 at 14:57
2

Use combine_first:

df1[add] = df1.set_index('id')[add].combine_first(df2.set_index('id')[add]).values
print (df1)
    A   B  id  col1  col2  col3
0  E1  E2   0   NaN   NaN   NaN
1  E1  E3   1   NaN   NaN   NaN
2  E1  E4   2   NaN   NaN   NaN
3  E2  E1   3   1.0   0.0   1.0
4  E1  E4   4   NaN   NaN   NaN
5  E2  E1   5   0.0   1.0   1.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

We can use concat and drop duplicates i.e

df3 = pd.concat([df,df2]).drop_duplicates(['id'],keep='last').reset_index(drop=True)

 A   B  id  col1  col2  col3
0  E1  E2   0   NaN   NaN   NaN
1  E1  E3   1   NaN   NaN   NaN
2  E1  E4   2   NaN   NaN   NaN
3  E1  E4   4   NaN   NaN   NaN
4  E1  E2   3   1.0   0.0   1.0
5  E1  E3   5   0.0   1.0   1.0
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108