Replace multiple column values in one dataframe by values of another dataframe subjected to a common key column

Question

I want to update the values of selected columns in one GeoPandas GeoDataFrame with values from another GeoDataFrame. Both have a common key column called 'geometry'.

For example

import pandas as pd

df1 = pd.DataFrame([["X", 1, 1, 0],
                    ["Y", 0, 1, 0],
                    ["Z", 0, 0, 0],
                    ["Y", 0, 0, 0]],
                   columns=["geometry", "Nonprofit", "Business", "Education"])

df2 = pd.DataFrame([["Y", 1, 1],
                    ["Z", 1, 1]],
                   columns=["geometry", "Non", "Edu"])

Following this answer I did the following steps:

df1 = df1.set_index('geometry')
df2 = df2.set_index('geometry')

list_1 = ['Nonprofit', 'Education']
list_2 = ['Non', 'Edu']

df1[list_1].update(df2[list_2])

This gives the wrong result without any warning. How can I fix this?

Notes:

Updating one column at a time (df1['Nonprofit'].update(df2['Non'])) will produce the correct result.

The geometry LineString values from GeoPandas have been replaced by single characters for simplicity.

Then it is because you are not using the current version of pandas. The answer in the link says a warning occurs when using the current version of pandas. - Gilseung Ahn, Apr 22 '20 at 22:31
I think the issue is related to passing multiple column labels in a list. When I used df1['Nonprofit'].update(df2['Non']) I got the correct answer. I am having issues when I pass the lists of column names in df1[list_1].update(df2[list_2]). Thanks - PPR, Apr 22 '20 at 23:02

amain · Accepted Answer · 2020-04-23T01:59:52.100

DataFrame.update only updates columns with the same name!

Accordingly, one solution would be to first rename the columns in df2 to match those in df1.

Note that when calling update(), there is no need to specify the target columns in df1: all common columns will be updated. If required, you can specify which columns you want from df2 by using column indexing.

df2 = df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})
df1.update(df2)  

# optionally restrict columns:
# df1.update(df2['Nonprofit'])  

# alternative short version, leaving df2 untouched
df1.update(df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'}))

gives

          Nonprofit  Business  Education
geometry                                
X               1.0         1        0.0
Y               1.0         1        1.0
Z               1.0         0        1.0
Y               1.0         0        1.0
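
For reference, here is a minimal self-contained sketch of the same fix that builds the rename mapping from the question's list_1 and list_2 instead of typing it out. It assumes the two lists are in matching order. The data is a trimmed version of the example (the duplicate 'Y' row is left out to keep the index unique), and the name `mapping` is introduced only for illustration.

```python
import pandas as pd

# Trimmed version of the question's data; keys are kept unique here.
df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Non", "Edu"],
).set_index("geometry")

list_1 = ["Nonprofit", "Education"]  # target columns in df1
list_2 = ["Non", "Edu"]              # corresponding source columns in df2

# Pair each df2 name with its df1 counterpart (assumes matching order):
# {'Non': 'Nonprofit', 'Edu': 'Education'}
mapping = dict(zip(list_2, list_1))

# Call update on df1 itself; df2 is left untouched because rename
# returns a new frame.
df1.update(df2[list_2].rename(columns=mapping))

print(df1)
```

Note that update is called on df1 directly. df1[list_1] is a separate DataFrame object, so changes made to it by update would not be written back to df1 even if the column names matched.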

The reason your "single column" approach works is that there you're implicitly using Series.update, where there is no such concept as common columns.
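
The difference between the two update methods can be seen in a small sketch. It uses standalone objects with made-up names (`target`, `frame`, `idx`) rather than the question's frames, so treat it as an illustration of the alignment behaviour only.

```python
import pandas as pd

idx = ["X", "Y", "Z"]

# Series.update aligns on the index only; the differing Series names
# ("Nonprofit" vs "Non") play no role.
target = pd.Series([0, 0, 0], index=idx, name="Nonprofit")
target.update(pd.Series([1, 1], index=["Y", "Z"], name="Non"))

# DataFrame.update aligns on the index AND the column labels, so a frame
# that shares no column label with the target changes nothing.
frame = pd.DataFrame({"Nonprofit": [0, 0, 0]}, index=idx)
frame.update(pd.DataFrame({"Non": [1, 1]}, index=["Y", "Z"]))

print(target.tolist())              # rows Y and Z were updated
print(frame["Nonprofit"].tolist())  # no row was updated
```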
