pandas two dataframes multiple column values comparison

Question

I have a few very large datasets with x, y and z values. These datasets represent interpolated height measurements in time. The first dataset (the original) contains the data for the entire area. In time parts of the area have been measured again. I want to overwrite the original dataset in the location where x and y are equal but z is different (the height has changed at location(x,y)).

So my dataframes look something like this

Original:

x    y    z
1    1    0.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    0.5
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

New measurement:

x    y    z
0    1    0.5
0    2    0.5
1    1    1.5
1    2    0.5
2    1    0.5
2    2    1.0

The final dataframe should look like this:

x    y    z
1    1    1.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    1.0
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

I can loop through all the measurements and see of the x and y occur in the original and if the z value is different (if so, replace it) but this takes forever and I can imagine that there must be a better way using pandas. How would I do this in a fast and efficient way?

score 3 · Accepted Answer · answered Sep 09 '17 at 08:09

3

Given that 'Original' is df1 and 'New Measurement' is df2:

df3 = df1.set_index(['x', 'y'])
df3.update(df2.set_index(['x', 'y']))  # Inplace modificatioin on df3.
>>> df3.reset_index()
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

answered Sep 09 '17 at 08:09

Alexander

105,104
32
201
196

Very nice solution! – Yorian Sep 09 '17 at 09:03

score 1 · Answer 2 · answered Sep 09 '17 at 07:55

You can use

merge on df1 and df2 with x, y keys
assign new column z with fillna using z_x, z_y
drop these unwanted columns

In [716]: (df1.merge(df2, on=['x', 'y'], how='left')
              .assign(z=lambda x: x.z_y.fillna(x.z_x))
              .drop(['z_x', 'z_y'], 1))
Out[716]:
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

Details

In [717]: df1.merge(df2, on=['x', 'y'], how='left')
Out[717]:
   x  y  z_x  z_y
0  1  1  0.5  1.5
1  1  2  0.5  0.5
2  1  3  0.5  NaN
3  2  1  0.5  0.5
4  2  2  0.5  1.0
5  2  3  0.5  NaN
6  3  1  0.5  NaN
7  3  2  0.5  NaN
8  3  3  0.5  NaN

Could I get you to vote your conscience on my post? Thanks https://stackoverflow.com/a/46192213/2336654 — piRSquared, Sep 13 '17 at 08:19

score -1 · Answer 3 · answered Sep 09 '17 at 08:09

-1

original[(original.x == new.x) | (original.y == new.y)].z = new.z

answered Sep 09 '17 at 08:09

chrisckwong821

1,133
12
24

This doesn't work, it throws an error (besides the | needing to be an &) – Yorian Sep 09 '17 at 08:57

pandas two dataframes multiple column values comparison

3 Answers3