2

I have a few very large datasets with x, y and z values. These datasets represent interpolated height measurements in time. The first dataset (the original) contains the data for the entire area. In time parts of the area have been measured again. I want to overwrite the original dataset in the location where x and y are equal but z is different (the height has changed at location(x,y)).

So my dataframes look something like this

Original:

x    y    z
1    1    0.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    0.5
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

New measurement:

x    y    z
0    1    0.5
0    2    0.5
1    1    1.5
1    2    0.5
2    1    0.5
2    2    1.0

The final dataframe should look like this:

x    y    z
1    1    1.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    1.0
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

I can loop through all the measurements and see of the x and y occur in the original and if the z value is different (if so, replace it) but this takes forever and I can imagine that there must be a better way using pandas. How would I do this in a fast and efficient way?

Yorian
  • 2,002
  • 5
  • 34
  • 60

3 Answers3

3

Given that 'Original' is df1 and 'New Measurement' is df2:

df3 = df1.set_index(['x', 'y'])
df3.update(df2.set_index(['x', 'y']))  # Inplace modificatioin on df3.
>>> df3.reset_index()
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5
Alexander
  • 105,104
  • 32
  • 201
  • 196
1

You can use

merge on df1 and df2 with x, y keys
assign new column z with fillna using z_x, z_y
drop these unwanted columns

In [716]: (df1.merge(df2, on=['x', 'y'], how='left')
              .assign(z=lambda x: x.z_y.fillna(x.z_x))
              .drop(['z_x', 'z_y'], 1))
Out[716]:
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

Details

In [717]: df1.merge(df2, on=['x', 'y'], how='left')
Out[717]:
   x  y  z_x  z_y
0  1  1  0.5  1.5
1  1  2  0.5  0.5
2  1  3  0.5  NaN
3  2  1  0.5  0.5
4  2  2  0.5  1.0
5  2  3  0.5  NaN
6  3  1  0.5  NaN
7  3  2  0.5  NaN
8  3  3  0.5  NaN
Zero
  • 74,117
  • 18
  • 147
  • 154
-1
original[(original.x == new.x) | (original.y == new.y)].z = new.z
chrisckwong821
  • 1,133
  • 12
  • 24