Let's say I have 2 arrays like these:

x1 = [ 1.2,  1.8,  2.3,  4.5, 20.0]
y1 = [10.3, 11.8, 12.3, 11.5, 11.5]

and another two that represent the same function but sampled at different values

x2 = [ 0.2,  1.8,  5.3, 15.5, 17.2, 18.3, 20.0]
y2 = [10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0]

Is there a way with numpy to merge x1 and x2 and, according to the result, also merge the related values of y, without explicitly looping over the arrays? (e.g. taking the average of y, or the max, for each interval)

kmario23
D.Giunchi
  • Do you have an expected output? `x = x1 + x2; y = y1 + y2`?? – knh190 Mar 30 '19 at 09:58
  • What about interpolation / extrapolation? I mean if you have two sets of measurements... most likely they will be slightly different. Going that direction - is it ok to check only x-es, or maybe you wish to validate point? Other question is how big difference should be to say that two things are different? – Michał Zaborowski Mar 30 '19 at 10:02
  • ... and points that are interpolated / extrapolated with linear function are different then extrapolating with other functions ... – Michał Zaborowski Mar 30 '19 at 10:03
  • Since your two examples both have `20.0` as x-values but have different y-values for `20.0`, how can the two arrays represent the same function? This is just one way your question is not clear. Please add more explanation and handle the `20.0` issue. – Rory Daulton Mar 30 '19 at 10:44
  • yeah sorry for not remarking that y should be interpolated in some way. So concatenation and merge are good for x, but y should change according to a rule (like average, or max or some custom thing) – D.Giunchi Mar 30 '19 at 18:36

3 Answers

I don't know if you can find something in numpy, but here is a solution using pandas instead. (Pandas uses numpy behind the scenes, so there isn't much data conversion.)

import numpy as np 
import pandas as pd 
x1 = np.asarray([ 1.2,  1.8,  2.3,  4.5, 20.0])
y1 = np.asarray([10.3, 11.8, 12.3, 11.5, 11.5])
x2 = np.asarray([ 0.2,  1.8,  5.3, 15.5, 17.2, 18.3, 20.0])
y2 = np.asarray([10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0])
c1 = pd.DataFrame({'x': x1, 'y': y1})
c2 = pd.DataFrame({'x': x2, 'y': y2})
c = pd.concat([c1, c2]).groupby('x').mean().reset_index()
x = c['x'].values
y = c['y'].values

# Result:
x = array([ 0.2,  1.2,  1.8,  2.3,  4.5,  5.3,  15.5, 17.2, 18.3, 20. ])
y = array([10.3 , 10.3, 11.8, 12.3, 11.5, 12.3, 12.5, 15.2, 10.3, 10.75])

Here I concatenate the two vectors and do a groupby operation to group the equal values of 'x'. For these "groups" I then take the mean(). reset_index() will then move the index 'x' back to a column. To get the result back as a numpy array I use .values. (Use to_numpy() for pandas version 0.24.0 and higher.)
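The same concat/groupby pattern works for any other aggregation rule the question mentions; for example, taking the max of y per x instead of the mean (a sketch using the same arrays):

```python
import numpy as np
import pandas as pd

x1 = np.asarray([ 1.2,  1.8,  2.3,  4.5, 20.0])
y1 = np.asarray([10.3, 11.8, 12.3, 11.5, 11.5])
x2 = np.asarray([ 0.2,  1.8,  5.3, 15.5, 17.2, 18.3, 20.0])
y2 = np.asarray([10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0])

# same concatenation, but take the max of y for duplicated x values
c = pd.concat([pd.DataFrame({'x': x1, 'y': y1}),
               pd.DataFrame({'x': x2, 'y': y2})]).groupby('x').max().reset_index()

x = c['x'].to_numpy()
y = c['y'].to_numpy()
# y for x == 20.0 is now max(11.5, 10.0) == 11.5 instead of the mean 10.75
```

Any reduction pandas supports (`min`, `median`, or a custom function via `.agg()`) can be swapped in the same way.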

Hielke Walinga
How about using numpy.hstack followed by sorting with numpy.argsort? (Sorting x and y independently with numpy.sort would break the pairing, so sort by x and reuse the same order for y.)

In [101]: x1_arr = np.array(x1)
In [102]: x2_arr = np.array(x2)

In [103]: y1_arr = np.array(y1)
In [104]: y2_arr = np.array(y2)

In [105]: x_all = np.hstack((x1_arr, x2_arr))
In [106]: y_all = np.hstack((y1_arr, y2_arr))

In [107]: order = np.argsort(x_all, kind='mergesort')  # sort by x, keep the x/y pairing

In [108]: x_all[order]
Out[108]: 
array([ 0.2,  1.2,  1.8,  1.8,  2.3,  4.5,  5.3, 15.5, 17.2, 18.3, 20. ,
       20. ])

In [109]: y_all[order]
Out[109]: 
array([10.3, 10.3, 11.8, 11.8, 12.3, 11.5, 12.3, 12.5, 15.2, 10.3, 11.5,
       10. ])

If you want to get rid of the duplicates, you can apply numpy.unique on top of the above results.
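One way to sketch that deduplication while keeping the x/y pairs aligned: np.unique with return_index=True yields the sorted distinct x values plus the index of the first occurrence of each, which can be reused to pick the matching y:

```python
import numpy as np

x1 = np.array([ 1.2,  1.8,  2.3,  4.5, 20.0])
y1 = np.array([10.3, 11.8, 12.3, 11.5, 11.5])
x2 = np.array([ 0.2,  1.8,  5.3, 15.5, 17.2, 18.3, 20.0])
y2 = np.array([10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0])

x_all = np.hstack((x1, x2))
y_all = np.hstack((y1, y2))

# sorted unique x values, plus the index of the first occurrence of each;
# reuse those indices so each kept x keeps its own y
x, idx = np.unique(x_all, return_index=True)
y = y_all[idx]
```

Note that this keeps only the first y found for each duplicated x (here, 11.5 for x == 20.0); it does not average or take the max.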

kmario23
I'd propose a solution based on the accepted answer of this question:

import numpy as np
import pylab as plt

x1 = [1.2, 1.8, 2.3, 4.5, 20.0]
y1 = [10.3, 11.8, 12.3, 11.5, 11.5]

x2 = [0.2, 1.8, 5.3, 15.5, 17.2, 18.3, 20.0]
y2 = [10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0]

# create a merged and sorted x array
x = np.concatenate((x1, x2))
ids = x.argsort(kind='mergesort')
x = x[ids]

# find unique values
flag = np.ones_like(x, dtype=bool)
np.not_equal(x[1:], x[:-1], out=flag[1:])

# discard duplicated values
x = x[flag]

# merge, sort and select values for y
y = np.concatenate((y1, y2))[ids][flag]

plt.plot(x, y, marker='s', color='b', ls='-.')
plt.xlabel('x')
plt.ylabel('y')

plt.show()

This is the result (a plot of the merged points):

x = [ 0.2  1.2  1.8  2.3  4.5  5.3 15.5 17.2 18.3 20. ] 
y = [10.3 10.3 11.8 12.3 11.5 12.3 12.5 15.2 10.3 11.5]

As you can see, this code keeps only one value of y when several are available for the same x (the first one in the stable sort): in this way, the code is faster.

Bonus solution: the following solution is based on a loop and mainly standard Python functions and objects (not numpy), so I know that it may not be acceptable; anyway, it is very concise and elegant and it handles multiple values for y, so I decided to include it here as a plus:

x = sorted(set(x1 + x2))
y = np.nanmean([[d.get(i, np.nan) for i in x] 
                for d in map(lambda a: dict(zip(*a)), ((x1, y1), (x2, y2)))], axis=0)

In this case, you get the following results:

x = [0.2, 1.2, 1.8, 2.3, 4.5, 5.3, 15.5, 17.2, 18.3, 20.0] 
y = [10.3  10.3  11.8  12.3  11.5  12.3  12.5  15.2  10.3  10.75]
PieCot