I have four 1D np.arrays: x1, y1, x2, y2, where x1 and y1 have the same length, and x2 and y2 have the same length, since they are the corresponding x and y values of a dataset. len(x1) and len(x2) are always different. Let's assume len(x1) > len(x2) for now. The two x arrays always have common values, but in a special way: the values are not exactly the same, only equal within a tolerance (because of numerical errors, etc.). Example with tolerance = 0.01:
x1 = np.array([0, 1.01, 1.09, 1.53, -9.001, 1.2, -52, 1.011])
x2 = np.array([1, 1.1, 1.2, 1.5, -9, 82])
I want to keep only the common values (in the tolerance sense). Use the shorter array as the reference, which is x2 in this case. The first value in x2 is 1, and it has a corresponding value in x1, which is 1.01. Next: 1.2 also has a corresponding value in x1, 1.2. The value 1.5 has no corresponding value, because 1.53 is out of tolerance, so filter it out, etc.
The full result should be:
x1 = np.array([1.01, 1.09, -9.001, 1.2])
x2 = np.array([1, 1.1, -9, 1.2])
To take this one step further: based on filtering the x values this way, I want to filter the y values at the same indices for both datasets. In other words, I want to find the longest common subsequence of the two datasets. Note that ordering is important here because of the connection with the y values (though it doesn't matter if we argsort x first and reindex x and y with that).
What I have tried based on this answer:
def longest_common_subseq(x1, x2, y1, y2, tol=0.02):
    # sort them first to keep x and y connected
    idx1 = np.argsort(x1)
    x1, y1 = x1[idx1], y1[idx1]
    idx2 = np.argsort(x2)
    x2, y2 = x2[idx2], y2[idx2]
    # here I assumed that len(x2) < len(x1)
    idx = (np.abs(x1[:,None] - x2) <= tol).any(axis=1)
    return x1[idx], x2[idx], y1[idx], y2[idx]
The y values can be arbitrary in this case; only their shapes must match those of x1 and x2. For example:
y1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
y2 = np.array([-1, 0, 3, 7, 11, -2])
Trying to run the function above raises
IndexError: boolean index did not match indexed array along dimension 0
I understand why: the index array's length is wrong because x1 and x2 have different lengths, but so far I couldn't fix it. Is there a nice way to achieve this?
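To make the shape problem concrete, here is a small sketch using the example x arrays and the tol=0.02 default from my function. The names close, mask1 and mask2 are only for illustration. The mask built with any(axis=1) has one entry per element of x1, so it can index x1 and y1 but not x2 and y2:

```python
import numpy as np

x1 = np.array([0, 1.01, 1.09, 1.53, -9.001, 1.2, -52, 1.011])
x2 = np.array([1, 1.1, 1.2, 1.5, -9, 82])
tol = 0.02

# Broadcasting builds a (len(x1), len(x2)) matrix of pairwise comparisons.
close = np.abs(x1[:, None] - x2) <= tol

mask1 = close.any(axis=1)  # one flag per element of x1
mask2 = close.any(axis=0)  # one flag per element of x2

print(close.shape, mask1.shape, mask2.shape)
```

Even with a separate mask per array, x1[mask1] and x2[mask2] need not have the same length, because several x1 values can fall within tolerance of the same x2 value.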
EDIT:
If multiple values are inside the tolerance, the closest should be selected.
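One direction that might work, as a minimal sketch rather than a definitive solution: for every x2 value, pick the closest x1 value and keep the pair only if it lies within the tolerance. The function name match_within_tol is hypothetical. The sketch assumes the arrays are small enough to build a len(x2) by len(x1) difference matrix, and that no two x2 values end up with the same closest x1 value (such duplicates are not resolved here). It uses tol=0.02 because in floating point 1.01 - 1 can come out marginally above 0.01.

```python
import numpy as np

def match_within_tol(x1, y1, x2, y2, tol=0.02):
    """Pair each x2 value with its closest x1 value, keeping pairs within tol."""
    # Pairwise absolute differences, shape (len(x2), len(x1)).
    diff = np.abs(x2[:, None] - x1[None, :])
    # For each x2 value, the index of the closest x1 value.
    nearest = diff.argmin(axis=1)
    # Keep only the x2 entries whose closest x1 value is within tolerance.
    keep = diff[np.arange(len(x2)), nearest] <= tol
    idx2 = np.flatnonzero(keep)
    idx1 = nearest[keep]
    # Report the pairs in the order in which they appear in x1.
    order = np.argsort(idx1)
    idx1, idx2 = idx1[order], idx2[order]
    return x1[idx1], y1[idx1], x2[idx2], y2[idx2]

x1 = np.array([0, 1.01, 1.09, 1.53, -9.001, 1.2, -52, 1.011])
y1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
x2 = np.array([1, 1.1, 1.2, 1.5, -9, 82])
y2 = np.array([-1, 0, 3, 7, 11, -2])

fx1, fy1, fx2, fy2 = match_within_tol(x1, y1, x2, y2)
print(fx1, fx2)
print(fy1, fy2)
```

For the example data this gives the x1 and x2 results listed above, with y1 and y2 filtered at the same positions. Indexing with integer index arrays instead of a single boolean mask is what avoids the IndexError, since each array gets indices of its own.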