I have two numpy arrays of unequal lengths:

import numpy

a = numpy.array([108, 637, 1172, 1304, 2260, 2809])
b = numpy.array([109, 634, 2254, 2814])

I want to shorten a so that corresponding elements of the two arrays are similar. The criterion for a match is that the element of b lies within the range: element a - 50 < element b < element a + 50. For example, the element with value 108 from a matches the element with value 109 from b. The resulting output should be:

a_prime = numpy.array([108, 637, 2260, 2809])
b_prime = numpy.array([109, 634, 2254, 2814])

I can achieve this with a nested for loop:

a_prime = numpy.zeros(b.shape[0], dtype=int)
b_prime = numpy.copy(b)

for idx, element_b in enumerate(b):
  for element_a in a:
    if (element_a - 50) < element_b < (element_a + 50):
      a_prime[idx] = element_a

However, for large arrays this will be very time-consuming. What would be a faster, more Pythonic way to achieve the same result?

The Dude

1 Answer

Here's one way to obtain a_prime (with numpy imported as np): for each value in b, take the closest value in a. This ignores the threshold, since you're looking for the closest values after all:

a_prime = a[np.abs(np.subtract.outer(b,a)).argmin(1)]
# array([ 108,  637, 2260, 2809])

Here np.subtract.outer gives the difference between each value in b and every value in a, and taking its absolute value gives:

x = np.abs(np.subtract.outer(b,a))
x
# array([[   1,  528, 1063, 1195, 2151, 2700],
#        [ 526,    3,  538,  670, 1626, 2175],
#        [2146, 1617, 1082,  950,    6,  555],
#        [2706, 2177, 1642, 1510,  554,    5]])

Now we only need the argmin value of each row, and to use it to index a:

x.argmin(1)
# array([0, 1, 4, 5], dtype=int64)
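
Note that argmin always returns some nearest neighbour, even when it is further away than the 50 allowed in the question. If that matters for your data, here is a minimal sketch that applies the tolerance on top of the same distance matrix. The function name match_within and the parameter tol are hypothetical, introduced only for illustration, and the full matrix still takes len(a) * len(b) memory:

```python
import numpy as np

def match_within(a, b, tol=50):
    """For each element of b, find the nearest element of a and
    keep only the pairs whose absolute difference is below tol."""
    # Pairwise absolute differences, shape (len(b), len(a))
    diff = np.abs(np.subtract.outer(b, a))
    # Column index of the nearest element of a for each row (element of b)
    nearest = diff.argmin(axis=1)
    # Strict inequality mirrors a - 50 < b < a + 50 from the question
    keep = diff[np.arange(len(b)), nearest] < tol
    return a[nearest[keep]], b[keep]

a = np.array([108, 637, 1172, 1304, 2260, 2809])
b = np.array([109, 634, 2254, 2814])

a_prime, b_prime = match_within(a, b)
print(a_prime)  # [ 108  637 2260 2809]
print(b_prime)  # [ 109  634 2254 2814]
```

With this version, an element of b that has no partner in a within the tolerance is dropped from both outputs rather than paired with a distant value.
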
yatu