How to find closest elements in two array?

Question

I have two numpy arrays, like X=[x1,x2,x3,x4], y=[y1,y2,y3,y4]. Three of the elements are close and the fourth of them maybe close or not.

Like:

X   [ 84.04467948  52.42447842  39.13555678  21.99846595]
y   [ 78.86529444  52.42447842  38.74910101  21.99846595]

Or it can be:

X   [ 84.04467948  60  52.42447842  39.13555678]
y   [ 78.86529444  52.42447842  38.74910101  21.99846595]

I want to define a function to find the the corresponding index in the two arrays, like in first case:

y[0] correspond to X[0],
y[1] correspond to X[1],
y[2] correspond to X[2],
y[3] correspond to X[3]

And in second case:

y[0] correspond to X[0],
y[1] correspond to X[2],
y[2] correspond to X[3]
and y[3] correspond to X[1].

I can't write a function to solve the problem completely, please help.

Simple quadratic-complexity approach: for each element in X, search for nearest y in Y, mark as already taken (can't be selected again) and go on. — sascha, Aug 22 '16 at 11:19
@sascha your approach doesn't work for the second example. Assume that you're on 60. The nearest number in another array is 52. So you take it and nothing else can't now be paired to it, but it's actually not a good solution since 60 should be paired with 21 (and 52 should be paired with another 52 in first array). — KaroCodes, Aug 22 '16 at 11:26
Hi, sascha, thanks for you answer but what will happen when you select first element in X is "60" in the second case, it will be correspond to "52.42" which is wrong. And I don't want to correspond them by size. — insomnia, Aug 22 '16 at 11:27
@grael Hi, What I can get is to do a loop in three of X and three of Y and get the closest pair . But it waste too much time... — insomnia, Aug 22 '16 at 11:28
@insomnia Then you need to define some error-model. What is the error? L1-norm? L2-norm? It behaves different for all the approaches. Alternatively: just precompute all pairwise-diffs, sort these and loop through! — sascha, Aug 22 '16 at 11:28
@eiKatte Thanks your comment. It's the point and it's the difficulty. — insomnia, Aug 22 '16 at 11:29
how about this: calculate the min differences between all `x` and all `y` terms. get the one with the overall minimum difference, pop the respective items off their lists and repeat. a recursive function. — Ma0, Aug 22 '16 at 11:29

score 5 · Answer 1 · edited May 23 '17 at 12:15

You can start by precomputing the distance matrix as show in this answer:

import numpy as np

X = np.array([84.04467948,60.,52.42447842,39.13555678])
Y = np.array([78.86529444,52.42447842,38.74910101,21.99846595])

dist = np.abs(X[:, np.newaxis] - Y)

Now you can compute the minimums along one axis (I chose 1 corresponding to finding the closest element of Y for every X):

potentialClosest = dist.argmin(axis=1)

This still may contain duplicates (in your case 2). To check for that, you can find find all Y indices that appear in potentialClosest by use of np.unique:

closestFound, closestCounts = np.unique(potentialClosest, return_counts=True)

Now you can check for duplicates by checking if closestFound.shape[0] == X.shape[0]. If so, you're golden and potentialClosest will contain your partners for every element in X. In your case 2 though, one element will occur twice and therefore closestFound will only have X.shape[0]-1 elements whereas closestCounts will not contain only 1s but one 2. For all elements with count 1 the partner is already found. For the two candidates with count 2, though you will have to choose the closer one while the partner of the one with the larger distance will be the one element of Y which is not in closestFound. This can be found as:

missingPartnerIndex = np.where(
        np.in1d(np.arange(Y.shape[0]), closestFound)==False
        )[0][0]

You can do the matchin in a loop (even though there might be some nicer way using numpy). This solution is rather ugly but works. Any suggestions for improvements are very appreciated:

partners = np.empty_like(X, dtype=int)
nonClosePartnerFound = False
for i in np.arange(X.shape[0]):
    if closestCounts[closestFound==potentialClosest[i]][0]==1:
        # A unique partner was found
        partners[i] = potentialClosest[i]
    else:
        # Partner is not unique
        if nonClosePartnerFound:
            partners[i] = potentialClosest[i]
        else:
            if np.argmin(dist[:, potentialClosest[i]]) == i:
                partners[i] = potentialClosest[i]
            else:
                partners[i] = missingPartnerIndex
                nonClosePartnerFound = True
print(partners)

This answer will only work if only one pair is not close. If that is not the case, you will have to define how to find the correct partner for multiple non-close elements. Sadly it's neither a very generic nor a very nice solution, but hopefully you will find it a helpful starting point.

Thanks a lot. I appreciate with your idea and it's very complete. Let me check it. — insomnia, Aug 22 '16 at 14:17
@insomnia No Problem. It should work as long as there is only one not-matching pair. — jotasi, Aug 22 '16 at 14:20
@insomnia To get it working, you basically have to copy paste all code parts together. The only thing missing is the if-clause mentioned in the middle to check, whether you are actually in case 1. If I should clarify something, please let me know. — jotasi, Aug 22 '16 at 18:51

score 3 · Answer 2 · edited May 23 '17 at 12:33

3

Using this answer https://stackoverflow.com/a/8929827/3627387 and https://stackoverflow.com/a/12141207/3627387

FIXED

def find_closest(alist, target):
    return min(alist, key=lambda x:abs(x-target))

X = [ 84.04467948,  52.42447842,  39.13555678,  21.99846595]
Y = [ 78.86529444,  52.42447842,  38.74910101,  21.99846595]

def list_matching(list1, list2):
    list1_copy = list1[:]
    pairs = []
    for i, e in enumerate(list2):
        elem = find_closest(list1_copy, e)
        pairs.append([i, list1.index(elem)])
        list1_copy.remove(elem)
    return pairs

edited May 23 '17 at 12:33

Community

1
1

answered Aug 22 '16 at 11:39

Sardorbek Imomaliev

14,861
2
51
63

This allows multiple match-uses for each element in y, which might be ok or not. It's quite a non-symmetric algorithm in that way (each X only used once, but not necessarily in Y). – sascha Aug 22 '16 at 11:41
1

@sascha Fixed, but it will work like you are checking second list against first. I think you can update it to work more smart. – Sardorbek Imomaliev Aug 22 '16 at 12:00
@SardorbekImomaliev Sry for a late reply, how about a=[1,2,3,6] and b=[7,2,3,6]. It will lead a wrong result. But I think adding a sort will solve this problem. And in fact in my case, your code is good enough. Thanks a lot. – insomnia Aug 22 '16 at 14:03
Actually, I don't think such a approach can work for all cases, as you always can reach the "not close to any"-element in your enumeration and then match it to the nearest value, which actually would match to another value, regardless which list you iterate over. – jotasi Aug 22 '16 at 14:18
I have to provide another exception. I use sorted array, but the result is strange, X [ 80.06192623 51.27128419 33.81534928 25.49749915], y[ 73.26784071 51.27128419 26.60918437 25.49749915] and output is [[0, 0], [1, 1], [2, 3], [3, 2]]. But the good news is that if I sort the array from small to big, things are better. – insomnia Aug 22 '16 at 14:59

score 2 · Answer 3 · answered Aug 22 '16 at 11:24

2

Seems like best approach would be to pre-sort both array (nlog(n)) and then perform merge-like traverse through both arrays. It's definitely faster than nn which you indicated in comment.

answered Aug 22 '16 at 11:24

nimdil

1,361
10
20

Thanks a lot. But I don't even know what's merge-like traverse...But I agree with you that pre-sort is useful. – insomnia Aug 22 '16 at 14:10
Well if you look like into merge sort like here: https://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation_using_lists you see that it's essentially splitting operation and merging operation - the latter is what is ussually behind merge operation. The core idea is that you go in a loop through both lists advancing the index in each iteration of the list that has current lower value (you keep 2 indices). – nimdil Aug 22 '16 at 14:19

Elliot Ledger · Answer 4 · 2016-08-23T16:07:25.000

The below simply prints the corresponding indexes of the two arrays as you have done in your question as I'm not sure what output you want your function to give.

X1 = [84.04467948, 52.42447842, 39.13555678, 21.99846595]
Y1 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

X2 = [84.04467948, 60, 52.42447842, 39.13555678]
Y2 = [78.86529444, 52.42447842, 38.74910101, 21.99846595]

def find_closest(x_array, y_array):
    # Copy x_array as we will later remove an item with each iteration and
    # require the original later
    remaining_x_array = x_array[:]
    for y in y_array:
        differences = []
        for x in remaining_x_array:
            differences.append(abs(y - x))
        # min_index_remaining is the index position of the closest x value
        # to the given y in remaining_x_array
        min_index_remaining = differences.index(min(differences))
        # related_x is the closest x value of the given y
        related_x = remaining_x_array[min_index_remaining]
        print 'Y[%s] corresponds to X[%s]' % (y_array.index(y), x_array.index(related_x))
        # Remove the corresponding x value in remaining_x_array so it
        # cannot be selected twice
        remaining_x_array.pop(min_index_remaining)

This then outputs the following

find_closest(X1,Y1)
Y[0] corresponds to X[0]
Y[1] corresponds to X[1]
Y[2] corresponds to X[2]
Y[3] corresponds to X[3]

and

find_closest(X2,Y2)
Y[0] corresponds to X[0]
Y[1] corresponds to X[2]
Y[2] corresponds to X[3]
Y[3] corresponds to X[1]

Hope this helps.

How to find closest elements in two array?

4 Answers4