2

I have two lists of numpy vectors and wish to determine whether they represent approximately the same points (but possibly in a different order).

I've found methods such as numpy.testing.assert_allclose but it doesn't allow for possibly different orders. I have also found unittest.TestCase.assertCountEqual but that doesn't work with numpy arrays!

What is my best approach?

import unittest

import numpy as np

first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]

np.testing.assert_all_close(first, second, atol=2)  # Fails because the orders are different
unittest.TestCase.assertCountEqual(None, first, second)  # Fails because numpy comparisons evaluate element-wise; and because it doesn't allow a tolerance
Divakar
  • 218,885
  • 19
  • 262
  • 358
Jack B
  • 123
  • 6
  • Would all arrays in the two input lists be of the same shapes (including same number of dimensions)? Would there be equal number of arrays in the lists? – Divakar Apr 02 '17 at 17:41
  • Why not convert first and second to the format accepted by unittest.TestCase.assertCountEqual? – Filipe Aleixo Apr 02 '17 at 17:42
  • @Divakar - The arrays would have the same shapes (at least, if they don't then I would like the unittests they are part of to fail!) - There wouldn't necessarily be the same number of items in each list, since I am getting them from the arguments passed as calls to a mock object. However, the tests should fail if there are different numbers of elements. – Jack B Apr 02 '17 at 17:55
  • @FilipeAleixo What do you mean by converting them to the format accepted by `unittest.TestCase.assertCountEqual`? Even if I did convert them to tuples say, I don't know how I'd change it for an `assertCountApproxEqual`-esque method. – Jack B Apr 02 '17 at 17:58
  • In python `list` and `set` mean 2 different things. And `numpy` array is yet another thing. – hpaulj Apr 02 '17 at 18:03
  • @hpaulj Sorry, I'll fix the question – Jack B Apr 02 '17 at 18:04
  • Actually, sorry everyone - I've just realised that this is a lot harder / impossible. Indeed, what should it do if there are multiple ways of matching up the `numpy` vectors in each of the two `lists`? Any solution would have to iterate through a lot of combinations of possible matches between elements. I don't think I really want to get into that for the assertions I'm making - I'll find another way of phrasing the assertions! – Jack B Apr 02 '17 at 18:10
  • @JackB maybe not a wise approach but you could try to shuffle each list (keeping the other unchanged) a fixed number of times and check for `allclose`. If none of the times turns out to be true, then you could decide that they are indeed not close – kmario23 Apr 02 '17 at 18:12
  • @JackB What about giving some more feedback/upvoting and marking one answer as accepted? – Claudio Apr 09 '17 at 22:48

3 Answers3

0

A nice list iteration approach

In [1047]: res = []
In [1048]: for i in first:
      ...:     for j in second:
      ...:         diff = np.abs(i-j)
      ...:         if np.all(diff<2):
      ...:             res.append((i,j))            
In [1049]: res
Out[1049]: 
[(array([20, 40]), array([ 20.1,  40.5])),
 (array([20, 60]), array([ 19.8,  59.7]))]

Length of res is the number of matches.

Or as list comprehension:

def match(i,j):
    diff = np.abs(i-j)
    return np.all(diff<2)

In [1051]: [(i,j) for i in first for j in second if match(i,j)]
Out[1051]: 
[(array([20, 40]), array([ 20.1,  40.5])),
 (array([20, 60]), array([ 19.8,  59.7]))]

or with the existing array test:

[(i,j) for i in first for j in second if np.allclose(i,j, atol=2)]
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • 1
    Can't see how this approach satisfies the requirement: "wish to determine whether they represent approximately the same points (**but possibly in a different order**)." – Claudio Apr 02 '17 at 21:38
  • What I've found are pairs that match. If there can't be duplicates, we can just compare the `len(res)` with `len(first)` and `len(second)`. If points can pair up several ways the logic will be more complex. – hpaulj Apr 02 '17 at 21:50
  • There are basically 2 issues - how do you check whether two of the list elements are `close`? and how do you judge whether the two lists have a full set of `close` points. While one can be expressed as an `np.allclose` test, the other is some sort of list or set combinatorics test. – hpaulj Apr 02 '17 at 22:33
0

Here you are :)

( idea based on Euclidean distance between points in two different Numpy arrays, not within )

import numpy as np
import scipy.spatial
first  = [np.array([20  , 60  ]), np.array([  20,   40])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]

def pointsProximityCheck(firstListOfPoints, secondListOfPoints, distanceTolerance): 
    pointIndex  = 0
    maxDistance = 0
    lstIndices  = []
    for item in scipy.spatial.distance.cdist( firstListOfPoints, secondListOfPoints ):
        currMinDist = min(item)
        if currMinDist > maxDistance: 
            maxDistance = currMinDist
        if currMinDist < distanceTolerance :
            pass
        else:
            lstIndices.append(pointIndex)
            # print("point with pointIndex [", pointIndex, "] in the first list outside of Tolerance")
        pointIndex+=1
    return (maxDistance, lstIndices)

maxDistance, lstIndicesOfPointsOutOfTolerance = pointsProximityCheck(first, second, distanceTolerance=0.5)
print("maxDistance:", maxDistance, "indicesOfOutOfTolerancePoints", lstIndicesOfPointsOutOfTolerance )  

gives on output with distanceTolerance=0.5 :

maxDistance: 0.509901951359 indicesOfOutOfTolerancePoints [1]
Community
  • 1
  • 1
Claudio
  • 7,474
  • 3
  • 18
  • 48
0

but possibly in a different order

This is the key requirement. This problem can be treat as a classic problem in graph theory - finding perfect matching in unweighted bipartite graph. Hungarian Algorithm is a classic algo to solve this problem.

Here I implemented one.

import numpy as np

def is_matched(first, second):
    checked = np.empty((len(first),), dtype=bool)
    first_matching = [-1] * len(first)
    second_matching = [-1] * len(second)

    def find(i):
        for j, point in enumerate(second):
            if np.allclose(first[i], point, atol=2):
                if not checked[j]:
                    checked[j] = True
                    if second_matching[j] == -1 or find(second_matching[j]):
                        second_matching[j] = i
                        first_matching[i] = j
                        return True

    def get_max_matching():
        count = 0
        for i in range(len(first)):
            if first_matching[i] == -1:
                checked.fill(False)
                if find(i):
                    count += 1

        return count

    return len(first) == len(second) and get_max_matching() == len(first)

first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]
print(is_matched(first, second)) 
# True

first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 43.5])]
print(is_matched(first, second))
# False
gzc
  • 8,180
  • 8
  • 42
  • 62