
I want to do nose testing of numpy arrays of complex floats that are unordered.

So for instance, if

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]

and

b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]

the assert should succeed, as they have approximately the same values the same number of times. Order doesn't matter.

For regular floats, assert_array_almost_equal(np.sort(a), np.sort(b)) would be fine, but that doesn't work here: complex values sort by real part first and then by imaginary part, so because of the float error they end up ordered as:

a: [ 1.-1.j,  1.+1.j,  2.-2.j,  2.-2.j,  2.+2.j,  2.+2.j]    
b: [ 1.+1.j,  1.-1.j,  2.-2.j,  2.-2.j,  2.+2.j,  2.+2.j]

Is there a built-in way to do this? If not, I guess I could resort to something like cplxreal, but that seems like a lot to add to a testing file.

endolith
  • It seems like this could be NP-complete if you try to strictly follow a spec for the results, but not too hard if you're okay with the results getting a bit fuzzy in the crazy cases. – user2357112 Jan 04 '14 at 02:30
  • The first algorithm that comes to mind is some sort of greedy nearest-neighbor matching, where you successively match each element of one set to its nearest unmatched neighbor in the other, stopping when you run out or the nearest neighbor is too far away. It seems like there could likely be something more efficient, but in any case, it's not going to be as simple as a sort. – user2357112 Jan 04 '14 at 02:37
  • You could try to remove the floating point error, e.g. truncate to x significant figures (where x is determined by your application's required precision) and then compare. I doubt that will be very efficient though, and it's not a built-in solution. – three_pineapples Jan 04 '14 at 02:51
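The truncation idea from the last comment can be sketched quickly; this is a heuristic, not a built-in, and the helper name `multisets_equal_rounded` is made up here. It rounds both sequences and compares them as multisets with `collections.Counter`:

```python
import numpy as np
from collections import Counter

def multisets_equal_rounded(a, b, decimals=6):
    """Compare two complex sequences as multisets after rounding away
    small float error.  Values that straddle a rounding boundary can
    still compare unequal, so this is only a heuristic."""
    ra = np.round(np.asarray(a), decimals).tolist()
    rb = np.round(np.asarray(b), decimals).tolist()
    return Counter(ra) == Counter(rb)
```

With the question's arrays this returns True, since 1.000000000000001-1j rounds to 1-1j; but two values sitting just either side of a rounding boundary would spuriously compare unequal.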

4 Answers


How about sorting the arrays by their magnitude?

def foo(a):
    a = np.asarray(a)  # accept plain lists as well as arrays
    return a[np.argsort(a * a.conjugate())]
np.testing.assert_array_almost_equal(foo(a),foo(b))
hpaulj
  • Not quite good enough, as we incorrectly see the difference of `[-.9, 1.1]` and `[.9, -1.1]` as `[1.8, 2.2]`. But after sorting by magnitude you could iterate both arrays more efficiently, so it's a good first step. – U2EF1 Jan 04 '14 at 05:41
  • In the context of `nose` tests, you probably don't need to account for every possible ordering. Just pick reference values that have distinct magnitudes. The goal is to have an ordering that isn't affected by small errors off in the nth decimal place. – hpaulj Jan 04 '14 at 06:38
  • Doh. I just realized that sorting by imaginary part would work for probably all of my tests. Magnitude would not work. Angle would work for half. Maybe edit this answer to "if you know the shape of the numbers, use angle or magnitude or imaginary part to sort them". Doesn't work in the general case, but does if you already know the answer. – endolith Jan 04 '14 at 08:05
  • Yes, I looked again at your example. The values differ more by angle than by magnitude. But watch out for the periodicity of angles (0 == 2*pi). – hpaulj Jan 04 '14 at 23:01
  • @hpaulj: My example is more for the general case. My actual application has them arranged in a predictable way. So I'm currently using either `assert_allclose(sorted(z, key=np.imag), sorted(z2, key=np.imag))` or `assert_allclose(sorted(z, key=np.angle), sorted(z2, key=np.angle))` depending on how the points are arranged on the complex plane. – endolith Jan 05 '14 at 13:58
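The key-based sort described in the last comment can be written out in full; this is a sketch assuming, as that comment does, that the points' imaginary parts are separated well enough that sorting by `np.imag` gives both lists the same order:

```python
import numpy as np
from numpy.testing import assert_allclose

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]

# Sorting by the imaginary part is unaffected by the tiny real-part
# error, so both lists line up before the elementwise comparison.
assert_allclose(sorted(a, key=np.imag), sorted(b, key=np.imag))
```

Swapping the key for `np.angle` (minding the 0 == 2*pi wraparound) or `np.abs` covers layouts where the imaginary parts collide instead.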

I'm not sure there's any way to do this in better than O(n^2) worst case, but if that's acceptable to you, you can copy one of the lists and use a modified equality check with elimination to see whether they match.

def equals(a, b, tolerance):
    return abs(a-b) < tolerance

and then iterate through your list, removing matches as you find them

def equivalent_lists(a, b, tolerance):
    new_b = b[:]
    for a_item in a:
        truths = [equals(a_item, b_item, tolerance) for b_item in new_b]
        if not any(truths):
            return False
        else:
            new_b.pop(truths.index(True))
    return not bool(new_b)

Seems to work in your initial case, in at least a cursory way:

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]
c = [2+2j, 2-2j, 1+1j, 2-1j, 2+2j, 2-2j]

equivalent_lists(a, b, 0.0001)
>>> True
equivalent_lists(a, c, 0.0001)
>>> False

Not the most beautiful solution, but at least seems to work in a pretty transparent way.

Slater Victoroff

I encountered this problem recently and realized that it can be accomplished quite efficiently by checking whether the structural rank of the matrix of pairwise comparisons is equal to the size of the inputs.

According to the scipy.sparse.csgraph.structural_rank documentation, "A graph has full structural rank if it is possible to permute the elements to make the diagonal zero-free." This is exactly the condition we want: we want to check whether the pairwise comparison matrix can be permuted so that the result has a fully nonzero diagonal.

You can implement it like this:

import numpy as np
from scipy.sparse import csgraph, csr_matrix

def sets_approx_equal(x, y, **kwds):
  """
  Check if x and y are permutations of the same approximate set of values.
  """
  x = np.asarray(x)
  y = np.asarray(y)
  assert x.ndim == y.ndim == 1
  if len(x) != len(y):
    return False
  mat = np.isclose(x[:, None], y[None, :], **kwds)
  rank = csgraph.structural_rank(csr_matrix(mat))
  return rank == len(y)

Testing on your example inputs gives the expected results:

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]
print(sets_approx_equal(a, b))
# True

b[-1] = 1 - 1j
print(sets_approx_equal(a, b))
# False

To convince ourselves further that this is working, we can generate a large number of known equal and unequal permutations, and check that the function is returning the expected result for these cases:

def make_permutations(rng, N, equal=True, eps=1E-10):
  """Make two permuted arrays of complex values.

  If equal=True, they are derived from the same initial set of values.
  If equal=False, they are not.
  """
  x = rng.randn(N) + 1j * rng.randn(N)
  y = x if equal else rng.randn(N) + 1j * rng.randn(N)
  x = x + eps * (rng.randn(N) + 1j * rng.randn(N))
  y = y + eps * (rng.randn(N) + 1j * rng.randn(N))
  return rng.permutation(x), rng.permutation(y)

rng = np.random.RandomState(0)
for i in range(1000):
  x, y = make_permutations(rng, 5, equal=True)
  assert sets_approx_equal(x, y)

  x, y = make_permutations(rng, 5, equal=False)
  assert not sets_approx_equal(x, y)

Indeed, all the assertions pass.

jakevdp

Another approach might be to think in terms of points in a 2d space. a is a set of points. b is another set. Each point in b should be close to a point in a.

diff = np.array(a)[None, :] - np.array(b)[:, None]
X = np.round(diff * diff.conjugate()) == 0  # round to desired error tolerance
X

array([[False, False,  True, False,  True, False],
       [False, False, False,  True, False,  True],
       [ True, False, False, False, False, False],
       [False,  True, False, False, False, False],
       [False, False,  True, False,  True, False],
       [False, False, False,  True, False,  True]], dtype=bool)

The next step is to test X. If values in a (and b) are distinct, there should be one True for each column and each row. But here there are duplicates.

In [29]: I,J=np.where(X)

In [30]: I
Out[30]: array([0, 0, 1, 1, 2, 3, 4, 4, 5, 5])

In [31]: J
Out[31]: array([2, 4, 3, 5, 0, 1, 2, 4, 3, 5])

set(I) and set(J) are both {0, 1, ..., 5}, so every row and every column has a match. This set test might be enough, even with duplicates. I'd have to explore other combinations of points and mismatches.

This answer is incomplete, but might provide some useful ideas.
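One way to sketch out the row/column test described above (the function name and default tolerance are made up here): build the pairwise closeness matrix and require every row and every column to contain at least one True. With duplicates this is necessary but not strictly sufficient; a guaranteed test needs a true matching argument, like the structural-rank approach in another answer.

```python
import numpy as np

def rows_and_cols_matched(a, b, tol=1e-6):
    """X[i, j] is True when b[i] is within tol of a[j].  Accept the two
    sets when every row and every column of X has at least one match."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    X = np.abs(a[None, :] - b[:, None]) < tol
    return bool(X.any(axis=0).all() and X.any(axis=1).all())
```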

hpaulj