1

Is there a fast way of comparing rows for equivalency in an ndarray in Python 2.7? I am applying a symmetry operation on some coordinates that I am storing in each row of an array of shape (N,4). I need a way to tell if my transformation maps coordinates back to equivalent positions. The caveat is that even though the positions may be the same, they are stored in different rows of the array so this requires sorting the arrays prior to comparison. This is fine if I just needed to call it once, but this function is called ~10,000 times in my code.

Benchmarking this shows that this takes ~60 μs:

%timeit structs_are_equiv_old(a,b)
The slowest run took 6.36 times longer than the fastest. This could mean that     
an intermediate result is being cached.
10000 loops, best of 3: 59.6 µs per loop

Is there a way to speed up this type of comparison?

def structs_are_equiv(a, b):
    """
    compares two numpy arrays row by row to determine if they contain the   
    coordinates after the application of a transformation operation.
    """
    assert a.shape == b.shape
    a_temp = a[ np.lexsort( (a[:,3], a[:,2], a[:,1], a[:,0]) ) ]
    b_temp = b[ np.lexsort( (b[:,3], b[:,2], b[:,1], b[:,0]) ) ]

    return np.allclose( a_temp, b_temp )

Example a and b (note the first column is not involved in the transformation, just a way to denote the type of object stored at the coordinate):

a = array([[ 1.      ,  0.      ,  0.5     ,  0.271149],
   [ 1.      ,  0.5     ,  0.5     ,  0.271149],
   [ 1.      ,  0.      ,  0.      ,  0.303063],
   [ 1.      ,  0.5     ,  0.      ,  0.303063],
   [ 2.      ,  0.25    ,  0.      ,  0.358071],
   [ 2.      ,  0.75    ,  0.      ,  0.358071],
   [ 1.      ,  0.25    ,  0.306215,  0.358071],
   [ 1.      ,  0.75    ,  0.306215,  0.358071],
   [ 2.      ,  0.      ,  0.5     ,  0.358071],
   [ 2.      ,  0.5     ,  0.5     ,  0.358071],
   [ 1.      ,  0.25    ,  0.693785,  0.358071],
   [ 1.      ,  0.75    ,  0.693785,  0.358071],
   [ 1.      ,  0.      ,  0.      ,  0.413078],
   [ 1.      ,  0.5     ,  0.      ,  0.413078],
   [ 1.      ,  0.      ,  0.5     ,  0.444992],
   [ 1.      ,  0.5     ,  0.5     ,  0.444992],
   [ 2.      ,  0.      ,  0.      ,  0.5     ],
   [ 2.      ,  0.5     ,  0.      ,  0.5     ],
   [ 1.      ,  0.25    ,  0.193785,  0.5     ],
   [ 1.      ,  0.75    ,  0.193785,  0.5     ],
   [ 2.      ,  0.25    ,  0.5     ,  0.5     ],
   [ 2.      ,  0.75    ,  0.5     ,  0.5     ],
   [ 1.      ,  0.25    ,  0.806215,  0.5     ],
   [ 1.      ,  0.75    ,  0.806215,  0.5     ],
   [ 1.      ,  0.      ,  0.5     ,  0.555008],
   [ 1.      ,  0.5     ,  0.5     ,  0.555008],
   [ 1.      ,  0.      ,  0.      ,  0.586922],
   [ 1.      ,  0.5     ,  0.      ,  0.586922],
   [ 2.      ,  0.25    ,  0.      ,  0.641929],
   [ 2.      ,  0.75    ,  0.      ,  0.641929],
   [ 1.      ,  0.25    ,  0.306215,  0.641929],
   [ 1.      ,  0.75    ,  0.306215,  0.641929],
   [ 2.      ,  0.      ,  0.5     ,  0.641929],
   [ 2.      ,  0.5     ,  0.5     ,  0.641929],
   [ 1.      ,  0.25    ,  0.693785,  0.641929],
   [ 1.      ,  0.75    ,  0.693785,  0.641929],
   [ 1.      ,  0.      ,  0.      ,  0.696937],
   [ 1.      ,  0.5     ,  0.      ,  0.696937],
   [ 1.      ,  0.      ,  0.5     ,  0.728851],
   [ 1.      ,  0.5     ,  0.5     ,  0.728851],
   [ 0.      ,  0.117635,  0.5     ,  0.238728],
   [ 0.      ,  0.617635,  0.5     ,  0.238728],
   [ 0.      ,  0.      ,  0.114216,  0.270642],
   [ 0.      ,  0.5     ,  0.114216,  0.270642],
   [ 0.      ,  0.      ,  0.      ,  0.270642],
   [ 0.      ,  0.5     ,  0.      ,  0.270642],
   [ 0.      ,  0.617635,  0.5     ,  0.761272],
   [ 0.      ,  0.117635,  0.5     ,  0.761272],
   [ 0.      ,  0.5     ,  0.114216,  0.729358],
   [ 0.      ,  0.      ,  0.114216,  0.729358],
   [ 0.      ,  0.5     ,  0.      ,  0.729358],
   [ 0.      ,  0.      ,  0.      ,  0.729358],
   [ 0.      ,  0.25    ,  0.306215,  0.401299],
   [ 0.      ,  0.75    ,  0.306215,  0.401299],
   [ 0.      ,  0.25    ,  0.693785,  0.401299],
   [ 0.      ,  0.75    ,  0.693785,  0.401299],
   [ 0.      ,  0.25    ,  0.306215,  0.598701],
   [ 0.      ,  0.75    ,  0.306215,  0.598701],
   [ 0.      ,  0.25    ,  0.693785,  0.598701],
   [ 0.      ,  0.75    ,  0.693785,  0.598701],
   [ 0.      ,  0.117635,  0.5     ,  0.226923],
   [ 0.      ,  0.117635,  0.5     ,  0.773077],
   [ 0.      ,  0.      ,  0.114216,  0.260279],
   [ 0.      ,  0.      ,  0.114216,  0.739721],
   [ 0.      ,  0.      ,  0.885784,  0.260279],
   [ 0.      ,  0.      ,  0.885784,  0.739721],
   [ 0.      ,  0.5     ,  0.885784,  0.260279],
   [ 0.      ,  0.5     ,  0.885784,  0.739721],
   [ 0.      ,  0.25    ,  0.306215,  0.401299],
   [ 0.      ,  0.25    ,  0.306215,  0.598701],
   [ 0.      ,  0.75    ,  0.306215,  0.401299],
   [ 0.      ,  0.75    ,  0.306215,  0.598701],
   [ 0.      ,  0.75    ,  0.693785,  0.401299],
   [ 0.      ,  0.75    ,  0.693785,  0.598701]])

b = nparray([[ 1.      ,  0.5     ,  0.5     ,  0.271149],
   [ 1.      ,  0.      ,  0.5     ,  0.271149],
   [ 1.      ,  0.5     ,  0.      ,  0.303063],
   [ 1.      ,  0.      ,  0.      ,  0.303063],
   [ 2.      ,  0.75    ,  0.      ,  0.358071],
   [ 2.      ,  0.25    ,  0.      ,  0.358071],
   [ 1.      ,  0.75    ,  0.306215,  0.358071],
   [ 1.      ,  0.25    ,  0.306215,  0.358071],
   [ 2.      ,  0.5     ,  0.5     ,  0.358071],
   [ 2.      ,  0.      ,  0.5     ,  0.358071],
   [ 1.      ,  0.75    ,  0.693785,  0.358071],
   [ 1.      ,  0.25    ,  0.693785,  0.358071],
   [ 1.      ,  0.5     ,  0.      ,  0.413078],
   [ 1.      ,  0.      ,  0.      ,  0.413078],
   [ 1.      ,  0.5     ,  0.5     ,  0.444992],
   [ 1.      ,  0.      ,  0.5     ,  0.444992],
   [ 2.      ,  0.5     ,  0.      ,  0.5     ],
   [ 2.      ,  0.      ,  0.      ,  0.5     ],
   [ 1.      ,  0.75    ,  0.193785,  0.5     ],
   [ 1.      ,  0.25    ,  0.193785,  0.5     ],
   [ 2.      ,  0.75    ,  0.5     ,  0.5     ],
   [ 2.      ,  0.25    ,  0.5     ,  0.5     ],
   [ 1.      ,  0.75    ,  0.806215,  0.5     ],
   [ 1.      ,  0.25    ,  0.806215,  0.5     ],
   [ 1.      ,  0.5     ,  0.5     ,  0.555008],
   [ 1.      ,  0.      ,  0.5     ,  0.555008],
   [ 1.      ,  0.5     ,  0.      ,  0.586922],
   [ 1.      ,  0.      ,  0.      ,  0.586922],
   [ 2.      ,  0.75    ,  0.      ,  0.641929],
   [ 2.      ,  0.25    ,  0.      ,  0.641929],
   [ 1.      ,  0.75    ,  0.306215,  0.641929],
   [ 1.      ,  0.25    ,  0.306215,  0.641929],
   [ 2.      ,  0.5     ,  0.5     ,  0.641929],
   [ 2.      ,  0.      ,  0.5     ,  0.641929],
   [ 1.      ,  0.75    ,  0.693785,  0.641929],
   [ 1.      ,  0.25    ,  0.693785,  0.641929],
   [ 1.      ,  0.5     ,  0.      ,  0.696937],
   [ 1.      ,  0.      ,  0.      ,  0.696937],
   [ 1.      ,  0.5     ,  0.5     ,  0.728851],
   [ 1.      ,  0.      ,  0.5     ,  0.728851],
   [ 0.      ,  0.617635,  0.5     ,  0.238728],
   [ 0.      ,  0.117635,  0.5     ,  0.238728],
   [ 0.      ,  0.5     ,  0.114216,  0.270642],
   [ 0.      ,  0.      ,  0.114216,  0.270642],
   [ 0.      ,  0.5     ,  0.      ,  0.270642],
   [ 0.      ,  0.      ,  0.      ,  0.270642],
   [ 0.      ,  0.117635,  0.5     ,  0.761272],
   [ 0.      ,  0.617635,  0.5     ,  0.761272],
   [ 0.      ,  0.      ,  0.114216,  0.729358],
   [ 0.      ,  0.5     ,  0.114216,  0.729358],
   [ 0.      ,  0.      ,  0.      ,  0.729358],
   [ 0.      ,  0.5     ,  0.      ,  0.729358],
   [ 0.      ,  0.75    ,  0.306215,  0.401299],
   [ 0.      ,  0.25    ,  0.306215,  0.401299],
   [ 0.      ,  0.75    ,  0.693785,  0.401299],
   [ 0.      ,  0.25    ,  0.693785,  0.401299],
   [ 0.      ,  0.75    ,  0.306215,  0.598701],
   [ 0.      ,  0.25    ,  0.306215,  0.598701],
   [ 0.      ,  0.75    ,  0.693785,  0.598701],
   [ 0.      ,  0.25    ,  0.693785,  0.598701],
   [ 0.      ,  0.117635,  0.5     ,  0.226923],
   [ 0.      ,  0.117635,  0.5     ,  0.773077],
   [ 0.      ,  0.      ,  0.114216,  0.260279],
   [ 0.      ,  0.      ,  0.114216,  0.739721],
   [ 0.      ,  0.      ,  0.885784,  0.260279],
   [ 0.      ,  0.      ,  0.885784,  0.739721],
   [ 0.      ,  0.75    ,  0.306215,  0.401299],
   [ 0.      ,  0.75    ,  0.306215,  0.598701],
   [ 0.      ,  0.25    ,  0.306215,  0.401299],
   [ 0.      ,  0.25    ,  0.306215,  0.598701],
   [ 0.      ,  0.75    ,  0.693785,  0.401299],
   [ 0.      ,  0.75    ,  0.693785,  0.598701],
   [ 0.      ,  0.25    ,  0.693785,  0.401299],
   [ 0.      ,  0.25    ,  0.693785,  0.598701]])
Stephen
  • 183
  • 7

2 Answers2

2

Here's an approach considering numbers as indexing tuples to reduce each row as one scalar and then simply sorting and comparing against each other, like so -

def structs_are_equiv_dotreduc(a,b):
    scale = 10000**np.arange(1,4)
    a0 = np.sort(a[:,1:].dot(scale).astype(int))
    b0 = np.sort(b[:,1:].dot(scale).astype(int))    
    return (a0 == b0).all()

Runtime test -

In [538]: # Setup inputs with b array just a row-permuted version of a
     ...: a = np.random.rand(100,4)
     ...: b = a[np.random.permutation(a.shape[0])]
     ...: 

In [539]: %timeit structs_are_equiv(a,b)
10000 loops, best of 3: 117 µs per loop

In [540]: %timeit structs_are_equiv_dotreduc(a,b)
10000 loops, best of 3: 42.7 µs per loop
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • This is interesting. I should have mentioned that the first column contains integers. Would that still work? – Stephen Oct 30 '16 at 18:33
  • So, running the two functions for my example shows the following performance: %timeit structs_are_equiv_dotreduc(a,b) 10000 loops, best of 3: 46 µs per loop %timeit structs_are_equiv(a,b) 10000 loops, best of 3: 40.7 µs per loop For smaller arrays on the order of (100,4) (which is what I am primarily working with at the moment) does not give much of a speed up, all though it is an excellent approach for large arrays! – Stephen Oct 30 '16 at 19:05
  • @Steve Replaced it with an improved version. Check it out. The actual bottleneck was `np.allclose`. Now, with the scale that incorporates the precision in itself should take care of it with direct comparison for equality. – Divakar Oct 30 '16 at 20:16
  • I really like this solution, thank you for all of the time and effort! I feel like I have learned a lot! – Stephen Oct 31 '16 at 01:03
  • It seems that this approach actually only works conditionally. I have tested both versions structs_are_equiv within my code and in once case there was a discrepancy. I am still investigating why this occurs. – Stephen Oct 31 '16 at 21:05
0

npi.sort from the numpy_indexed package should be faster than your current solution; though the solution by divakar should be faster still if his assumptions indeed hold

Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42
  • I need to edit my initial arrays, the first column should contain integers. I will look into this though, thanks! – Stephen Oct 30 '16 at 18:37
  • That should not be a problem; if I am not mistaken, npi.sort((int_column, float_columns)) should work just as well. And if it doesn't that should be a bug report i should look into – Eelco Hoogendoorn Oct 30 '16 at 18:41
  • Hmm not sure that it does work. Otherwise npi.sort(npi.as_struct_array(int, floats)) should work – Eelco Hoogendoorn Oct 30 '16 at 18:47
  • I installed numpy-indexed via pip, but I can't seem to find the npi.as_struct_array function. Could you please point me towards some documentation for this package if it is available? – Stephen Oct 30 '16 at 19:03
  • Try GitHub search. I think it's in a util module or something, but I can't check myself atm – Eelco Hoogendoorn Oct 30 '16 at 19:13
  • It is. It's in npi.utility. Thanks for pointing that out. This gives a performance of: %timeit npi.sort(npi.utility.as_struct_array(a[:,0], a[:,1], a[:,2], a[:,3] )) 10000 loops, best of 3: 67.6 µs per loop – Stephen Oct 30 '16 at 19:23
  • That's disappointing. Runtime is quite strongly related to the total number of bytes used for this method, so using float32 instead of float64 should help. Also, it does not seem like the first columns are truly ints; just floats with a particular value. You could simply leave out the as_struct_array then and it should have the same result, only faster. – Eelco Hoogendoorn Oct 30 '16 at 20:36