Background
I have two numpy arrays which I'd like to use to carry out some comparison operations in the most efficient/fast way possible. Both contain only unsigned ints.
pairs is an n x 2 x 3 array which holds a long list of paired 3D coordinates (for some nomenclature, the pairs array contains a set of pairs, each made up of two coordinates), i.e.
# full pairs array
In [145]: pairs
Out[145]:
array([[[1, 2, 4],
        [3, 4, 4]],
       .....
       [[1, 2, 5],
        [5, 6, 5]]])
# each entry contains a pair of 3D coordinates
In [149]: pairs[0]
Out[149]:
array([[1, 2, 4],
       [3, 4, 4]])
positions is an n x 3 array which holds a set of 3D coordinates:
In [162]: positions
Out[162]:
array([[ 1,  2,  4],
       [ 3,  4,  5],
       [ 5,  6,  3],
       [ 3,  5,  6],
       [ 6,  7,  5],
       [12,  2,  5]])
Goal
I want to create an array which is a subset of the pairs array, but contains ONLY the pairs in which at most one of the two coordinates is in the positions array, i.e. there should be no pairs where BOTH coordinates lie in the positions array. For some domain information, every pair will have at least one of its two coordinates in the positions array.
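To make the goal concrete, here is a minimal reference sketch built on plain Python sets. It is only meant to pin down the expected result on a tiny input, not to be fast. The function name filter_pairs_reference and the three sample pairs are made up for illustration; the positions values are the ones shown above.

```python
import numpy as np

# Made-up sample pairs (illustration only); positions is the array shown above.
pairs = np.array([[[1, 2, 4], [3, 4, 4]],   # only the first coordinate is in positions
                  [[1, 2, 4], [3, 4, 5]],   # both coordinates are in positions
                  [[5, 6, 3], [9, 9, 9]]],  # only the first coordinate is in positions
                 dtype=np.uint32)
positions = np.array([[1, 2, 4], [3, 4, 5], [5, 6, 3],
                      [3, 5, 6], [6, 7, 5], [12, 2, 5]], dtype=np.uint32)


def filter_pairs_reference(pairs, positions):
    """Keep only the pairs where at most one coordinate occurs in positions."""
    # Turn each row of positions into a hashable tuple for O(1) membership tests.
    known = {tuple(row) for row in positions.tolist()}
    # For each pair, count how many of its two coordinates are in positions.
    keep = [sum(tuple(coord) in known for coord in pair) <= 1
            for pair in pairs.tolist()]
    return pairs[np.array(keep, dtype=bool)]


result = filter_pairs_reference(pairs, positions)
```

With this sample input, the second pair is dropped because both of its coordinates appear in positions, and the other two pairs are kept.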
Approaches tried so far
My initial naive approach was to loop over each pair in the pairs array and subtract each of the pair's two coordinates from the positions array, then check whether BOTH subtractions produced a match, indicated by a row of all zeros in each result:
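filtered = []  # collects the pairs we want to keep
for pair in pairs:
    # an all-zero row in (positions - coord) means coord is in positions
    if ((~(positions - pair[0]).any(axis=1)).any() and
            (~(positions - pair[1]).any(axis=1)).any()):
        # both members of the pair were in the positions array -
        # these weren't the droids we were looking for
        pass
    else:
        # append this pair to the new list
        filtered.append(pair)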
This works fine, and takes advantage of some vectorization, but is there a better way to do this?
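One direction that might be worth profiling is to drop the Python loop entirely and compare every coordinate against every position with broadcasting. The sketch below is untested against the real data; the function name filter_pairs_broadcast and the sample pairs are made up for illustration. Note that it builds a temporary boolean array of shape (n_pairs, 2, n_positions, 3), so memory could become the limit for large inputs.

```python
import numpy as np


def filter_pairs_broadcast(pairs, positions):
    """Drop every pair whose two coordinates both occur in positions."""
    # Shapes: pairs -> (n, 2, 1, 3), positions -> (1, 1, m, 3).
    # eq[i, j, k] is True when coordinate j of pair i equals positions[k].
    eq = (pairs[:, :, None, :] == positions[None, None, :, :]).all(axis=-1)
    # in_positions[i, j] is True when coordinate j of pair i is in positions.
    in_positions = eq.any(axis=-1)
    # A pair is rejected only when both of its coordinates were found.
    both_found = in_positions.all(axis=1)
    return pairs[~both_found]


# Made-up sample pairs (illustration only); positions is the array shown earlier.
pairs = np.array([[[1, 2, 4], [3, 4, 4]],   # one coordinate in positions
                  [[1, 2, 4], [3, 4, 5]],   # both in positions
                  [[5, 6, 3], [3, 5, 6]]],  # both in positions
                 dtype=np.uint32)
positions = np.array([[1, 2, 4], [3, 4, 5], [5, 6, 3],
                      [3, 5, 6], [6, 7, 5], [12, 2, 5]], dtype=np.uint32)

result = filter_pairs_broadcast(pairs, positions)
```

Because this uses equality rather than subtraction, it does not depend on how unsigned integers wrap around. Whether it actually beats the loop will depend on the sizes of the two arrays.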
For some other performance-sensitive portions of this program I've rewritten things in Cython, which brought a massive speedup, though in this case (at least based on a naive nested for-loop implementation) the Cython version was slightly slower than the approach outlined above.
If people have suggestions I'm happy to profile and report back (I have all the profiling infrastructure set up).