3

I'm desperately searching for an efficient way to check if two 2D numpy Arrays intersect.

So what I have is two arrays with an arbitrary amount of 2D arrays like:

A=np.array([[2,3,4],[5,6,7],[8,9,10]])
B=np.array([[5,6,7],[1,3,4]])
C=np.array([[1,2,3],[6,6,7],[10,8,9]])

All I need is a True if there is at least one vector intersecting with another one of the other array, otherwise a false. So it should give results like this:

f(A,B)  -> True
f(A,C)  -> False

I'm kind of new to python and at first I wrote my program with Python lists, which works but of course is very inefficient. The Program takes days to finish so I am working on a numpy.array solution now, but these arrays really are not so easy to handle.

Here's Some Context about my Program and the Python List Solution:

What i'm doing is something like a self-avoiding random walk in 3 Dimensions. http://en.wikipedia.org/wiki/Self-avoiding_walk. But instead of doing a Random walk and hoping that it will reach a desirable length (e.g. i want chains build up of 1000 beads) without reaching a dead end i do the following:

I create a "flat" Chain with the desired length N:

X=[]
for i in range(0,N+1):
    X.append((i,0,0))

Now i fold this flat chain:

  1. randomly choose one of the elements ("pivotelement")
  2. randomly choose one direction ( either all elements to the left or to the right of the pivotelment)
  3. randomly choose one out of 9 possible rotations in space (3 axes * 3 possible rotations 90°,180°,270°)
  4. rotate all the elements of the chosen direction with the chosen rotation
  5. Check if the new elements of the chosen direction intersect with the other direction
  6. No intersection -> accept the new configuration, else -> keep the old chain.

Steps 1.-6. have to be done a large amount of times (e.g. for a chain of length 1000, ~5000 Times) so these steps have to be done efficiently. My List-based solution for this is the following:

def PivotFold(chain):
randPiv=random.randint(1,N)  #Chooses a random pivotelement, N is the Chainlength
Pivot=chain[randPiv]  #get that pivotelement
C=[]  #C is going to be a shifted copy of the chain
intersect=False
for j in range (0,N+1):   # Here i shift the hole chain to get the pivotelement to the origin, so i can use simple rotations around the origin
    C.append((chain[j][0]-Pivot[0],chain[j][1]-Pivot[1],chain[j][2]-Pivot[2]))
rotRand=random.randint(1,18)  # rotRand is used to choose a direction and a Rotation (2 possible direction * 9 rotations = 18 possibilitys)
#Rotations around Z-Axis
if rotRand==1:
    for j in range (randPiv,N+1):
        C[j]=(-C[j][1],C[j][0],C[j][2])
        if C[0:randPiv].__contains__(C[j])==True:
            intersect=True
            break
elif rotRand==2:
    for j in range (randPiv,N+1):
        C[j]=(C[j][1],-C[j][0],C[j][2])
        if C[0:randPiv].__contains__(C[j])==True:
            intersect=True
            break
...etc
if intersect==False: # return C if there was no intersection in C
    Shizz=C
else:
    Shizz=chain
return Shizz

The Function PivotFold(chain) will be used on the initially flat chain X a large amount of times. it's pretty naivly written so maybe you have some protips to improve this ^^ I thought that numpyarrays would be good because i can efficiently shift and rotate entire chains without looping over all the elements ...

user3785759
  • 41
  • 1
  • 1
  • 4
  • First, those are 2D. Second, what does "intersect" mean here? – user2357112 Jun 29 '14 at 15:07
  • oh jup, of course 2D :D i thought of 3D because the elements each contain 3 numbers. By intersect i mean that i want to check if there are any 2 elements, which are completely equal. So [1,2,3] and [1,2,3] are the same. But not [2,3,4] and [3,2,4] for example. Think of normal 3D Vectors ... there shouldnt be any two vectors pointing at the same spot in space – user3785759 Jun 29 '14 at 15:14
  • What context are you planning to use this in? What do these arrays represent? NumPy's efficiency is highly dependent on performing operations in bulk; how many of these intersections are you going to compute at once? – user2357112 Jun 29 '14 at 15:15
  • sorry i edited my first comment, didnt realize i shouldnt press enter here xD... i was hoping it will be efficient with arrays of length 500-1000 – user3785759 Jun 29 '14 at 15:18
  • 1
    I dunno if NumPy is what I'd go for here. You can get a pretty efficient result with regular Python data structures. For example, with sets of tuples instead of NumPy arrays, a linear-time intersection is just `not set1.isdisjoint(set2)`. An all-pairs intersection solution that finds all intersections between N arrays can be done in time roughly comparable to N individual intersections instead of N^2, as long as there aren't too many intersections. Can you show your list-based solution? – user2357112 Jun 29 '14 at 16:16
  • Not exactly the same but related: http://stackoverflow.com/q/23814517/2379410 –  Jun 29 '14 at 17:16
  • thanks for your answers, for my list-based solution it will need some context about the program. i'll post it tomorrow cya and thx :) – user3785759 Jun 29 '14 at 19:02
  • I edited some context and my list-based solution to my original post. – user3785759 Jun 30 '14 at 13:05

6 Answers6

4

This should do it:

In [11]:

def f(arrA, arrB):
    return not set(map(tuple, arrA)).isdisjoint(map(tuple, arrB))
In [12]:

f(A, B)
Out[12]:
True
In [13]:

f(A, C)
Out[13]:
False
In [14]:

f(B, C)
Out[14]:
False

To find intersection? OK, set sounds like a logical choice. But numpy.array or list are not hashable? OK, convert them to tuple. That is the idea.

A numpy way of doing involves very unreadable boardcasting:

In [34]:

(A[...,np.newaxis]==B[...,np.newaxis].T).all(1)
Out[34]:
array([[False, False],
       [ True, False],
       [False, False]], dtype=bool)
In [36]:

(A[...,np.newaxis]==B[...,np.newaxis].T).all(1).any()
Out[36]:
True

Some timeit result:

In [38]:
#Dan's method
%timeit set_comp(A,B)
10000 loops, best of 3: 34.1 µs per loop
In [39]:
#Avoiding lambda will speed things up
%timeit f(A,B)
10000 loops, best of 3: 23.8 µs per loop
In [40]:
#numpy way probably will be slow, unless the size of the array is very big (my guess)
%timeit (A[...,np.newaxis]==B[...,np.newaxis].T).all(1).any()
10000 loops, best of 3: 49.8 µs per loop

Also the numpy method will be RAM hungry, as A[...,np.newaxis]==B[...,np.newaxis].T step creates a 3D array.

CT Zhu
  • 52,648
  • 17
  • 120
  • 133
3

Using the same idea outlined here, you could do the following:

def make_1d_view(a):
    a = np.ascontiguousarray(a)
    dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(dt).ravel()

def f(a, b):
    return len(np.intersect1d(make_1d_view(A), make_1d_view(b))) != 0

>>> f(A, B)
True
>>> f(A, C)
False

This doesn't work for floating point types (it will not consider +0.0 and -0.0 the same value), and np.intersect1d uses sorting, so it is has linearithmic, not linear, performance. You may be able to squeeze some performance by replicating the source of np.intersect1d in your code, and instead of checking the length of the return array, calling np.any on the boolean indexing array.

Community
  • 1
  • 1
Jaime
  • 65,696
  • 17
  • 124
  • 159
3

You can also get the job done with some np.tile and np.swapaxes business!

def intersect2d(X, Y):
        """
        Function to find intersection of two 2D arrays.
        Returns index of rows in X that are common to Y.
        """
        X = np.tile(X[:,:,None], (1, 1, Y.shape[0]) )
        Y = np.swapaxes(Y[:,:,None], 0, 2)
        Y = np.tile(Y, (X.shape[0], 1, 1))
        eq = np.all(np.equal(X, Y), axis = 1)
        eq = np.any(eq, axis = 1)
        return np.nonzero(eq)[0]

To answer the question more specifically, you'd only need to check if the returned array is empty.

Nolan Conaway
  • 2,639
  • 1
  • 26
  • 42
1

This should be much faster it is not O(n^2) like the for-loop solution, but it isn't fully numpythonic. Not sure how better to leverage numpy here

def set_comp(a, b):
   sets_a = set(map(lambda x: frozenset(tuple(x)), a))
   sets_b = set(map(lambda x: frozenset(tuple(x)), b))
   return not sets_a.isdisjoint(sets_b)
daniel
  • 2,568
  • 24
  • 32
0

i think you want true if tow arrays have subarray set ! you can use this :

def(A,B):
 for i in A:
  for j in B:
   if i==j
   return True
 return False 
Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • Thanks for your answer but i am looking for a solution based on numpy functions without any loops. – user3785759 Jun 29 '14 at 15:23
  • oh yes what about this ? : numpy.array_equal(a1, a2)[source]¶ True if two arrays have the same shape and elements, False otherwise. – Mazdak Jun 29 '14 at 15:30
  • this compare is for arrays and in this case u mast use for subarrays if you dont want to use loop i havnt any idea ! – Mazdak Jun 29 '14 at 15:36
  • yes this one comes realy close to what i want. but array_equal works with 1d arrays. so i could only use it to check single elements like np.array_equal([1,2,3],[1,2,3]) gives true for example ... but i would have to loop over all the elements of my 2 arrays to check for an intersection which again is not what i want :/ – user3785759 Jun 29 '14 at 15:37
  • yes and i wrote for you you must use this in loop bu if you dont want loop for now i havnt any idea ! but i think for this case we cant dont use loop if there was an function in numpy thay do this it actually use loop because the aim of loop is this !that reduce our calculate !!! – Mazdak Jun 29 '14 at 15:40
  • @bluebird7, you'll want to use `(i == j).all()` in this case. Just FYI – daniel Jun 29 '14 at 17:45
0

This problem can be solved efficiently using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
len(npi.intersection(A, B)) > 0
Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42