
I have a large number of (x, y) grid points with integer coordinates which I want to test against a small number of circles given by radius and center. The points are marked parts of an image, i.e. there are a few irregularly shaped blocks that contain the points. For these I want to check for collisions and count the number of points inside a circle. My current approaches (with python and numpy) are rather slow.

Now I have two tasks:

  1. Test if any point of set A is in any circle
  2. Count the number of points of set B which are in a circle

My current implementation looks like this (setA and setB are Nx2 numpy arrays and center is a 1x2 array):

1) For each circle, compute point - center for every point, square it elementwise, sum along the coordinate axis, and check whether the result is smaller than radius**2:

# squared distance from every point of setA to the circle center,
# compared against the squared radius; .any() tests for at least one hit
for circle in circles:
    if (((setA - circle.center)**2).sum(axis=1) < circle.radius**2).any():
        return "collision"
return "no collision"

This could be optimized by using a Python loop and breaking on the first collision, but usually numpy loops are a lot faster than Python loops; in fact, both versions were slower than expected.
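The loop variant would look something like this (a sketch only; it gives up vectorization over the points):

for circle in circles:
    r2 = circle.radius**2
    for point in setA:
        # early exit on the very first point found inside a circle
        if ((point - circle.center)**2).sum() < r2:
            return "collision"
return "no collision"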

2) For each circle, create an array of squared distances and do an elementwise comparison against the squared radius. Add up all the boolean arrays and count the non-zero elements of the result:

pixels = np.zeros(len(setB))
for circle in circles:
    # adds 1 for every point of setB that lies inside this circle
    pixels += (((setB - circle.center)**2).sum(axis=1) < circle.radius**2)
return np.count_nonzero(pixels)

Is there an easy way to speed this up?

I do not want to over-optimize (and make the program a lot more complicated), but just to use numpy in the most efficient way, exploiting numpy's vectorization as much as possible.

So building the most perfect spatial tree or similar isn't my goal; I think an O(n^2) algorithm for a few thousand points and 10-20 circles should run fast enough on an average desktop computer today.

allo

1 Answer


Taking advantage of coordinates being integers:

create a lookup image:

# pad the lookup image with the largest radius so every disc fits,
# even when its center lies near the image border
radius = max(circle.radius for circle in circles)
mask = np.zeros((image.shape[0] + 2*radius, image.shape[1] + 2*radius), dtype=int)
for circle in circles:
    # shift the center into the padded coordinate system and stamp the disc
    center = circle.center + radius
    mask[center[0]-circle.radius:center[0]+circle.radius + 1,
         center[1]-circle.radius:center[1]+circle.radius + 1] += circle.mask

circle.mask is a small (2r+1) x (2r+1) square patch containing a 0/1 mask of the disc of interior points.
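For reference, such a patch could also be built the straightforward way (slower than the generator below; disc_mask is just an illustrative helper name):

def disc_mask(r):
    # naive 0/1 mask of the closed disc x**2 + y**2 <= r**2
    y, x = np.ogrid[-r:r+1, -r:r+1]
    return (x*x + y*y <= r*r).astype(int)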

counting collisions is now as easy as

mask[radius:-radius, radius:-radius][setB[:,0], setB[:,1]].sum()
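Note that this sum counts a point once per disc it lies in; np.count_nonzero on the looked-up values reproduces the original counting semantics, and the same lookup answers the collision test for setA. A sketch, assuming the points are stored as (row, column) pairs:

inner = mask[radius:-radius, radius:-radius]  # view without the padding border

# task 1: is any point of setA inside any circle?
collision = (inner[setA[:, 0], setA[:, 1]] > 0).any()

# task 2: points of setB inside at least one circle (no multiplicity)
count = np.count_nonzero(inner[setB[:, 0], setB[:, 1]])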

fast creation of discs (no multiplications, no square roots):

r = circle.radius
# consecutive squares 0, 1, 4, ..., r**2 via a cumulative sum of odd numbers
h2 = np.r_[0, np.add.accumulate(np.arange(1, 2*r+1, 2))]
# w gives, per row of the quarter disc, the column where the disc begins
w = np.searchsorted(h2[-1] - h2[::-1], h2)
# scatter +1 at each row's first interior column and -1 at the start of the
# next row; a cumulative sum over the flattened array then paints the runs
q = np.zeros((r+1, r+1), dtype=int)
q[np.arange(r+1), w[::-1]] = 1
q[1:, 0] -= 1
q = np.add.accumulate(q.ravel()).reshape(r+1, r+1)
# mirror the quarter left-right and top-bottom, sharing the middle row/column
h = np.c_[q, q[:, -2::-1]]
circle.mask = np.r_[h, h[-2::-1]]
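One way to convince yourself that the generator produces the closed disc x**2 + y**2 <= r**2 (wrapping the code above in a helper; fast_disc is my name for it, not part of the answer):

def fast_disc(r):
    # the generator above, parameterized by the radius
    h2 = np.r_[0, np.add.accumulate(np.arange(1, 2*r+1, 2))]
    w = np.searchsorted(h2[-1] - h2[::-1], h2)
    q = np.zeros((r+1, r+1), dtype=int)
    q[np.arange(r+1), w[::-1]] = 1
    q[1:, 0] -= 1
    q = np.add.accumulate(q.ravel()).reshape(r+1, r+1)
    h = np.c_[q, q[:, -2::-1]]
    return np.r_[h, h[-2::-1]]

for r in range(1, 50):
    y, x = np.ogrid[-r:r+1, -r:r+1]
    assert (fast_disc(r) == (x*x + y*y <= r*r)).all()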
Paul Panzer
  • I do not see why this should be faster than my current solution. I think it uses similar numpy operations and has an advantage when the circles are much smaller than the image. But the rest, together with the mask creation, is very similar. – allo Feb 19 '17 at 20:18
  • @allo Yes, you are right, it depends on numbers. Yours is O(#circles x #setB); mine is O(#circles + #setB), where I have taken the liberty of assuming constant max radius and image size. (If the circles are fixed, you can precompute the mask. Lookup, I'd argue, is as fast as it gets.) – Paul Panzer Feb 19 '17 at 20:32
  • Yes, I think lookup for new points or new circles with the same radius cannot get faster than with your method. I will look into it, but I have fixed points and circles with different radii. It may help, as #circles << #points, but probably often #points(circle) is bigger than #points. – allo Feb 19 '17 at 20:43
  • @allo I've written a fast (I think) disc generator. Have a look if you like. – Paul Panzer Feb 19 '17 at 21:21
  • Thank you. I think the approach will have big advantages when testing against a larger number of discs, too. – allo Feb 19 '17 at 21:29
  • It speeds up the counting a lot, but the intersection test only a bit. Possibly the ``.any()`` already caused an early ``break`` in the vectorized numpy routine anyway. The collisions seem not to be exactly the same; I guess diagonal distances make a difference. But these are only minimal differences. – allo Feb 20 '17 at 22:18
  • @allo Yes, short-circuiting, or rather the lack thereof, is numpy's one great weakness. If the expected collision rate is high, you could try to chop up setA into chunks and test them sequentially. You'd choose the chunk size such that the hit probability per chunk is still at least 50% (I don't know what the optimal percentage is, but 50% shouldn't be too bad). – Paul Panzer Feb 20 '17 at 23:03
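A minimal sketch of that chunking idea (function name and chunk size are placeholders, not from the discussion):

def any_collision(setA, circles, chunk=256):
    # vectorized test per chunk, with a Python-level early exit on the first hit
    for start in range(0, len(setA), chunk):
        part = setA[start:start + chunk]
        for circle in circles:
            if (((part - circle.center)**2).sum(axis=1) < circle.radius**2).any():
                return True
    return False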