You could try an algorithm based on a quadtree or some other spatially indexed 2D data structure.
Here is the idea:
In each leaf of the quadtree, you store a reference to any segment which intersects that square. To start, you have zero resolution, i.e. all of the segments are in the bounding box node, which is not subdivided at all.
Then, for each ray, look at the origin of the ray and subdivide the node which contains it until the node intersects only one segment. If after a subdivision the node does not intersect any segments, move on to the next node along the ray. If the ray intersects the segment inside the node, we have found the first intersection and can stop the search along this ray. If the ray does not intersect that segment, we can prune it from the search and move on to the next node that the ray intersects.
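For concreteness, here is a minimal Python sketch of this search. All names (`Node`, `first_hit`, etc.) are my own, and the segment-vs-square test is a conservative bounding-box check, which only causes extra subdivision, never wrong answers:

```python
import math

def seg_ray_hit(o, d, a, b):
    """Return t >= 0 such that o + t*d lies on segment ab, or None."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    det = ex * d[1] - ey * d[0]
    if abs(det) < 1e-12:
        return None                      # ray parallel to segment
    qx, qy = a[0] - o[0], a[1] - o[1]
    t = (ex * qy - ey * qx) / det        # distance along the ray
    s = (d[0] * qy - d[1] * qx) / det    # position along the segment
    return t if t >= 0 and 0 <= s <= 1 else None

def seg_overlaps(seg, x, y, size):
    """Conservative test: does the segment's bounding box overlap the square?"""
    (ax, ay), (bx, by) = seg
    return (min(ax, bx) <= x + size and max(ax, bx) >= x and
            min(ay, by) <= y + size and max(ay, by) >= y)

class Node:
    def __init__(self, x, y, size, segs):
        self.x, self.y, self.size = x, y, size
        self.segs = segs         # candidate segments touching this square
        self.children = None     # created lazily, on demand

    def contains(self, px, py):
        return (self.x <= px <= self.x + self.size and
                self.y <= py <= self.y + self.size)

    def subdivide(self):
        h = self.size / 2
        self.children = [
            Node(cx, cy, h, [s for s in self.segs if seg_overlaps(s, cx, cy, h)])
            for cx in (self.x, self.x + h) for cy in (self.y, self.y + h)]

def locate(root, px, py, min_size):
    """Descend to the leaf containing (px, py), subdividing on demand
    until the leaf holds at most one candidate segment (or is tiny)."""
    node = root
    while len(node.segs) > 1 and node.size > min_size:
        if node.children is None:
            node.subdivide()
        h = node.size / 2
        i = 2 if px >= node.x + h else 0
        j = 1 if py >= node.y + h else 0
        node = node.children[i + j]
    return node

def first_hit(root, o, d, min_size=1e-6):
    """March along the ray (origin assumed inside the box) from leaf to
    leaf; return (t, segment) of the first intersection, or None."""
    px, py = o
    while root.contains(px, py):
        leaf = locate(root, px, py, min_size)
        best = None
        for seg in leaf.segs:
            t = seg_ray_hit(o, d, *seg)
            if t is not None and leaf.contains(o[0] + t * d[0], o[1] + t * d[1]):
                if best is None or t < best[0]:
                    best = (t, seg)      # the hit occurs inside this leaf
        if best:
            return best
        # step just past this leaf's boundary along the ray
        t_exit = math.inf
        for p_i, d_i, lo in ((px, d[0], leaf.x), (py, d[1], leaf.y)):
            if d_i > 0:
                t_exit = min(t_exit, (lo + leaf.size - p_i) / d_i)
            elif d_i < 0:
                t_exit = min(t_exit, (lo - p_i) / d_i)
        px, py = px + (t_exit + 1e-9) * d[0], py + (t_exit + 1e-9) * d[1]
    return None
```

Note that the subdivision happens only where rays actually probe, which is the whole point: regions of the plane that no ray visits are never refined.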
This algorithm can obviously be run in parallel, by performing the search along each ray in its own thread. The quadtree is still shared, but as it becomes subdivided, the threads will very quickly stop competing for access to the same nodes.
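Since the per-ray searches are independent, the parallel version is just a map over the rays. Below is a minimal sketch of that structure with a thread pool; `first_hit` here is a brute-force stand-in for the quadtree search, and note that in CPython the GIL limits true CPU parallelism for pure-Python code, so this only shows the shape of the parallelization:

```python
from concurrent.futures import ThreadPoolExecutor

def first_hit(segments, o, d):
    """Brute-force stand-in for the quadtree search: nearest ray/segment
    intersection distance t, or None if the ray hits nothing."""
    best = None
    for (ax, ay), (bx, by) in segments:
        ex, ey = bx - ax, by - ay
        det = ex * d[1] - ey * d[0]
        if abs(det) < 1e-12:
            continue                     # ray parallel to segment
        qx, qy = ax - o[0], ay - o[1]
        t = (ex * qy - ey * qx) / det    # distance along the ray
        s = (d[0] * qy - d[1] * qx) / det  # position along the segment
        if t >= 0 and 0 <= s <= 1 and (best is None or t < best):
            best = t
    return best

def trace_all(segments, rays, workers=4):
    """Run the per-ray searches concurrently; each ray is independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: first_hit(segments, *r), rays))
```

Swapping the thread pool for processes (or a compiled per-ray kernel) would give real speedup; the structure stays the same because the only shared state is read-mostly.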
Running Time:
Running time is a bit hard to analyze, because it depends on the 2D structure of the segments and rays. Perhaps someone can do a better job of analyzing it, but here's my best shot:
If there is a small distance d, smaller than the distance from any ray to the closest segment it does not intersect, and smaller than the distance between any two intersection points on the same ray, then we can get an upper bound. Call r=D/d the "resolution" of the problem, where D is the side length of the bounding box.
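To make the definition concrete, here is a tiny numeric sketch (the values of D and d are made up): the resolution r=D/d also tells you how many levels of subdivision the quadtree needs to separate features d apart, which is where the log(r) factors in the rest of the analysis come from.

```python
import math

D = 8.0    # side length of the bounding box (made-up value)
d = 0.25   # separation scale d from the argument above (made-up value)

r = D / d                          # "resolution" of the problem
depth = math.ceil(math.log2(r))    # quadtree levels needed to resolve d

print(r, depth)   # r = 32.0, so 5 levels of subdivision suffice
```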
The running time will be bounded by N*r. But this is an extremely rough bound, because it basically assumes that every ray has to be resolved to resolution d along its entire length.
Let's try to refine this bound.
For any ray, consider the section of the ray between its origin and its first intersection. For any point along that section of the ray, we can ask which non-intersecting segment is the closest to that point. We can then ask how many distinct non-intersecting segments are the closest to some point along this section of the ray. In other words, if we made a Voronoi diagram of the non-intersecting segments, we are asking how many cells this section of the ray would intersect. If we let p be the maximum answer to this question for any ray, then we can get a better upper bound of N*p*log(r), by considering the maximum required resolution while searching along the ray.
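To make the definition of p concrete, here is a rough sketch that estimates it for one ray by sampling points along the section and counting how many distinct segments ever appear as the nearest one. Both helpers (`point_seg_dist`, `estimate_p`) are hypothetical names of mine, and sampling can undercount thin Voronoi cells; a real bound would come from the actual Voronoi diagram of the segments.

```python
import math

def point_seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    qx, qy = p[0] - a[0], p[1] - a[1]
    L2 = ex * ex + ey * ey
    t = max(0.0, min(1.0, (qx * ex + qy * ey) / L2)) if L2 > 0 else 0.0
    return math.hypot(p[0] - (a[0] + t * ex), p[1] - (a[1] + t * ey))

def estimate_p(segments, o, d, t_hit, samples=200):
    """Sample the ray section from o to o + t_hit*d and count how many
    distinct (non-intersecting) segments show up as the nearest one --
    i.e. how many Voronoi cells of the segments this section crosses."""
    seen = set()
    for i in range(samples):
        t = t_hit * i / (samples - 1)
        pt = (o[0] + t * d[0], o[1] + t * d[1])
        nearest = min(range(len(segments)),
                      key=lambda k: point_seg_dist(pt, *segments[k]))
        seen.add(nearest)
    return len(seen)
```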
So if p and r are constant, we have a linear time algorithm!
Unfortunately, I'm pretty sure that p, N and r have some relationship that prevents p and r from remaining roughly constant. As a rough guess, I'd say that r ~ sqrt(N), meaning it's back to N*log(N) running time.
Parallelization is interesting. If we assume C cores, the search takes (N/C)*p*log(r) time, but there is still the bottleneck of subdividing the shared quadtree. There are log(r) levels, and each segment has to be divided down to this level at at most N locations, so the subdivision takes (N/C)*log(r) time, meaning the total running time is (N/C)*p*log(r), which basically means it's perfectly parallelizable assuming C < N.
Conclusion
It seems to me that as a single-core algorithm this would probably not be so good, but on problems with special structure, and/or with parallelization, it may work well. I wouldn't recommend implementing it unless you've thought about it a lot, are sure the rough edges can be smoothed out, and have checked that I didn't make a mistake in my analysis. I'd also suggest searching for published algorithms based on similar ideas.