You could try an algorithm based on a quadtree or some other spatially indexed 2D data structure.
Here is the idea:
In each leaf of the quadtree, you store a reference to any segment which intersects that square. To start, you have zero resolution, i.e. all of the segments are in the bounding box node, which is not subdivided at all.
Then, for each ray, look at the origin of the ray and subdivide the node which contains it until the node intersects only one segment. If after a subdivision the node does not intersect any segments, move on to the next node along the ray. If the ray intersects the segment inside the node, we have found the first intersection and can stop the search along this ray. If the ray does not intersect that segment, we can prune it from the search and move on to the next node that the ray intersects.
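For concreteness, here is a minimal Python sketch of this search. All names (`Node`, `first_hit`, etc.) are my own, and the segment-vs-square test is a conservative bounding-box check, which only causes extra subdivision, never wrong answers:

```python
import math

def seg_ray_hit(o, d, a, b):
    """Return t >= 0 such that o + t*d lies on segment ab, or None."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    det = ex * d[1] - ey * d[0]
    if abs(det) < 1e-12:
        return None                      # ray parallel to segment
    qx, qy = a[0] - o[0], a[1] - o[1]
    t = (ex * qy - ey * qx) / det        # distance along the ray
    s = (d[0] * qy - d[1] * qx) / det    # position along the segment
    return t if t >= 0 and 0 <= s <= 1 else None

def seg_overlaps(seg, x, y, size):
    """Conservative test: does the segment's bounding box overlap the square?"""
    (ax, ay), (bx, by) = seg
    return (min(ax, bx) <= x + size and max(ax, bx) >= x and
            min(ay, by) <= y + size and max(ay, by) >= y)

class Node:
    def __init__(self, x, y, size, segs):
        self.x, self.y, self.size = x, y, size
        self.segs = segs         # candidate segments touching this square
        self.children = None     # created lazily, on demand

    def contains(self, px, py):
        return (self.x <= px <= self.x + self.size and
                self.y <= py <= self.y + self.size)

    def subdivide(self):
        h = self.size / 2
        self.children = [
            Node(cx, cy, h, [s for s in self.segs if seg_overlaps(s, cx, cy, h)])
            for cx in (self.x, self.x + h) for cy in (self.y, self.y + h)]

def locate(root, px, py, min_size):
    """Descend to the leaf containing (px, py), subdividing on demand
    until the leaf holds at most one candidate segment (or is tiny)."""
    node = root
    while len(node.segs) > 1 and node.size > min_size:
        if node.children is None:
            node.subdivide()
        h = node.size / 2
        i = 2 if px >= node.x + h else 0
        j = 1 if py >= node.y + h else 0
        node = node.children[i + j]
    return node

def first_hit(root, o, d, min_size=1e-6):
    """March along the ray (origin assumed inside the box) from leaf to
    leaf; return (t, segment) of the first intersection, or None."""
    px, py = o
    while root.contains(px, py):
        leaf = locate(root, px, py, min_size)
        best = None
        for seg in leaf.segs:
            t = seg_ray_hit(o, d, *seg)
            if t is not None and leaf.contains(o[0] + t * d[0], o[1] + t * d[1]):
                if best is None or t < best[0]:
                    best = (t, seg)      # the hit occurs inside this leaf
        if best:
            return best
        # step just past this leaf's boundary along the ray
        t_exit = math.inf
        for p_i, d_i, lo in ((px, d[0], leaf.x), (py, d[1], leaf.y)):
            if d_i > 0:
                t_exit = min(t_exit, (lo + leaf.size - p_i) / d_i)
            elif d_i < 0:
                t_exit = min(t_exit, (lo - p_i) / d_i)
        px, py = px + (t_exit + 1e-9) * d[0], py + (t_exit + 1e-9) * d[1]
    return None
```

Note that the subdivision happens only where rays actually probe, which is the whole point: regions of the plane that no ray visits are never refined.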
This algorithm can obviously be run in parallel, by performing the search along each ray in its own thread. The quadtree is still shared, but as it becomes subdivided, the threads will very quickly stop competing for access to the same nodes.
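Since the per-ray searches are independent, the parallel version is just a map over the rays. Below is a minimal sketch of that structure with a thread pool; `first_hit` here is a brute-force stand-in for the quadtree search, and note that in CPython the GIL limits true CPU parallelism for pure-Python code, so this only shows the shape of the parallelization:

```python
from concurrent.futures import ThreadPoolExecutor

def first_hit(segments, o, d):
    """Brute-force stand-in for the quadtree search: nearest ray/segment
    intersection distance t, or None if the ray hits nothing."""
    best = None
    for (ax, ay), (bx, by) in segments:
        ex, ey = bx - ax, by - ay
        det = ex * d[1] - ey * d[0]
        if abs(det) < 1e-12:
            continue                     # ray parallel to segment
        qx, qy = ax - o[0], ay - o[1]
        t = (ex * qy - ey * qx) / det    # distance along the ray
        s = (d[0] * qy - d[1] * qx) / det  # position along the segment
        if t >= 0 and 0 <= s <= 1 and (best is None or t < best):
            best = t
    return best

def trace_all(segments, rays, workers=4):
    """Run the per-ray searches concurrently; each ray is independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: first_hit(segments, *r), rays))
```

Swapping the thread pool for processes (or a compiled per-ray kernel) would give real speedup; the structure stays the same because the only shared state is read-mostly.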
Running Time:
Running time is a bit hard to analyze, because it depends on the 2D structure of the segments and rays. Perhaps someone can do a better job of analyzing it, but here's my best shot:
If there is a small distance d, smaller than the distance from any ray to the closest segment it does not intersect, and smaller than the distance between any two intersection points on the same ray, then we can get an upper bound. Call r=D/d the "resolution" of the problem, where D is the side length of the bounding box.
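To make the definition concrete, here is a tiny numeric sketch (the values of D and d are made up): the resolution r=D/d also tells you how many levels of subdivision the quadtree needs to separate features d apart, which is where the log(r) factors in the rest of the analysis come from.

```python
import math

D = 8.0    # side length of the bounding box (made-up value)
d = 0.25   # separation scale d from the argument above (made-up value)

r = D / d                          # "resolution" of the problem
depth = math.ceil(math.log2(r))    # quadtree levels needed to resolve d

print(r, depth)   # r = 32.0, so 5 levels of subdivision suffice
```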
The running time will be bounded by N*r. But this is an extremely rough bound, because it basically assumes that every ray has to be resolved to resolution d along its entire length.
Let's try to refine this bound.
For any ray, consider the section of the ray between its origin and its first intersection. For any point along that section of the ray, we can ask which non-intersecting segment is the closest to that point. We can then ask how many distinct non-intersecting segments are the closest to some point along this section of the ray. In other words, if we made a Voronoi diagram of the non-intersecting segments, we are asking how many cells this section of the ray would intersect. If we let p be the maximum answer to this question for any ray, then we can get a better upper bound of N*p*log(r), by considering the maximum required resolution while searching along the ray.
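To make the definition of p concrete, here is a rough sketch that estimates it for one ray by sampling points along the section and counting how many distinct segments ever appear as the nearest one. Both helpers (`point_seg_dist`, `estimate_p`) are hypothetical names of mine, and sampling can undercount thin Voronoi cells; a real bound would come from the actual Voronoi diagram of the segments.

```python
import math

def point_seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    qx, qy = p[0] - a[0], p[1] - a[1]
    L2 = ex * ex + ey * ey
    t = max(0.0, min(1.0, (qx * ex + qy * ey) / L2)) if L2 > 0 else 0.0
    return math.hypot(p[0] - (a[0] + t * ex), p[1] - (a[1] + t * ey))

def estimate_p(segments, o, d, t_hit, samples=200):
    """Sample the ray section from o to o + t_hit*d and count how many
    distinct (non-intersecting) segments show up as the nearest one --
    i.e. how many Voronoi cells of the segments this section crosses."""
    seen = set()
    for i in range(samples):
        t = t_hit * i / (samples - 1)
        pt = (o[0] + t * d[0], o[1] + t * d[1])
        nearest = min(range(len(segments)),
                      key=lambda k: point_seg_dist(pt, *segments[k]))
        seen.add(nearest)
    return len(seen)
```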
So if p and r are constant, we have a linear time algorithm!
Unfortunately, I'm pretty sure that p, N and r have some relationship that prevents p and r from remaining roughly constant. As a rough guess, I'd say that r ~ sqrt(N), meaning it's back to N*log(N) running time.
Parallelization is interesting. If we assume C cores, the search takes (N/C)*p*log(r) time, but there is still the bottleneck of subdividing the shared quadtree. There are log(r) levels, and each segment has to be divided down to this level at at most N locations, so the subdivision takes (N/C)*log(r) time, meaning the total running time is (N/C)*p*log(r), which basically means it's perfectly parallelizable assuming C < N.
Conclusion
It seems to me that as a single-core algorithm this would probably not be so good, but on problems with special structure, and/or with parallelization, it may work well. I wouldn't recommend implementing it unless you've thought about it a lot, are sure the rough edges can be smoothed out, and have checked that I didn't make a mistake in my analysis. I'd also suggest searching for published algorithms based on similar ideas.