5

Is it possible to efficiently count the number of line segments that overlap a single point P on a number line?

All the line segments are sitting on a single number line (its a 1-D world, not a 3-D world).

Each line segment has a start coordinate X1 and an end coordinate X2.

Example:

Line segment A spans from X1==1 to X2==3
Line segment B spans from X1==2 to X2==4
Line segment C spans from X1==3 to X2==5
Line segment D spans from X1==1 to X2==4
----------------------------------------
Ex1: Line segments that overlap point P==2: A,B and D   >>> overlap count==3.
Ex2: Line segments that overlap point P==7: None        >>> overlap count==0.
Ex3: Line segments that overlap point P==3: A,B,C and D >>> overlap count==4.

Of course, if there is only 4 line segments, then the code is simple. However, if there is a huge spatial database of 400 million line segments, then the search is very slow.

Are there any algorithms that can efficiently search this list of line segments for the total number of overlaps?

What I am looking at so far

Mathfan
  • 59
  • 4
  • 1
    Do you need to generate the list of overlaps just once, or do you expect to compute this for a random point? – Floris Jun 22 '13 at 19:40
  • @Floris I have to compute it for lots of random points, very quickly. I might want to feed 100,000 points in, and generate the number of overlapping line segments for each point. – Mathfan Jun 22 '13 at 19:42
  • 1
    @Mathfan Just ( natural ) integers of floats? –  Jun 22 '13 at 19:45
  • @Armin At the moment, I only need to support natural integers. – Mathfan Jun 22 '13 at 19:51
  • Are you allowed to sort the query points and to sort the intervals ? How is the distribution of the number of intervals per query point ? –  Feb 26 '16 at 14:56
  • Is all this data currently in bulk ? By curiosity, what is the context ? –  Feb 26 '16 at 15:20

4 Answers4

3

If you sort the list by starting value, and then again (for the same starting value) by length, you end up with the roots of an efficient algorithm.

sort the list by starting value
for the same starting value, sort by length (longest first)

Then, when you need the number of line segments that overlap a given point P:

for a given value p
find the points in the list with starting value <= p (binary search - fast)
for each starting value, start with the longest length
if it spans the point of interest, increment counter
if not, go to the next smaller start value
keep going until you have reached the smallest starting value

It's not perfect, but a lot better than searching through 10M points (although the initial sort will take some time, obviously. But you only need to do it once).

Floris
  • 45,857
  • 6
  • 70
  • 122
  • If I've calculate runtime runtime correctly this algorithm has m * (n/2 + nlog(n) ) where m is number of query points, if i'm right then it's slow, please correct me if i'm wrong. – dmgcodevil Mar 20 '16 at 21:52
  • `m*n log(n)` is faster than `n^2 log (n)` assuming `m – Floris Mar 20 '16 at 22:24
  • Maybe I missed something, but: 1. nlog(n) once in the beginning 2. binary search for each query point O(log n) 3. scan array of segments starting from position received after binary search until p – dmgcodevil Mar 20 '16 at 23:51
1

Sort the query points and the intervals endpoints increasingly, in a single array; for every point, keep a flag telling if it is a interval start, an interval end or a query.

Initialize a counter to zero and scan the list. A start increases the counter; an end decreases it; a query knows the number of overlapping intervals by reading the counter.

Time (N+M).Log(N+M) or better if special sorting can be used.


If you aren't allowed to sort the query points, just sort the interval endpoints. In a single linear scan, you can compute the number of overlaps right after each endpoint.

For a given query point, you find the relevant endpoint by dichotomic search, hence the overlap count.

M.Log(M)+N.Log(M) for M intervals and N query points.


If you aren't allowed to sort the intervals, just sort the query points.

Process every interval in turn, find the first query point it overlaps, by dichotomic search, and increase the counter of all query points it overlaps.

N.Log(N)+M.Log(N)+O where O is the total number of interval/query overlaps.


If you aren't allowed to sort at all, test exhaustively every query against every interval, N.M.

  • yes, if sort all points: start, end and query point then algorithm has n^2 log(n), which is not bad. thanks. – dmgcodevil Mar 20 '16 at 21:49
  • @dmgcodevil: can't understand your claim, but N²Log(N) complexity is awful. –  Mar 21 '16 at 07:04
  • no, it's not n^2 log(n), it's m+ mlog(m) where m is sum of start, end and query points: m = s+e+q. Nevertheless it's not the worst solution for this problem. – dmgcodevil Mar 27 '16 at 16:45
0

Take a look at interval trees or segment trees to help solve this sort of problem. This answer has some good examples of how these techniques could help you.

Community
  • 1
  • 1
jedierikb
  • 12,752
  • 22
  • 95
  • 166
-1

Start by realizing that you can't do better than O(N) as you need to look at each of the line segments at least once.(where N=number of line segments)

Let's have an array Line_segments_at, that stores the number of line segments passing through each point.

First we need to initialize this array to 0. Then, when we look at the i-th line segment we need to do:

for(int j=x1[i];j<=x2[i];j++)
 Line_segments_at[j]++;

For every query point k, we can simply return the result as Line_segments_at[k].

Aravind
  • 3,169
  • 3
  • 23
  • 37
  • I can be done much better than O(n), even better than log(n). Once you built a tree, it can be done in maximum 32 iterations for the whole range of `unsigned int` for any amount of lines. –  Jun 22 '13 at 23:51
  • @Armin:I din't know of it, please post that answer. – Aravind Jun 23 '13 at 04:43
  • @user1944441: It depends, if the spatial index may be created once and then reused many times or will it be used once as well. It's similar case to the sorting and performing binary search. Sure searching has complexity O(logN) but first you must sort elements or build the tree which has O(NlogN). So if you want to do it only once, a simple O(N) approach will be faster. – Adam Wulkiewicz Apr 25 '14 at 00:38