1

I have a List<> of one-dimensional segments, each segment is represented by a struct containing start and end (x1, x2). Segments can overlap. The list is sorted by starts (x1).

How can I efficiently find the shortest segment containing a particular point x?

Currently, I use list.BinarySearch to find the first segment for which x >= x1, which is efficient. Then I move on though the list one segment at a time, until x < x1, to find the shortest segment for which x1 <= x < x2, which is not so efficient.

How can this algorithm be possibly improved for better speed? I feel it can be, but I'm stuck.

This question is not a duplicate of Shortest distance between a point and a line segment.

Community
  • 1
  • 1
avo
  • 10,101
  • 13
  • 53
  • 81
  • For the data structure you are given, your algorithm is good. Possible improvements will depend on multiple factors, such as: how willing are you to change the data structure, how many segments on average share the same start point, etc. – mbeckish Apr 09 '14 at 14:44
  • @mbeckish, I'm willing to change the data structure in any way, e.g. I can make it a tree. Sharing the same starting point and overlapping is unlikely, but possible. Thanks for your thoughts. – avo Apr 09 '14 at 14:53
  • Does it have to dynamically support insertion and removal of intervals? – Niklas B. Apr 09 '14 at 14:56
  • Also, are your queries and endpoints integers? – Niklas B. Apr 09 '14 at 14:57
  • @NiklasB., it doesn't have to dynamically support insertion/removal. It gets rebuilt as a whole every time the input data changes. The endpoints are indeed integers. – avo Apr 10 '14 at 06:36

2 Answers2

3

If you allow yourself O(n log n) preprocessing time you can build a data structure to give you the answer in O(log n) time for any query point. Sort all end points of the intervals (keeping track of the interval they correspond to), and process the end points from left to right. Also maintain a self balancing binary search tree of intervals ordered by length of the intervals. Every time you encounter a left end point, add the interval to the tree, and every time you encounter a right end point, remove the interval from the tree. Furthermore, for each end point you encounter, record the shortest interval (leftmost node in the tree) that contains the subinterval in between the end point and the previous end point. (If there are no intervals, record "None") So in the end you will have an array of 2n - 1 subintervals in sorted over that know whether there is an interval that covers the subinterval, and if so, what the shortest interval is. Then given a point you can binary search into the structure to find the subinterval that contains the point, and report whether there is an interval that contains the point and if so, what the shortest interval is.

user2566092
  • 4,631
  • 15
  • 20
  • Yep, exactly what I was going to suggest. If there is no heap that supports deletion in the standard library, one can use a binary search tree instead – Niklas B. Apr 09 '14 at 15:05
  • Good point, I'll switch it to a self-balancing binary search tree. – user2566092 Apr 09 '14 at 15:33
1

Have a look at the interval trees (http://en.wikipedia.org/wiki/Interval_tree) and segment trees (http://en.wikipedia.org/wiki/Segment_tree).

These support efficient queries to report all intervals straddling a given point.

I don't know of a solution to efficiently report just the smallest interval.

  • Just use binary search tree augmentation (smallest interval) and get a subtree that represents the intervals that intersect the given point – Niklas B. Apr 09 '14 at 14:55