4

I have a collection of paired numbers and need to efficiently find the set of pairs which encompass a given value.

Given the following representation of a number pair

public class Line
{
    public double Start { get; set; } //is always < end
    public double End { get; set; }
}

The collection of Lines could be visually laid out like the below (black lines) enter image description here

The perpendicular red line is the intersection criteria (just a simple number like 10.123)

I'm looking for an efficient algorithm that returns only the black lines that intersect with the red, based on the assumption that the frequency of search executions is greater than the frequency of Line additions to the collection. (and obviously assume the collection is large)

My less than optimal solution so far is to

  1. Insert Lines as they are created into two sorted lists. One sorted on Start, the other Sorted on End
  2. Binary Search on the start ordered list to find the index of the first line where the start is greater than the intersection criteria. (in theory all lines including and after this index are non intersecting)
  3. Repeat similar logic in (2) for the end ordered list
  4. Compare indexes and pick the list with the lowest remaining iterations to resolve
  5. Iterate through the remainder of that chosen list manually looking for an intersection
OrdinaryOrange
  • 2,420
  • 1
  • 16
  • 25
  • Would `lines.Where(l => l.Start <= criteria && l.End >= criteria)` be too inefficient? Or am I oversimplifying it? – Ian H. Nov 18 '17 at 23:37
  • That requires hitting every item in the collection. – OrdinaryOrange Nov 18 '17 at 23:39
  • depending on how many ranges there are, dividing the ranges into range buckets (one range can be in more than one range bucket) may result in less comparisons than binary search, and probably easier to parallelize – Slai Nov 18 '17 at 23:54
  • 5
    Maybe you want an [Interval tree](https://en.wikipedia.org/wiki/Interval_tree)? – dbc Nov 19 '17 at 00:04
  • 1
    @dbc Ahh Interval Tree describes my exact problem perfectly, Thankyou. I'll find some implementations and run some perf tests. It is sure to be faster than my rolled solution. Post as an answer and I'll accept. – OrdinaryOrange Nov 19 '17 at 01:05

2 Answers2

2

Your Line class represents an interval in . You have a large number of such intervals, and would like to find those that overlap a certain point in better than linear time.

One possible solution for your requirements is the interval tree:

  • Queries require O(log n + m) time, Where n is the total number of intervals and m is the number of overlaps with the query point found.
  • Construction requires O(n log n).
  • Storage requires O(n) space.

Example implementation (which I have not tested) at Codeplex. For others see C# Interval tree class.

For a comparison with a related structure, the segment tree, see What are the differences between segment trees, interval trees, binary indexed trees and range trees?.

dbc
  • 104,963
  • 20
  • 228
  • 340
0

This should be probably done with two parallel searchers, but...
since you already manage two lists, try this:

Search the first .Start value that is > ValuePoint.
(All values beyond this Index are clearly invalid)

int idx_start = Lines.FindIndex(x => x.Start > ValuePoint);

Sort the list to this point by the values of [.End]
Given that your list is already sorted, this Sort can never be a O(n) operation,
should average to O((i) log (i)), where i = Index of the first value found.
Of course, i may be = n

EndComparer ecomp = new EndComparer();
Lines.Sort(0, idx_start, eComp);

where EndComparer:

public class EndComparer : IComparer<Lines>
{
    public int Compare(Lines lineX, Lines lineY)
    {
        return lineX.End.CompareTo(lineY.End);
    }
}

Then find the first .End value that is > ValuePoint

int idx_end = Lines.FindIndex(0, idx_start, x => x.End > ValuePoint);

if (idx_end > -1)
{
    //All values in the range [idx_end; idx_start] are valid
    //If idx_end = 0 then all pre-selected values are valid [0; idx_start]
} else {
    //All value in the range are non valid
}
Jimi
  • 29,621
  • 8
  • 43
  • 61