0

I have some sorted data which has the starting and ending integers of each range, and they are contiguous. So my data might look like this:

Number Start End 
0      0     47 
1      48    94 
2      95    287 
3      288   1123

and so on.

I will get an integer like 113 and I want the fastest way to search the data to find the matching number. I can afford to stick the data into some structure that optimizes the retrieval / comparison speed.

My data is very large.

EDIT: I chose an answer, and this is the code I ended up with:

  Public Function EndingCaptureNumber(CaptureEnd As Integer) As Integer
    EndingCaptureNumber = CaptureEnds.BinarySearch(CaptureEnd)
    If EndingCaptureNumber < 0 Then
      Return (Not EndingCaptureNumber) - 1
    End If
  End Function

Capture ends is a list of the end of each range. Not is the bitwise compliment. Since this finds the first that was greater, I subtract 1 to get the last that was not greater.

Edit: Duplicate Question Rebuttal

The answer that came out of this uses the built in BinarySearch but handles values that are not an exact match. Seekers reading that other article would not learn this (imho) better answer. Plus, the other question is cluttered by the actual data types the OP used in his RL problem.

toddmo
  • 20,682
  • 14
  • 97
  • 107
  • 2
    Binary search? Search `Start` that is less or equal to your number that has the corresponding greater or equal `End`. The structure you need to keep it at must only provide the random access in constant time, then you're fine. – zerkms Sep 10 '14 at 21:06
  • What is the *range* of the State/End numbers? How many rows/entries are there? How many times will the lookup be performed? Does the data change? – user2864740 Sep 10 '14 at 21:10
  • 1
    Is this a sql question? If your _data_ is large it should be stored in a database. – Tim Schmelter Sep 10 '14 at 21:10
  • @user2864740, there is no limit, it could be very big, but below integer.maxvalue. The data does not change. The lookup will be performed 1000s of times, with a different argument each time. – toddmo Sep 10 '14 at 21:12
  • @PhilipPittle, I have tried a dumb loop. I can see how wasteful that is. I'm trying to improve the performance. I'm looking for built in structures, methods if possible. – toddmo Sep 10 '14 at 21:14
  • @zerkms, binary search sounds very efficient. What .net storage can I make that can take advantage of that w/o writing my own binary search algo? – toddmo Sep 10 '14 at 21:16

3 Answers3

2

Since you only need the lowerbound you could store it in a SortedList<int, int> where the key is the lower-bound and the value is the number, like this:

static SortedList<int, int> Ranges = new SortedList<int, int>
{
    {0, 0}, {48, 1}, {95, 2}, {288, 3} 
};

Now you can use this extension method to find the index of the next higher number:

private static int BinarySearch<T>(IList<T> list, T value)
{
    if (list == null)
        throw new ArgumentNullException("list");
    var comp = Comparer<T>.Default;
    int lo = 0, hi = list.Count - 1;
    while (lo < hi)
    {
        int m = (hi + lo) / 2;  // this might overflow; be careful.
        if (comp.Compare(list[m], value) < 0) lo = m + 1;
        else hi = m - 1;
    }
    if (comp.Compare(list[lo], value) < 0) lo++;
    return lo;
}

public static int FindFirstIndexGreaterThanOrEqualTo<T, U>(this SortedList<T, U> sortedList, T key)
{
    return BinarySearch(sortedList.Keys, key);
}

( Credits to this answer: https://stackoverflow.com/a/594528/284240 )

... and following code to get the number:

int number;
int find = 113;
int pos = Ranges.FindFirstIndexGreaterThanOrEqualTo(find);
if (pos > 0)
{
    int key = Ranges[pos];
    if(key == find) 
        number = Ranges[Ranges.Keys[pos]]; // matches lowerbound
    else
        number = Ranges[Ranges.Keys[pos - 1]]; // in range        
}
Community
  • 1
  • 1
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
0

I think this is the fastest way. However, it has to be setup and uses a lot of memory.

   int[] Numbers = new int[100000];

   for (int i = 0; i <= 47; i++)
       Numbers[i] = 0;
   for (int i = 48; i <= 94; i++)
       Numbers[i] = 1;
   for (int i = 95; i <= 287; i++)
       Numbers[i] = 2;
   for (int i = 288; i <= 1123; i++)
       Numbers[i] = 3;

   int Result = Numbers[113];   // Fast!
Steve Wellens
  • 20,506
  • 2
  • 28
  • 69
0

Stick with a binary search. If you have N ranges and look for K numbers, searching would take O(KlogN).

Using @Steve Wellens's suggestion of spreading everything will take a lot of setup - O(R) (with R being the last range ending - 1123 in your example). After the setup, K searches would take O(K), so you're looking at O(K+R)

Now, if the maximum number is less than KlogN, and memory is not a problem, spread out the ranges. If not (which is my guess, you said you had a lot of data), a binary search will be faster.

zmbq
  • 38,013
  • 14
  • 101
  • 171