3

I am exploring the fastest way to iterate through three sorted lists to find the position of the first item which is equal to or less than a double value. The lists contains two columns of doubles.

I have the two following working examples attached below, these are encompassed by a bigger while loop (which also modifies the currentPressure list changing the [0] value) value. But, considering the amount of rows (500,000+) being parsed by the bigger while loop, the code below is too slow (one iteration of the three while loops takes >20 ms).

"allPressures" contains all rows while currentPressure is modified by the remaining code. The while loops are used to align the time from the Flow, Rpm and Position lists to the Time in the pressure list.

In other words I am trying to find the quickest way to determine the x of for instance

FlowList[x].Time =< currentPressure[0].Time

Any suggestions are greatly appreciated!

Examples:

           for (int i = 0; i < allPressures.Count; i++)
            {
                if (FlowList[i].Time >= currentPressure[0].Time)
                {
                    fl = i;
                    break;
                }
            }

            for (int i = 0; i < allPressures.Count; i++)
            {
                if (RpmList[i].Time >= currentPressure[0].Time)
                {
                    rp = i;
                    break;
                }
            }

            for (int i = 0; i < allPressures.Count; i++)
            {
                if (PositionList[i].Time >= currentPressure[0].Time)
                {
                    bp = i;
                    break;
                }
            }

Using while loop:

            while (FlowList[fl].Time < currentPressure[0].Time)
            {
                fl++;
            }

            while (RpmList[rp].Time < currentPressure[0].Time)
            {
                rp++;
            }

            while (PositionList[bp].Time < currentPressure[0].Time)
            {
                bp++;
            }  
Henrik_er
  • 87
  • 1
  • 9

1 Answers1

1

The problem is that your are doing a linear search. This means that in the worst case scenario your are iterating over all the elements in your lists. This gives you a computational complexity of O(3*n) where n is the length of your lists and 3 is the number of lists you are searching.

Since your lists are sorted you can use the much faster binary search which has a complexity of O(log(n)) and in your case O(3*log(n)).

Luckily you don't have to implement it yourself, because .NET offers the helper method List.BinarySearch(). You will need the one that takes a custom comparer, because you want to compare PressureData objects.

Since you are looking for the index of the closest value that's less than your search value, you'll have to use a little trick: when BinarySearch() doesn't find a matching value it returns the index of the next element that is larger than the search value. From this it's easy to find the previous element that is smaller than the search value.

Here is an extension method the implements this:

public static int FindMaxIndex<T>(
    this List<T> sortedList, T inclusiveUpperBound, IComparer<T> comparer = null)
{
    var index = sortedList.BinarySearch(inclusiveUpperBound, comparer);

    // The max value was found in the list. Just return its index.
    if (index >= 0)
        return index;

    // The max value was not found and "~index" is the index of the
    // next value greater than the search value.
    index = ~index;

    // There are values in the list less than the search value.
    // Return the index of the closest one.
    if (index > 0)
        return index - 1;

    // All values in the list are greater than the search value.
    return -1;
}

Test it at https://dotnetfiddle.net/kLZsM5

Use this method with a comparer that understands PressureData objects:

var pdc = Comparer<PressureData>.Create((x, y) => x.Time.CompareTo(y.Time));
var fl = FlowList.FindMaxIndex(currentPressure[0], pdc);

Here is a working example: https://dotnetfiddle.net/Dmgzsv

Good Night Nerd Pride
  • 8,245
  • 4
  • 49
  • 65
  • Allocating three arrays of 500 000 doubles in a loop is almost certainly much slower than even the naive linear search currently used. Use `IComparer` or `IComparer` from a `Func` property selector or three separate comparers if the lists are already arrays. If all else fails, write out the binary search loop manually. Anything but `.Select(...).ToArray()` in a loop. – Jeroen Mostert Aug 18 '17 at 16:20
  • Currently testing the time spent on the various methods suggested. Would this method possibly be the quickest if I convert the FlowList and so forth to arrays before the loop, and thus avoid the .Select(.....).ToArray() ? – Henrik_er Aug 18 '17 at 16:29
  • @Henrik_er: if you don't modify the lists, then yes. If you do modify the lists, then converting back and forth between `List` and `Array` is hopelessly inefficient. Besides, the whole thing is a bit unnecessary because `List.BinarySearch` exists, it just has no parameter for the value, meaning you have to produce a custom comparer (`Comparer.Create`) that is slightly more difficult and incorporates the value to search for. – Jeroen Mostert Aug 18 '17 at 16:35
  • @JeroenMostert, right the projection into an array should of course be outside the main loop or best be avoided we a more sophisticated comparer. I updated my answer. – Good Night Nerd Pride Aug 18 '17 at 17:01
  • @Henrik_er, there might be a problem with doing a simple binary search. Please see the addendum to my answer and clarify whether this this is the case. – Good Night Nerd Pride Aug 18 '17 at 17:13
  • @JeroenMostert, as suspected the distances are virtually equidistant so the answer should work. Will test it in a little while. – Henrik_er Aug 18 '17 at 17:18
  • @Henrik_er, erm... I made a major mistake when writing the comparer. The last addendum should show the correct now. I hope... Also it compares `PressureData` objects directly now. – Good Night Nerd Pride Aug 18 '17 at 18:28
  • @GoodNightNerdPride it seems to be working well when the interval is constant. In one of my lists on the other hand, the interval change like this 5 - 8 - 5, in other words there is a jump of 8 seconds before returning to 5. On this occurrence, the comparer breaks down and return a negative number... The comparer in question returned ...26,27,-29 and it breaks. – Henrik_er Aug 19 '17 at 10:39
  • @Henrik_er, turns out is all much easier than expected. `BinarySearch()` does almost exactly what you need. I deleted most of the rubbish I wrote so far and updated the answer with a more sane solution. This will also solve your problem with the nonuniform intervals. – Good Night Nerd Pride Aug 19 '17 at 13:15
  • @GoodNightNerdPride The last revision works perfectly. Thanks for helping me getting on top of BinarySearch() and list extensions! – Henrik_er Aug 19 '17 at 14:05