40

I have a bunch of pairs of dates and monetary values in a SortedDictionary<DateTime, decimal>, corresponding to loan balances calculated into the future at contract-defined compounding dates. Is there an efficient way to find a date key that is nearest to a given value? (Specifically, the nearest key less than or equal to the target). The point is to store only the data at the points when the value changed, but efficiently answer the question "what was the balance on x date?" for any date in range.

A similar question was asked ( What .NET dictionary supports a "find nearest key" operation? ) and the answer was "no" at the time, at least from the people who responded, but that was almost 3 years ago.

The question How to find point between two keys in sorted dictionary presents the obvious solution of naively iterating through all keys. I am wondering if any built-in framework function exists to take advantage of the fact that the keys are already indexed and sorted in memory -- or alternatively a built-in Framework collection class that would lend itself better to this kind of query.

Community
  • 1
  • 1
Joshua Honig
  • 12,925
  • 8
  • 53
  • 75
  • But the keys of a dictionary are NOT sorted in memory. Which is why you can't directly do a "get nearest" or "get less than or equal to" on a dictionary. – Jim Mischel Jul 23 '13 at 16:06
  • The linked question has mention about `SortedList` structure. I would go with that. – nawfal Jun 11 '14 at 13:52

4 Answers4

35

Since SortedDictionary is sorted on the key, you can create a sorted list of keys with

var keys = new List<DateTime>(dictionary.Keys);

and then efficiently perform binary search on it:

var index = keys.BinarySearch(key);

As the documentation says, if index is positive or zero then the key exists; if it is negative, then ~index is the index where key would be found at if it existed. Therefore the index of the "immediately smaller" existing key is ~index - 1. Make sure you handle correctly the edge case where key is smaller than any of the existing keys and ~index - 1 == -1.

Of course the above approach really only makes sense if keys is built up once and then queried repeatedly; since it involves iterating over the whole sequence of keys and doing a binary search on top of that there's no point in trying this if you are only going to search once. In that case even naive iteration would be better.

Update

As digEmAll correctly points out, you could also switch to SortedList<DateTime, decimal> so that the Keys collection implements IList<T> (which SortedDictionary.Keys does not). That interface provides enough functionality to perform a binary search on it manually, so you could take e.g. this code and make it an extension method on IList<T>.

You should also keep in mind that SortedList performs worse than SortedDictionary during construction if the items are not inserted in already-sorted order, although in this particular case it is highly likely that dates are inserted in chronological (sorted) order which would be perfect.

John Cummings
  • 1,949
  • 3
  • 22
  • 38
Jon
  • 428,835
  • 81
  • 738
  • 806
  • 5
    OP could also switch to SortedList<> (to avoid the conversion to List<>) and then use this method --> [LINK](http://stackoverflow.com/questions/594518/is-there-a-lower-bound-function-in-c-sharp-on-a-sortedlist) – digEmAll Sep 13 '12 at 19:05
  • @digEmAll: That's a very good suggestion, even more so because likely the collection is constructed on a very `SortedList`-friendly manner. I incorporated it into the answer. – Jon Sep 13 '12 at 19:12
  • Perfect. You are correct that the lists will be populated in order, but will be queried many times and not in order as different permutations of possible loans and contracts are calculated. – Joshua Honig Sep 13 '12 at 19:41
11

So, this doesn't directly answer your question, because you specifically asked for something built-in to the .NET framework, but facing a similar problem, I found the following solution to work best, and I wanted to post it here for other searchers.

I used the TreeDictionary<K, V> from the C5 Collections (GitHub/NuGet), which is an implementation of a red-black tree.

It has Predecessor/TryPredecessor and WeakPredessor/TryWeakPredecessor methods (as well as similar methods for successors) to easily find the nearest items to a key.

More useful in your case, I think, is the RangeFrom/RangeTo/RangeFromTo methods that allow you to retrieve a range of key-value-pairs between keys.

Note that all of these methods can also be applied to the TreeDictionary<K, V>.Keys collection, which allow you to work with only the keys as well.

It really is a very neat implementation, and something like it deserves to be in the BCL.

rikoe
  • 1,639
  • 1
  • 21
  • 29
  • Use => Predecessor , (diff1), (Value) , (diff2), Successor => if diff1 < diff2 then use Predecessor method. – Rommel Nov 30 '16 at 01:38
1

It is not possible to efficiently find the nearest key with SortedList, SortedDictionary or any other "built-in" .NET type, if you need to interleave queries with inserts (unless your data arrives pre-sorted, or the collection is always small).

As I mentioned on the other question you referenced, I created three data structures related to B+ trees that provide find-nearest-key functionality for any sortable data type: BList<T>, BDictionary<K,V> and BMultiMap<K,V>. Each of these data structures provide FindLowerBound() and FindUpperBound() methods that work like C++'s lower_bound and upper_bound.

These are available in the Loyc.Collections NuGet package, and BDictionary typically uses about 44% less memory than SortedDictionary.

Qwertie
  • 16,354
  • 20
  • 105
  • 148
  • Very nice implementation, Can I extend SparseAList to support qword index size? It would be very good If there was `long` version of this. – M.kazem Akhgary Sep 25 '17 at 14:12
  • Hmm, it seems all the interfaces are designed for 32-bit - but if you'd like to fork it on GitHub and make a 64-bit version, feel free. Unfortunately I don't think one single version is enough, as 64-bit indices would increase the memory usage by up to 50% and not all clients would want that. A bigger problem is that **all** ALists are derived from `AListBase`, which uses `AListInner`/`AListInnerBase`, which contains 32-bit indexes. So it seems like 64-bit version would have to eliminate the base class and replicate all its code. – Qwertie Sep 26 '17 at 22:16
  • Note: you could use [LeMP](http://ecsharp.net/lemp/) to avoid physically replicating the code (thus duplicating any remaining bugs). Instead, you could use `using` to introduce new names for indexes (`using index = int; using uindex = uint`) and then use LeMP to do find-and-replace operations on the 32-bit version of the code to generate a 64-bit version. If you create an [issue](https://github.com/qwertie/ecsharp/issues) we can talk more about how this approach would work. Note: LeMP and SparseAList are defined in the same repo. – Qwertie Sep 26 '17 at 22:24
-1
    public static DateTime RoundDown(DateTime dateTime)
    {
        long remainingTicks = dateTime.Ticks % PeriodLength.Ticks;
        return dateTime - new TimeSpan(remainingTicks);
    }
ChrisG65
  • 59
  • 2
  • 1
    You should provide some explanation about what your code is doing. Don't just post code. – BrokenBinary Oct 17 '15 at 22:31
  • I thought it was self-evident. Used rounded DateTimes as your keys, and use the same method for look-ups. PeriodLength can be a week, a day, an hour, 5 minutes, or whatever regular length period you need. If PeriodLength is not a constant for the app, a more generic solution would accept periodLength as a parameter. – ChrisG65 Apr 23 '18 at 17:12