5

I have the Date portion of DateTime as lookup value and like to retrieve the matching value in a Dictionary of type Dictionary<DateTime, double>. Note the DateTime keys are stored as only the Date portion.

My problem is that there may be no key that matches with my lookup value. What I then like to do is to find the nearest previous dateTime.Date key and matching value.

Now, I am aware that Dictionaries are not sorted by key. I could use a SortedDictionary but prefer to use a Dictionary for a specific reason or else switch to a List collection (can be pre-sorted). My question is, what would you recommend to do in this case: Would it be more efficient to retain the Dictionary structure and decrement my lookup value until I find a matching key? Or would it be better to use a list collection and use Linq? Each Dictionary contains around 5000 key/value pairs. Also, please note I look for a highly computationally efficient solution because the frequency of lookups is quite high (potentially many hundred thousand times (each lookup is guaranteed to be different from any previous value)

Matt
  • 7,004
  • 11
  • 71
  • 117
  • What _specific reason_ do you have to avoid a `SortedDictionary`? – Tim Schmelter May 13 '14 at 13:45
  • 1
    5000 is not a huge number. How often you need to find it? – Sriram Sakthivel May 13 '14 at 13:45
  • @TimSchmelter, I store the collection as serialized json string and do not have good experience with serializing sorted dictionaries. But I can certainly convert to/from a different structure, if it makes the most sense to use a SortedDictionary here. – Matt May 13 '14 at 13:46
  • @SriramSakthivel, good question. Will add that to my question. This is the issue, I need this to be extremely computationally efficient because this lookup occurs at very high frequencies. – Matt May 13 '14 at 13:47
  • It depends on your application whether or not it is better to use an ordered collection. If looking for a DateTime is the primary thing you do, it makes sense. Otherwise, you can loop through the items and compare them to the value you are looking for. – Dennis_E May 13 '14 at 13:48
  • @Dennis_E, this is what I try to avoid, looping through 5000 items many hundred thousand times seems quite inefficient. Hence, my initial Dictionary choice. – Matt May 13 '14 at 13:49
  • A Dictionary is not suited for the search you want to do. Like I said, if you want to do this faster, you need a different kind of collection. A SortedDictionary or a List can retrieve items in log(n) time – Dennis_E May 13 '14 at 13:56
  • @Dennis_E, with all due respect how is log(n) faster than Dictionary key lookup. – Matt May 13 '14 at 14:02
  • You are not looking for a key in the Dictionary; you are looking through ALL the keys to find which one is closest to some value. This requires to run through all of them because they are not sorted in any way. – Dennis_E May 13 '14 at 14:04
  • @Dennis_E, I disagree. As pointed out keys are stored as DateTime.Date portion and so is the lookup value. The whole universe stretches over some 10+ years, hence the about 5000 keyvaluepairs. I do not see why I would have to search the whole dictionary. – Matt May 13 '14 at 14:50
  • First, you look for a Date in the Dictionary. This takes near constant time, yes. But if the Dictionary does not contain that Date, you want to know which Date in the entire Dictionary is closest to some value. If the Dictionary is not sorted, you need to go through them all. Or maybe I misunderstood something in the question. – Dennis_E May 13 '14 at 15:06
  • @Dennis_E, again not true: If I do not initially find the date I could simply decrement the date by a day and check for existence. It would certainly in most cases not iterate the whole dictionary. – Matt May 13 '14 at 15:28
  • Ah, I get it. (I'm an idiot) Yes, you could do that. You would have to test which method is faster then. – Dennis_E May 13 '14 at 15:38

3 Answers3

5

Since you need it fast, I think the best thing would be to use the results of a BinarySearch. This requires a List<T> that's sorted.

int result = myList.BinarySearch(targetDate);
if (result >= 0)
    return myList[result];
else
{
    int nextLarger = ~result;
    // return next smaller, or if that doesn't exist, the smallest one
    return myList[Math.Max(0, nextLarger - 1)];
}

It should be possible to create a class that combines a Dictionary<TKey,TValue> and a sorted List<TKey> that still serializes like a Dictionary<TKey,TValue>. The serialization might be as simple (in Json.NET) as putting a [JsonConverter(typeof(KeyValuePairConverter))] on your class.

Just for completeness and in case others read this in the future, if speed wasn't very important, you can do it more simply with something like this:

var result = myDict.Keys.Where(x => x < targetDate).Max();
Tim S.
  • 55,448
  • 7
  • 96
  • 122
  • just a quick question re your linq query: Should the OrderByDescending not precede the Where clause in this specific example? – Matt May 13 '14 at 15:49
  • You'll get the same result either way. I'd expect that the way I wrote it would be faster, since you wouldn't be sorting as many items. Then again, I [could be wrong](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array). – Tim S. May 13 '14 at 15:51
  • I verified, you are actually correct about both, I guess the sorting is faster cause the collection is smaller than the original one? – Matt May 13 '14 at 15:53
  • Right. And actually, I just realized that could be improved with a `Max()`. I'd recommend you profile both ways (binary search and Where/Max) and see which works better. – Tim S. May 13 '14 at 15:55
  • The improved query actually performs great, about ten times faster than the one that contains another sort. Though the binary search approach is tops. – Matt May 13 '14 at 16:06
  • Picked your binary search answer, quick, simple and solves the issue at hand. Thanks a lot for your help. – Matt May 13 '14 at 16:13
2

I would use a custom structure and collection to store these informations:

public struct DateValue
{
    public DateValue(DateTime date, double val)
        : this()
    {
        this.Date = date;
        this.Value = val;
    }
    public DateTime Date { get; set; }
}

Here is a possible implementation of a collection that holds all DateValues and encapsulates the logic to return the nearest. It uses List.BinarySearch to find it. If it doesn't find a direct match it uses the logic of BinarySearch to detect the nearest which is:

The index of the specified value in the specified array, if value is found. If value is not found and value is less than one or more elements in array, a negative number which is the bitwise complement of the index of the first element that is larger than value. If value is not found and value is greater than any of the elements in array, a negative number which is the bitwise complement of (the index of the last element plus 1).

public class DateValueCollection : List<DateValue>, IComparer<DateValue>
{
    public DateValueCollection() { }

    public DateValueCollection(IEnumerable<DateValue> dateValues, bool isOrdered)
    {
        if (isOrdered)
            base.AddRange(dateValues);
        else
            base.AddRange(dateValues.OrderBy(dv => dv.Date));
    }

    public DateValue GetNearest(DateTime date)
    {
        if (base.Count == 0)
            return default(DateValue);

        DateValue dv = new DateValue(date, 0);
        int index = base.BinarySearch(dv, this);

        if (index >= 0)
        {
            return base[index];
        }
        // If not found, List.BinarySearch returns the complement of the index
        index = ~index;

        DateValue[] all;
        if(index >= base.Count - 1)
        {
            // proposed index is last, check previous and last
            all = new[] { base[base.Count - 1], base[base.Count - 2] };
        }
        else if(index == 0)
        {
            // proposed index is first, check first and second
            all = new[] { base[index], base[index + 1] };
        }
        else
        {
            // return nearest DateValue from previous and this
            var thisDV = base[index];
            var prevDV = base[index - 1];
            all = new[]{ thisDV, prevDV };
        }
        return all.OrderBy(x => (x.Date - date).Duration()).First();
    }

    public int Compare(DateValue x, DateValue y)
    {
        return x.Date.CompareTo(y.Date);
    }
}

Quick test:

var dateVals = new[] { 
    new DateValue(DateTime.Today.AddDays(10), 1), new DateValue(DateTime.Today, 3), new DateValue(DateTime.Today.AddDays(4), 7) 
};
var dvCollection = new DateValueCollection(dateVals, false);
DateValue nearest = dvCollection.GetNearest(DateTime.Today.AddDays(1));
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • This looks quite eloquent, let me digest and I report back. – Matt May 13 '14 at 14:51
  • @MattWolf: there's certainly room for improvement(f.e. it's possible to "inject" an unordered collection), but it should give you an idea. Note that the `static` comparer was redundant, i've removed it. Since the class implements `IComparer` you can use `base.BinarySearch(dv, this)`. – Tim Schmelter May 13 '14 at 15:02
  • I really like the proposed solution, and its very fast, though I preferred the barebones approach by Tim S. in the end. Would upvote several times if I could. – Matt May 13 '14 at 16:40
0

why worry about optomisation prematurley

do it

THEN AND ONLY THEN if its slow then you have a problem Measure it with a profiler

then understanding starts at which point you try other ways and profile them.

Answer is: If you do it any way at all and there isnt a performance problem you just saved time and managed to do something else useful to add value in your day.

Premature optimization is not only pointless you will usually be completely wrong about where you should be looking.

John Nicholas
  • 4,778
  • 4
  • 31
  • 50