9

I have an OrderedDictionary that contains rate values. Each entry has a date for a key (each date happening to be the start of a yearly quarter), and the value is a number. Dates are inserted in order, from oldest to newest.

{
    date(2017, 1, 1): 95,
    date(2018, 1, 1): 100,
    date(2018, 6, 1): 110,
    date(2018, 9, 1): 112,
}

My dictionary of rates is much larger than this, but this is the general idea. Given an arbitrary date, I want to find the value in the dictionary that precedes it. For example, looking up a date of date(2018, 8, 1) should return the value 110, since the entry date(2018, 6, 1) is the nearest key that precedes my date lookup. Similarly, a date of date(2017, 12, 1) should return 95, as the nearest preceding key happens to be date(2017, 1, 1).

I could easily do this by walking the items in the dictionary:

def find_nearest(lookup):
    nearest = None
    for d, value in rates.items():
        if(d > lookup):
            break
        nearest = value
    return nearest

This feels inefficient to me, however, since in the worst case I have to scan the entire dictionary (which I mentioned before could be large). I will be doing tens of thousands of these kinds of lookups, so I want it to be performant.

Another option to solve the performance issue would be to create a cache of what I've seen, which is also doable, though I wonder about the memory constraints (I'm not entirely sure how large the cache would grow).

Are there any clever methods or Python core modules I can use here?

Jonah Bishop
  • 12,279
  • 6
  • 49
  • 74
  • 2
    I think you should use a binary tree here (or a sorted list), such that you can query in *O(log n)*. Or there are data structures that encode a dictionary with a linked list such that one can obtain the next and previous element. – Willem Van Onsem Sep 27 '18 at 11:24

5 Answers5

2

Since you're inserting dates into the dict in order and you are presumably using Python 3.7 (which makes dict order significant), you can use a recursive function that divides and conquers to find the desired index of the key list in O(log n) time complexity:

def find_nearest(l, lookup):
    if len(l) == 1:
        return l[0]
    mid = len(l) // 2
    if l[mid] > lookup:
        return find_nearest(l[:mid], lookup)
    return find_nearest(l[mid:], lookup)

so that:

from datetime import date
d = {
    date(2017, 1, 1): 95,
    date(2018, 1, 1): 100,
    date(2018, 6, 1): 110,
    date(2018, 9, 1): 112,
}
d[find_nearest(list(d), date(2018, 8, 1))]

returns: 110

blhsing
  • 91,368
  • 6
  • 71
  • 106
2

sortedcontainers may be what you want.

It will keep the key in sorted order rather than insert order, which is different from collections.OrderedDict.

Install

$ pip install sortedcontainers

To achieve what you want

from sortedcontainers import SortedDict
def find_nearest(sorted_dict, lookup):
    key = sorted_dict.iloc[sorted_dict.bisect_left(lookup) - 1]
    return sorted_dict[key]

sd = SortedDict({0: '0', 4: '4', 8: '8', 12: '12'})
print(find_nearest(sd, 4))  # 0
print(find_nearest(sd, 3))  # 0
print(find_nearest(sd, 12))  # 8 

The time complexity of this method is O(log n)

zxch3n
  • 387
  • 3
  • 9
0

Edit I just realized you wanted a core module – my answer uses pandas!

If you have unique date values, you can use pandas to create a dataframe which uses the dates as indices:

df = pd.DataFrame.from_dict(rates, orient='index', columns=['value'])
# Convert index to pandas datetime
df.index = pd.to_datetime(df.index)

This returns:

              value
2017-01-01     95
2018-01-01    100
2018-06-01    110
2018-09-01    112

Then:

def lookup(date, df):
    # Convert input to datetime
    date = pd.to_datetime(date)
    # Get closest date in index
    closest_date = min(df.index, key=lambda x: abs(x - date))
    # Find corresponding index of closest date
    index = np.where(df.index == closest_date)[0][0]
    # If the date found if greater than the input, then get the date at the index before
    if closest_date > date:
        index -= 1

    return df.iloc[index].value

>> lookup('2018-06-02', df)
Out: 110

>> lookup('2018-05-01', df)
Out: 100
Michele Tonutti
  • 4,298
  • 1
  • 21
  • 22
0

Since OrderedDict is implemented via linked lists, you cannot directly retrieve values by position in less than O(n) time, although you can take advantage of keys being sorted to reduce to O(log n). See also: Accessing dictionary items by position in Python 3.6+ efficiently.

For efficiency, I suggest you use a 3rd party library such as Pandas, which uses NumPy arrays held in contiguous memory blocks. The time complexity is O(n), but you should see improved performance for larger input dictionaries.

import pandas as pd
from datetime import date

d = {date(2017, 1, 1): 95, date(2018, 1, 1): 100,
     date(2018, 6, 1): 110, date(2018, 9, 1): 112}

df = pd.DataFrame.from_dict(d, orient='index')
df.index = pd.to_datetime(df.index)

my_date = pd.to_datetime(date(2018, 8, 1))
res = df[0].iat[df.index.get_loc(my_date, method='ffill')]  # 110

An alternative, more verbose method:

diffs = (my_date - df.index) > pd.Timedelta(0)
res = df[0].iat[-(diffs[::-1].argmax() + 1)]                # 110
jpp
  • 159,742
  • 34
  • 281
  • 339
-1

you could try .get() method, which returns a value only if it exists, otherwise returns None

import datetime
from datetime import date

def findNearest(somedate, dictionary):
    while dictionary.get(somedate) is None:
        somedate=somedate-datetime.timedelta(1)

    return dictionary.get(somedate)


result=findNearest(date(2017, 1, 3), yourDictionary)

when you print result it will print '95', the value for date(2017, 1, 1)

Petru Tanas
  • 1,087
  • 1
  • 12
  • 36
  • @brunodesthuilliers It depends on the data, first they assume whole days, and then if the dates are fairly close together might be ok, if the dates are 10000s of years apart then will be bad – Chris_Rands Sep 27 '18 at 11:49