81

I have an array of datetime objects, and I would like to find which element in the array is the closest to a given date (e.g datetime.datetime(2014,12,16))

This post shows how to find the nearest date which is not before the given date. How can I alter this code so that it can return dates that are before a given date?

For example, if the array housed elements datetime.datetime(2014,12,10) and datetime.datetime(2014,12,28), the former item should be returned because it is closest to datetime.datetime(2014,12,16) in absolute value.

Community
  • 1
  • 1
user3600497
  • 1,621
  • 1
  • 18
  • 22
  • Maybe you could try sorting the dates then checking the index before and after the date you're looking at and seeing which one is closer to it? – Zarwan Aug 26 '15 at 22:44
  • At the moment, I use `np.argwhere` to find all the elements before and after the given date and take the last of first elements respectively. The problem is, sometimes there are no dates after December 16 2014 in the array, which is a problem in my script. It would be much easier to find the closest date. – user3600497 Aug 26 '15 at 22:45
  • You may consider using [truncate](http://stackoverflow.com/questions/9877391/how-to-get-the-closest-single-row-after-a-specific-datetime-index-using-python-p) function. – Kevin Zhu Oct 12 '16 at 09:04

9 Answers9

151

This function will return the datetime in items which is the closest to the date pivot.

def nearest(items, pivot):
    return min(items, key=lambda x: abs(x - pivot))

The good part this function works on types other than datetime too out of the box, if the type supports comparison, subtraction and abs, e.g.: numbers and vector types.

Tamas Hegedus
  • 28,755
  • 12
  • 63
  • 97
47

As answered on this link link, 'truncate' function is there for you.

df.truncate(before='2012-01-07')

Or you can use get_loc with 'nearest', 'backfill' or 'ffill' option.

df.iloc[df.index.get_loc(datetime.datetime(2016,2,2),method='nearest')]
Kevin Zhu
  • 2,746
  • 26
  • 23
  • 1
    I could not edit (too short for an edit, weird!), but there is a small bug: heading zeroes are not allowed as arguments to a datetime.datetime constructor: it results in an 'invalid token' error. – user3017048 Apr 03 '18 at 11:55
  • I tested: the `iloc` method is around 25% faster on my machine (~446µs vs. ~588µs). – user3017048 Apr 03 '18 at 11:58
  • 3
    I think using `backfill` instead of `nearest` is more straight forward. Because it literally tells you that it would pick the NEXT index if no exact match. – Brian Jul 30 '21 at 10:14
  • This probably should be the accepted answer – Charly Empereur-mot Jan 17 '23 at 12:55
9

This code returns the nearest date before the given date:

def nearest(items, pivot):
    return min([i for i in items if i <= pivot], key=lambda x: abs(x - pivot))
Chiel
  • 662
  • 1
  • 7
  • 30
6

My solution to find the closest index instead of the value

def nearest_ind(items, pivot):
    time_diff = np.abs([date - pivot for date in items])
    return time_diff.argmin(0)
3

To find a closest date and return the timedelta (difference between two dates) I did the following:

def nearest_date(items,pivot):
    nearest=min(items, key=lambda x: abs(x - pivot))
    timedelta = abs(nearest - pivot)
    return nearest, timedelta

This may be useful when you have a minimum threshold for nearness for your app like I did.

MarMat
  • 790
  • 8
  • 12
2

Assuming you want to answer the slight variant: "Given a dataframe with a datetime index, how do I determine the last value of column col where "last" is defined as the last index that is less than some value date


def last(df, date, col):
    return df.loc[                      # access the dataframe using this index
        max(                            # latest date
            df[df.index < date].index   # that precedes `date`
        )
    ][col]                              # access column `col`
Tim P
  • 415
  • 3
  • 11
2

I know this is an old answer, but I just used the code code that Tamas posted and found that it was taking quite a long time - I optimised it and saw much quicker performance; the problem was the iteration was taking a long time, this is my new method - it will only be quicker when the actual pivot appears in the list

def nearest(items, pivot):
    if pivot in items:
    return pivot
else:
    return min(items, key=lambda x: abs(x - pivot))

Hope this helps anyone who came accross this question.

Wes
  • 35
  • 5
  • hi this is useful. how would I write the function if I want to check an entire column in a dataframe? – SModi Jul 22 '21 at 21:40
1
def nearestDate(base, dates):
    nearness = { abs(base.timestamp() - date.timestamp()) : date for date in dates }
    return nearness[min(nearness.keys())]
3442
  • 8,248
  • 2
  • 19
  • 41
1

Using numpy is about 2X faster than loop/lambda approaches. all_dates below is a numpy array of dates.

abs_deltas_from_target_date = np.absolute(all_dates - target_date_raw)
index_of_min_delta_from_target_date = np.argmin(abs_deltas_from_target_date)
closest_date = all_dates[index_of_min_delta_from_target_date]
Ron Cohen
  • 2,815
  • 5
  • 30
  • 45