52

How do i query for the closest index from a Pandas DataFrame? The index is DatetimeIndex

2016-11-13 20:00:10.617989120   7.0 132.0
2016-11-13 22:00:00.022737152   1.0 128.0
2016-11-13 22:00:28.417561344   1.0 132.0

I tried this:

df.index.get_loc(df.index[0], method='nearest')

but it give me InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Same error if I tried this:

dt = datetime.datetime.strptime("2016-11-13 22:01:25", "%Y-%m-%d %H:%M:%S")
df.index.get_loc(dt, method='nearest')

But if I remove method='nearest' it works, but that is not I want, I want to find the closest index from my query datetime

ah bon
  • 9,293
  • 12
  • 65
  • 148
Bryan Fok
  • 3,277
  • 2
  • 31
  • 59

4 Answers4

76

It seems you need first get position by get_loc and then select by []:

dt = pd.to_datetime("2016-11-13 22:01:25.450")
print (dt)
2016-11-13 22:01:25.450000

print (df.index.get_loc(dt, method='nearest'))
2

idx = df.index[df.index.get_loc(dt, method='nearest')]
print (idx)
2016-11-13 22:00:28.417561344
#if need select row to Series use iloc
s = df.iloc[df.index.get_loc(dt, method='nearest')]
print (s)
b      1.0
c    132.0
Name: 2016-11-13 22:00:28.417561344, dtype: float64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you for your solution. I believe your solution would work, but it just doesnt work on me... this is my index type – Bryan Fok Feb 16 '17 at 06:50
  • Can you explain more? Does it return wrong value? Problem is with this sample of with real data? – jezrael Feb 16 '17 at 06:54
  • dt =pd.to_datetime("2016-11-13 22:01:25.450"); df.index.get_loc(dt, method='nearest'); InvalidIndexError: Reindexing only valid with uniquely valued Index objects – Bryan Fok Feb 16 '17 at 06:57
  • could that be when i set index, I set the time unit as 'us' ? – Bryan Fok Feb 16 '17 at 06:59
  • Do you think convert `ns` to `us` ? – jezrael Feb 16 '17 at 07:00
  • i only have the df now, i dont have the code that created this df. But from my memory when it set the index, it is using 'us'... – Bryan Fok Feb 16 '17 at 07:01
  • Are you sure? Because `DatatimeIndex` is by default in `ns` - `print (df.index.values.astype(np.int64))` – jezrael Feb 16 '17 at 07:05
  • 1
    Is possible remove duplicates in `index` ? `idx = df.index.drop_duplicates().get_loc(dt, method='nearest')` – jezrael Feb 16 '17 at 07:08
  • 4
    Is there any way to query for multiple date times simultaneously? `get_loc` only accepts scalar values. – fhchl Jul 26 '19 at 13:25
  • It seems the only way to do this for a non-index column is to create another df with set_index. – misantroop Nov 03 '19 at 10:27
  • Seems as if this only works for series or dfs where the index is unique. Heres an example that reproduces the error: `dts = [datetime(2020,1,1,1), datetime(2020,1,1,2), datetime(2020,1,1,2), datetime(2020,1,1,3)] idx = pd.DatetimeIndex(dts) idx.get_loc(pd.to_datetime("2020-01-01 02:00:00"), method="nearest")` – FlorianK Apr 15 '21 at 22:35
  • 2
    as mentioned by @HarryChil, `DatetimeIndex.get_loc` is now deprecated in favour of `DatetimeIndex.get_indexer`. This method requires a list and therefore is also the go to method if you want to query multiple date times as asked by @fhchl – Brueni92 Sep 15 '22 at 09:42
23

DatetimeIndex.get_loc is now deprecated in favour of DatetimeIndex.get_indexer...

ts = pd.to_datetime('2022-05-26 13:19:48.154000')        # example time
iloc_idx = df.index.get_indexer([ts], method='nearest')  # returns absolute index into df e.g. array([5])
loc_idx = df.index[iloc_idx]                             # if you want named index

my_val = df.iloc[iloc_idx]
my_val = df.loc[loc_idx]                                 # as above so below...
Benjamin Breton
  • 1,388
  • 1
  • 13
  • 42
HarryChil
  • 349
  • 2
  • 10
3

I believe jezrael solution works, but not on my dataframe (which i have no clue why). This is the solution that I came up with.

from bisect import bisect #operate as sorted container
timestamps = np.array(df.index)
upper_index = bisect(timestamps, np_dt64, hi=len(timestamps)-1) #find the upper index of the closest time stamp
df_index = df.index.get_loc(min(timestamps[upper_index], timestamps[upper_index-1],key=lambda x: abs(x - np_dt64))) #find the closest between upper and lower timestamp
Bryan Fok
  • 3,277
  • 2
  • 31
  • 59
1

I know it's an old question, but while searching for the same problems as Bryan Fok, I landed here. So for future searchers getting here, I post my solution. My index had 4 non-unique items (possibly due to rounding errors when recording the data). The following worked and showed the correct data:

dt = pd.to_datetime("2016-11-13 22:01:25.450")

s = df.loc[df.index.unique()[df.index.unique().get_loc(dt, method='nearest')]]

However, in case your nearest index occures multiple times, this will return multiple rows. If you want to catch that, you could test for it with:

if len(s) != len(df.columns):
    # do what is appropriate for your case
    # e.g. selecting only the first occurence
    s.iloc[0]

Edit: fixed the catching after some test

Jakob
  • 226
  • 2
  • 7