59

I am able to read and slice pandas dataframe using python datetime objects, however I am forced to use only existing dates in index. For example, this works:

>>> data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2010-12-31 00:00:00 to 2010-04-01 00:00:00
Data columns:
Adj Close    252  non-null values
dtypes: float64(1)

>>> st = datetime.datetime(2010, 12, 31, 0, 0)
>>> en = datetime.datetime(2010, 12, 28, 0, 0)

>>> data[st:en]
            Adj Close
Date                 
2010-12-31     593.97
2010-12-30     598.86
2010-12-29     601.00
2010-12-28     598.92

However if I use a start or end date that is not present in the DF, I get python KeyError.

My Question : How do I query the dataframe object for a date range; even when the start and end dates are not present in the DataFrame. Does pandas allow for range based slicing?

I am using pandas version 0.10.1

Rishabh Sagar
  • 1,744
  • 2
  • 17
  • 27
  • Right now, on Pandas 0.20.3, I'm actually not getting any `KeyErrors`, even when using `datetimes` outside the frame's index... So the simplest solution is currently to just do like above in the OP! `df[dt.datetime(1914, 1, 1):]` gives me dates from 2010. – Tomasz Gandor Apr 29 '19 at 12:20

6 Answers6

56

Use searchsorted to find the nearest times first, and then use it to slice.

In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])

In [16]: df
Out[16]: 
            0
2013-01-01  1
2013-01-03  2
2013-01-05  3

In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))

In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))

In [24]: df.iloc[start:end]
Out[24]: 
            0
2013-01-03  2
Tomasz Gandor
  • 8,235
  • 2
  • 60
  • 55
waitingkuo
  • 89,478
  • 28
  • 112
  • 118
  • If I copy paste your example, it works fine. But the start and end variables in my program, always default to the length of the dataframe! what am I doing wrong? - http://pastebin.com/raw.php?i=hfpHqF7s – Rishabh Sagar Apr 23 '13 at 22:21
  • Seems you should sort your `DataFrame` in ascending order. – waitingkuo Apr 24 '13 at 01:28
  • Thanks, it worked when the data was in sorted in ascending order. – Rishabh Sagar Apr 24 '13 at 06:21
  • 2
    Note that `searchsorted` is no longer defined on DataFrame or Series, see [this question](http://stackoverflow.com/questions/22669208/attributeerror-series-object-has-no-attribute-searchsorted-pandas). – Wilfred Hughes Jan 15 '15 at 17:23
36

Short answer: Sort your data (data.sort()) and then I think everything will work the way you are expecting.

Yes, you can slice using datetimes not present in the DataFrame. For example:

In [12]: df
Out[12]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392
2013-04-23  0.924535
2013-04-24  0.531902
2013-04-25 -0.957936

In [13]: df['20130419':'20130422']
Out[13]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392

As you can see, you don't even have to build datetime objects; strings work.

Because the datetimes in your index are not sequential, the behavior is weird. If we shuffle the index of my example here...

In [17]: df
Out[17]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392
2013-04-23  0.924535
2013-04-21  0.531902
2013-04-25 -0.957936

...and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

In [18]: df['20130419':'20130422']
Out[18]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392

This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

df.sort_index()
Dan Allan
  • 34,073
  • 6
  • 70
  • 63
  • When I try to do this, I get an python exception: TimeSeriesError: Partial indexing only valid for ordered time series. – Rishabh Sagar Apr 23 '13 at 22:27
  • The exception was self explanatory - I had missed sorting the data, :( - Thanks, text based slicing as you've shown above works as expected. But I used the searchsorted function since the dates in program were already datetime objects. – Rishabh Sagar Apr 24 '13 at 06:25
  • 3
    df['20130419':'20130422'] is exceptional! Even works with sparse data (e.g. specifying a date that doesn't exist in the index). Thank you! – fantabolous Jun 26 '14 at 04:10
  • Please note that data.sort() is now deprecated. The replacement for this application would be data.sort_index() - [http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html) – Kapocsi Aug 04 '16 at 12:16
  • Thanks, Kapocsi. Updated. – Dan Allan Aug 05 '16 at 16:51
22

You can use a simple mask to accomplish this:

date_mask = (data.index > start) & (data.index < end)
dates = data.index[date_mask]
data.ix[dates]

By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.

watsonic
  • 3,155
  • 1
  • 27
  • 31
1

I had difficulty with other approaches but I found that the following approach worked for me:

# Set the Index to be the Date
df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
df.set_index('Date', inplace=True)

# Sort the Data
df = df.sort_values('Date_1')

# Slice the Data
From = '2017-05-07'
To   = '2017-06-07'
df_Z = df.loc[From:To,:]
R. Cox
  • 819
  • 8
  • 25
1
target_date = pd.Timestamp('2020-05-10')

df[df['DATE'].dt.date == (dt.date(target_date.year, target_date.month, target_date.day))]

This line will let you select all the dates that matches the condition, in case you don't neet to set "Date Column" as index.

fega_zero
  • 125
  • 9
  • 1
    Looks good. Put just to point out that the dt in df[df['DATE'].dt.date is not the same dt in dt.date, and the latter has to be exposed by import datetime as dt to avoid getting an error – Oeyvind Oct 22 '21 at 12:20
1

Given a dataframe df where the date to query on is in the index, you can use query:

In [5]: df = pd.DataFrame({'Close': {'2010-12-31': 593.97, '2010-12-30': 598.86, '2010-12-29': 601.0, '2010-12-28': 598.92}})

In [6]: df
Out[6]: 
             Close
Date              
2010-12-31  593.97
2010-12-30  598.86
2010-12-29  601.00
2010-12-28  598.92

In [7]: df.query('"2010-12-01" < index <= "2010-12-29"')
Out[7]: 
             Close
Date              
2010-12-29  601.00
2010-12-28  598.92

The start and/or end date don't need to be present in the index of df.

rachwa
  • 1,805
  • 1
  • 14
  • 17