
I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.

What is the best way to achieve this?

asked by AMM, edited by Catija

16 Answers


If the date column is the index, then use .loc for label-based indexing or .iloc for positional indexing.

For example:

df.loc['2014-01-01':'2014-02-01']

See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection

If the column is not the index you have two choices:

  1. Make it the index (either temporarily or permanently if it's time-series data)
  2. df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]

See here for the general explanation

Note: .ix is deprecated.
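Applied to the original question (keep only rows within the next two months), the second, boolean-mask approach might look like this; the data and the reference date are made up for illustration:

```python
import pandas as pd

# Toy frame; in the question, 'date' is an ordinary column, not the index.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-15", "2014-02-20", "2014-06-01"]),
    "value": [1, 2, 3],
})

today = pd.Timestamp("2014-01-10")        # in practice: pd.Timestamp.now().normalize()
cutoff = today + pd.DateOffset(months=2)  # calendar-accurate "two months"

# Keep only rows falling within the next two months.
within_two_months = df[(df["date"] >= today) & (df["date"] < cutoff)]
```

pd.DateOffset(months=2) respects calendar month lengths, unlike a fixed timedelta of 60 days.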

– Retozi (edited by Foreever)

  • Thank you, will read. The date is a separate column and not the index in my case. I should have probably given that information in the first place. My question was not very informative. – AMM Apr 06 '14 at 19:35
  • updated my answer to account for date column filtering – Retozi Apr 06 '14 at 19:45
  • You can use `query` here as well. `df.query('20130101 < date < 20130201')`. – Phillip Cloud Apr 06 '14 at 19:56
  • Same as: http://stackoverflow.com/questions/16341367/grabbing-selection-between-specific-dates-in-a-dataframe Which is also useful. – Union find Dec 31 '14 at 20:03
  • You should mention that the filters for index (via `.loc` and `.ix`) and columns in your examples are not equivalent. `df.ix['2014-01-01':'2014-02-01']` includes `2014-02-01` while `df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]` does not include `2013-02-01`, it will only match rows up to `2013-01-31`. – Rafael Barbosa Jul 26 '16 at 12:57
  • What if one doesn't want to filter on a date range, but on multiple datetimes? – Salem Jul 06 '18 at 08:42
  • Will this correctly compare a datetime object against a string? – Addem Aug 19 '18 at 05:13
  • @Phillip Cloud I would like to use your solution, but `df.query('20130101 < time < 20130201')` and `df.query('2013-01-01 < time < 2013-02-01')` throw `TypeError: Cannot compare type 'Period' with type 'int'` (or string). Tried `df.query(pd.to_datetime('2013-01-01') < 'time' < pd.to_datetime('2013-02-01'))` but got `TypeError: Cannot compare type 'Timestamp' with type 'str'`. My df['time'] is a `to_datetime` object, formatted as `'%Y%m%d'`. Any suggestion? – user305883 Mar 16 '19 at 11:50
  • Figured it out: my error was because I stripped the date object to keep only the date part, without time, so it was no longer a to_datetime object. That is, my df['time'] was constructed as `pd.to_datetime(..).dt.date` instead of simply `pd.to_datetime(..)`. – user305883 Mar 16 '19 at 12:02
  • I get this: TypeError: '<' not supported between instances of 'datetime.date' and 'str' – MarMat Jul 01 '20 at 12:25
  • How can I set the filter more precise; such as from '2013-01-01 16:53:22:' onwards? – Baobab Mar 09 '22 at 14:11
  • Please be aware that by using loc it means you are expecting the exact min and max dates to exists as values. This solution won't work if you're filtering by dates that might not exist exactly in the df – Crispy Holiday May 11 '22 at 02:55
  • Make sure the index is sorted (`sort_index(inplace=True)`) otherwise you may get `KeyError: 'Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is not allowed.'` – AXO Aug 26 '23 at 02:47

The previous answer was not correct in my experience: you can't pass it a simple string, it needs to be a datetime object. So:

import datetime
df.loc[datetime.date(year=2014, month=1, day=1):datetime.date(year=2014, month=2, day=1)]
– orange1 (edited by wordsforthewise)

If your dates are datetime.date objects (via the datetime package), you can simply use:

df[(df['date'] > datetime.date(2016, 1, 1)) & (df['date'] < datetime.date(2016, 3, 1))]

To standardize a date string with the datetime package, you can use this function:

import datetime
datetime.datetime.strptime
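For completeness, a minimal sketch of strptime with an explicit format string (the date value is just an example):

```python
import datetime

# strptime parses a string into a datetime object using an explicit format.
d = datetime.datetime.strptime("2016-01-05", "%Y-%m-%d")
```

d.date() then gives the corresponding datetime.date for comparisons like the one above.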
– shm2008

If you have already converted the strings to a date format using pd.to_datetime, you can just use:

df = df[(df['Date'] > "2018-01-01") & (df['Date'] < "2019-07-01")]
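A minimal end-to-end sketch of that conversion plus the filter, with made-up data and the column name 'Date' as in the answer:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2017-12-31", "2018-06-15", "2019-08-01"]})
df["Date"] = pd.to_datetime(df["Date"])  # strings -> datetime64[ns]

# String bounds are coerced to timestamps for the comparison.
df = df[(df["Date"] > "2018-01-01") & (df["Date"] < "2019-07-01")]
```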

– Jerin Mathew (edited by ah bon)

The shortest way to filter your dataframe by date: let's suppose your date column is of type datetime64[ns].

# filter by single day
df_filtered = df[df['date'].dt.strftime('%Y-%m-%d') == '2014-01-01']

# filter by single month
df_filtered = df[df['date'].dt.strftime('%Y-%m') == '2014-01']

# filter by single year
df_filtered = df[df['date'].dt.strftime('%Y') == '2014']
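The same filters can also be written with the .dt accessor attributes, which compare numbers instead of building strings for every row; a small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-01-01", "2014-03-15", "2015-01-01"])})

# Equivalent year/month filters via the .dt accessor (no string conversion):
by_year = df[df["date"].dt.year == 2014]
by_month = df[(df["date"].dt.year == 2014) & (df["date"].dt.month == 1)]
```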
– Ekrem Gurdal

If your datetime column has the Pandas datetime type (e.g. datetime64[ns]), then for proper filtering you need a pd.Timestamp object, for example:

from datetime import date

import pandas as pd

value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
– VMAtm

If the dates are in the index then simply:

df['20160101':'20160301']

Edit: while short, this style is now deprecated (at least as of pandas 1.5.3) and the recommended style is df.loc['20160101':'20160301'] (as in other answers)

– fantabolous

You can use pd.Timestamp to perform a query, using a local reference:

import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame()
ts = pd.Timestamp

df['date'] = np.array(np.arange(10) + datetime.now().timestamp(), dtype='M8[s]')

print(df)
print(df.query('date > @ts("20190515T071320")'))

with the output

                 date
0 2019-05-15 07:13:16
1 2019-05-15 07:13:17
2 2019-05-15 07:13:18
3 2019-05-15 07:13:19
4 2019-05-15 07:13:20
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25


                 date
5 2019-05-15 07:13:21
6 2019-05-15 07:13:22
7 2019-05-15 07:13:23
8 2019-05-15 07:13:24
9 2019-05-15 07:13:25

Have a look at the pandas documentation for DataFrame.query, specifically the mention of local variables referenced using the @ prefix. In this case we reference pd.Timestamp through the local alias ts, so that we can supply a timestamp string.

– danielhrisca
  • Could you pass a link for documentation for @ts functions? – Glen Moutrie May 26 '20 at 11:45
  • You may not need `pd.Timestamp` here. `df.query('date > 20190515071320')` seems to work fine. – ChaimG Oct 26 '21 at 13:54
  • Yes, @ChaimG, it works fine; `pd.Timestamp` is not needed. Pandas `query()` is very flexible: it accepts integers `df.query('date > 20190515071320')`, strings `df.query('date > "20190515071320"')`, or, if you want it easy to read, `df.query('date > "2019-05-15 07:13:20"')`. – A.Sommerh Nov 06 '22 at 21:07
  • With "T"? Also OK: `df.query('date > "20190515T071320"')`, `df.query('date > "2022-11-06 T 20:32:00"')`. – A.Sommerh Nov 06 '22 at 21:13
  • Only the date, without the time part? Fine: `df.query('date > "20190515"')`, `df.query('date > "2019-05-15"')`, or even the integer `df.query('date > 20190515')`, which is not confused with an integer date+time. Smart enough! – A.Sommerh Nov 06 '22 at 21:16

I'm not allowed to write comments yet, so I'll write an answer, in case somebody reads all of them and reaches this one.

If the index of the dataset is a datetime and you want to filter just by (for example) month, you can do the following:

df.loc[df.index.month == 3]

That will filter the dataset for you by March.
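A small sketch of this with a synthetic monthly index; the year can be checked the same way:

```python
import pandas as pd

idx = pd.date_range("2014-01-01", "2015-12-31", freq="MS")  # month starts
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

march = df.loc[df.index.month == 3]                                   # March of every year
march_2014 = df.loc[(df.index.month == 3) & (df.index.year == 2014)]  # March 2014 only
```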

– uhetz

When loading the CSV data file, we need to set the date column as the index, as below, in order to filter the data by a range of dates. (This was not needed with the now-deprecated pd.DataFrame.from_csv().)

If you just want to show the data for two months from Jan to Feb, e.g. 2020-01-01 to 2020-02-29, you can do so:

import pandas as pd
mydata = pd.read_csv('mydata.csv', index_col='date', parse_dates=True)  # parse_dates makes the index a DatetimeIndex
mydata['2020-01-01':'2020-02-29'] # will pull all the columns
# if just one column is needed, e.g. Cost:
mydata.loc['2020-01-01':'2020-02-29', 'Cost']

This has been tested working for Python 3.7. Hope you will find this useful.

– Harry
  • `index_col` has to be a `string`, not a list: `mydata = pd.read_csv('mydata.csv', index_col='date')` – ANUBIS Apr 16 '20 at 14:13

You could just select the time range by doing: df.loc['start_date':'end_date']

– petezurich
STEP 1: convert the date column into a pandas datetime using pd.to_datetime()

import pandas as pd

df['date'] = pd.to_datetime(df["date"], unit='s')

STEP 2: perform the filtering in any predetermined manner (i.e. 2 months)

df = df[(df["date"] > "2022-03-01") & (df["date"] < "2022-05-03")]
– Martin Maati (edited by fantabolous)

How about using pyjanitor?

It has cool features. After pip install pyjanitor:

import janitor

df_filtered = df.filter_date("your_date_column_name", start_date, end_date)

(The column name is passed as a string.)
– pakira79 (edited by aminography)
  • ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – mah65 Apr 13 '21 at 09:27

In pandas version 1.1.3 I encountered a situation where the python datetime based index was in descending order. In this case

df.loc['2021-08-01':'2021-08-31']

returned empty. Whereas

df.loc['2021-08-31':'2021-08-01']

returned the expected data.

– Paul

Another solution, if you would like to use the .query() method.

It lets you write readable code like .query(f"{start} < MyDate < {end}"), with the trade-off that .query() parses strings, and the column values must be in a pandas date format (so that .query() understands them too).

import datetime

import pandas as pd

df = pd.DataFrame({
    'MyValue': [1, 2, 3],
    'MyDate': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'])
})
start = datetime.date(2021, 1, 1).strftime('%Y%m%d')
end = datetime.date(2021, 1, 3).strftime('%Y%m%d')
df.query(f"{start} < MyDate < {end}")

(following the comment from @Phillip Cloud, answer from @Retozi)

– Marcel Flygare
import datetime
import pandas as pd

# 60 days from today
after_60d = pd.to_datetime('today').date() + datetime.timedelta(days=60)
# keep rows whose date column is before that date
df[df['date_col'] < after_60d]
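For a calendar-accurate "two months" instead of a fixed 60 days, pd.DateOffset can be used; a sketch with made-up data, keeping the answer's column name date_col:

```python
import pandas as pd

df = pd.DataFrame({"date_col": pd.to_datetime(["2024-01-10", "2024-02-15", "2024-09-01"])})

today = pd.Timestamp("2024-01-01")  # in practice: pd.Timestamp.now().normalize()

# Rows from today up to (but not including) the same day two calendar months later.
in_two_months = df[(df["date_col"] >= today) &
                   (df["date_col"] < today + pd.DateOffset(months=2))]
```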
– DSBLR