Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

Question

I have a DataFrame with column named date. How can we convert/parse the 'date' column to a DateTime object?

I loaded the date column from a Postgresql database using sql.read_frame(). An example of the date column is 2013-04-04.

What I am trying to do is to select all rows in a dataframe that has their date columns within a certain period, like after 2013-04-01 and before 2013-04-04.

My attempt below gives the error 'Series' object has no attribute 'read'

Attempt

import dateutil

df['date'] = dateutil.parser.parse(df['date'])

Error

AttributeError                            Traceback (most recent call last)
<ipython-input-636-9b19aa5f989c> in <module>()
     15 
     16 # Parse 'Date' Column to Datetime
---> 17 df['date'] = dateutil.parser.parse(df['date'])
     18 
     19 # SELECT RECENT SALES

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs)
    695         return parser(parserinfo).parse(timestr, **kwargs)
    696     else:
--> 697         return DEFAULTPARSER.parse(timestr, **kwargs)
    698 
    699 

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    299             default = datetime.datetime.now().replace(hour=0, minute=0,
    300                                                       second=0, microsecond=0)
--> 301         res = self._parse(timestr, **kwargs)
    302         if res is None:
    303             raise ValueError, "unknown string format"

C:\Python27\lib\site-packages\dateutil\parser.pyc in _parse(self, timestr, dayfirst, yearfirst, fuzzy)
    347             yearfirst = info.yearfirst
    348         res = self._result()
--> 349         l = _timelex.split(timestr)
    350         try:
    351 

C:\Python27\lib\site-packages\dateutil\parser.pyc in split(cls, s)
    141 
    142     def split(cls, s):
--> 143         return list(cls(s))
    144     split = classmethod(split)
    145 

C:\Python27\lib\site-packages\dateutil\parser.pyc in next(self)
    135 
    136     def next(self):
--> 137         token = self.get_token()
    138         if token is None:
    139             raise StopIteration

C:\Python27\lib\site-packages\dateutil\parser.pyc in get_token(self)
     66                 nextchar = self.charstack.pop(0)
     67             else:
---> 68                 nextchar = self.instream.read(1)
     69                 while nextchar == '\x00':
     70                     nextchar = self.instream.read(1)

AttributeError: 'Series' object has no attribute 'read'

df['date'].apply(dateutil.parser.parse) gives me the error AttributeError: 'datetime.date' object has no attribute 'read'

df['date'].truncate(after='2013/04/01') gives the error TypeError: can't compare datetime.datetime to long

df['date'].dtype returns dtype('O'). Is it already a datetime object?

Please post an example of something in your `date` column! Because pandas should actually recognize a datetime object, so it would be beneficial to see the actual format for that column — Ryan Saxe, May 07 '13 at 06:39
@RyanSaxe I loaded the date column from a Postgresql database using `sql.read_frame()`. An example of the `date` column is `2013-04-04`. How do you check for the dtype of a column? — Nyxynyx, May 07 '13 at 12:52

score 62 · Answer 1 · edited Jun 27 '14 at 14:27

62

Pandas is aware of the object datetime but when you use some of the import functions it is taken as a string. So what you need to do is make sure the column is set as the datetime type not as a string. Then you can make your query.

df['date']  = pd.to_datetime(df['date'])
df_masked = df[(df['date'] > datetime.date(2012,4,1)) & (df['date'] < datetime.date(2012,4,4))]

edited Jun 27 '14 at 14:27

BenMorel

34,448
50
182
322

answered Jun 27 '14 at 07:53

Keith

4,646
7
43
72

score 9 · Answer 2 · answered May 07 '13 at 07:56

9

You probably need apply, so something like:

df['date'] = df['date'].apply(dateutil.parser.parse)

Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.

answered May 07 '13 at 07:56

herrfz

4,814
4
26
37

Thanks, I tried `df['date'].apply(dateutil.parser.parse)` and it gave ethe error. `AttributeError: 'datetime.date' object has no attribute 'read'`. An example of the column is `2013-04-04`. The entire dataframe was loaded from a PostgreSQL database using `sql.readframe()`. – Nyxynyx May 07 '13 at 12:48

Ryan Saxe · Accepted Answer · 2013-05-07T13:49:51.070

6

pandas already reads that as a datetime object! So what you want is to select rows between two dates and you can do that by masking:

df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]

Because you said that you were getting an error from the string for some reason, try this:

df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]

edited May 07 '13 at 13:49

answered May 07 '13 at 13:23

Ryan Saxe

17,123
23
80
128

1

`df = df[df.date > '2012-01-01']` gives me an error `TypeError: can't compare datetime.date to str`. – Nyxynyx May 07 '13 at 13:29
1

I use this all the time! That's very odd...your question is very similar to one I asked and I was given this answer and it worked. [See it here](http://stackoverflow.com/questions/16341367/grabbing-selection-between-specific-dates-in-a-dataframe) – Ryan Saxe May 07 '13 at 13:38
Yes.. it works when I created the dataframe manually... but if I create the dataframe from a SQL database using `sql.read_frame`, `'2012-01-01'` gets treated as a string? – Nyxynyx May 07 '13 at 13:41
2

Trying `df[df.date > dateutil.parser.parse('2013-01-01') ]` gives me `TypeError: can't compare datetime.datetime to datetime.date` – Nyxynyx May 07 '13 at 13:42
df.date is type object, but i think `2013-01-01` is treated as a string. The error changes from having a `str` to having a `datetime.date` when I used `dateutil.parser.parse()` *as in above comments* – Nyxynyx May 07 '13 at 13:43
I added something, try that! – Ryan Saxe May 07 '13 at 13:50

score 5 · Answer 4 · answered Oct 19 '18 at 22:43

Don't confuse `datetime.date` with Pandas `pd.Timestamp`

A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended solution for Pandas:

s = pd.to_datetime(s)    # convert series to Pandas
mask = s > '2018-03-10'  # calculate Boolean mask against Pandas-compatible object

The top answers have issues:

@RyanSaxe's accepted answer's first attempt doesn't work; the second answer is inefficient.
As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it gives TypeError.

Any good Pandas solution must ensure:

The series is a Pandas datetime series, not object dtype.
The datetime series is compared to a compatible object, e.g. pd.Timestamp, or string in the correct format.

Here's a demo with benchmarking, demonstrating that the one-off cost of conversion can be immediately offset by a single operation:

from datetime import date

L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)

a = s > date(2018, 3, 10)             # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10'  # more efficient, including datetime conversion

assert a.equals(b)                    # check solutions give same result

%timeit s > date(2018, 3, 10)                  # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10'       # 33.7 ms

s = pd.to_datetime(s)

%timeit s > '2018-03-10'                       # 2.85 ms

score 2 · Answer 5 · answered Jun 20 '13 at 11:47

2

You should iterate over the items and parse them independently, then construct a new list.

df['date'] = [dateutil.parser.parse(x) for x in df['date']]

answered Jun 20 '13 at 11:47

ryzhiy

323
4
12

Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

5 Answers5

Don't confuse `datetime.date` with Pandas `pd.Timestamp`

Linked

Related

Parse a Pandas column to Datetime when importing table from SQL database and filtering rows by date

5 Answers5

Don't confuse datetime.date with Pandas pd.Timestamp

Linked

Related

Don't confuse `datetime.date` with Pandas `pd.Timestamp`