1

I am using Pandas to read and process csv file. My csv file have date/time column that looks like:

11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST

How can I read this using python/pandas, so far what I have is this:

pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')

But this is not working, though previously I was able to use the same code for another format (with a different format specifier). Any suggestions?

And_Dev
  • 113
  • 2
  • 12

2 Answers2

1

The issue you're having is likely because versions of Python before 3.2 (I think?) had a lot of trouble with time zones, so your format string might be screwing up on the %z and %Z parts. For example, in Python 2.7:

In [187]: import datetime

In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')

ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'

You're using pd.to_datetime instead of datetime.datetime.strptime but the underlying issues are the same, you can refer to this thread for help. What I would suggest is instead of using pd.to_datetime, do something like

In [191]: import dateutil

In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400')
Out[192]: datetime.datetime(2015, 2, 10, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))

It should be pretty simple to chop off the timezone at the end (which is redundant since you have the offset), and change the ":" to "." between the seconds and microseconds.

Community
  • 1
  • 1
Dan
  • 2,647
  • 2
  • 27
  • 37
0

Since datetime.timezone has become available in Python 3.2, you can use %z with .strptime() (see docs). Starting with:

dateparse = lambda x: pd.datetime.strptime(x, '%H:%M:%S:%f %d %m %Y %z %Z')
df = pd.read_csv(path, parse_dates=['time_col'], date_parser=dateparse)

to get:

                           time_col
0  2015-10-02 11:59:50.322000-04:00
1  2015-10-16 11:11:55.051000-04:00
2  2015-11-02 00:38:37.106000-05:00
3  2015-11-14 04:15:51.600000-05:00
4  2015-11-14 04:15:51.600000-05:00
5  2015-11-28 13:43:28.540000-05:00
6  2015-12-14 09:24:12.723000-05:00
7  2015-12-28 13:28:12.346000-05:00
Stefan
  • 41,759
  • 13
  • 76
  • 81
  • I've Python 2.7, would this solution work? Also, I am reading column from csv file, so I need to convert first? – And_Dev Jan 12 '16 at 05:06
  • You can use `.strptime` as `dateparse` function in `read_csv`. Same 3.2 caveat applies, though. If `python3` is not an option, you can still parse the elements as sketched in the other answer. – Stefan Jan 12 '16 at 05:23