5

When I use pandas read_csv to read a column with a timezone aware datetime (and specify this column to be the index), pandas converts it to a timezone naive utc DatetimeIndex.

Data in Test.csv:

DateTime,Temperature 2016-07-01T11:05:07+02:00,21.125 2016-07-01T11:05:09+02:00,21.138 2016-07-01T11:05:10+02:00,21.156 2016-07-01T11:05:11+02:00,21.179 2016-07-01T11:05:12+02:00,21.198 2016-07-01T11:05:13+02:00,21.206 2016-07-01T11:05:14+02:00,21.225 2016-07-01T11:05:15+02:00,21.233

Code to read from csv:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True)

This results in an index that represents the timezone naive utc time:

In [3]: df.index

Out[3]: DatetimeIndex(['2016-07-01 09:05:07', '2016-07-01 09:05:09',
           '2016-07-01 09:05:10', '2016-07-01 09:05:11',
           '2016-07-01 09:05:12', '2016-07-01 09:05:13',
           '2016-07-01 09:05:14', '2016-07-01 09:05:15'],
          dtype='datetime64[ns]', name='DateTime', freq=None)

I tried to use a date_parser function:

In [4]: date_parser = lambda x: pd.to_datetime(x).tz_localize(None)

In [5]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

This gave the same result.

How can I make read_csv create a DatetimeIndex that is timezone naive and represents the local time instead of the utc time?

I'm using pandas 0.18.1.

Puggie
  • 3,867
  • 2
  • 35
  • 39
  • Did you look at [this](http://stackoverflow.com/questions/16628819/convert-pandas-timezone-aware-datetimeindex-to-naive-timestamp-but-in-certain-t?rq=1)? – Scratch'N'Purr Jul 22 '16 at 16:51
  • Yes, I did. But that specific question is about dealing with timezone information that is actually there. In my question, pandas.read_csv loses the timezone information by converting the datetime read from the csv to a timezone naive utc datetime. – Puggie Jul 25 '16 at 08:21

4 Answers4

4

According to the docs the default date_parser uses dateutil.parser.parser. According to the docs for that function, the default is to ignore timezones. So if you supply dateutil.parser.parser as the date_parser kwarg, timezones are not converted.

import dateutil

df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=dateutil.parser.parse)

print(df)

outputs

                           Temperature
DateTime                              
2016-07-01 11:05:07+02:00       21.125
2016-07-01 11:05:09+02:00       21.138
2016-07-01 11:05:10+02:00       21.156
2016-07-01 11:05:11+02:00       21.179
2016-07-01 11:05:12+02:00       21.198
2016-07-01 11:05:13+02:00       21.206
2016-07-01 11:05:14+02:00       21.225
2016-07-01 11:05:15+02:00       21.233
Alex
  • 18,484
  • 8
  • 60
  • 80
  • This answer is certainly helpful, but leads to a timezone aware DatetimeIndex. Adding the argument `ignoretz=True` in the parsing function gives the desired timezone naive DatetimeIndex. I added this as a new answer to the question. – Puggie Jul 25 '16 at 10:19
4

The answer of Alex leads to a timezone aware DatetimeIndex. To get a timezone naive local DatetimeIndex, as asked by the OP, inform dateutil.parser.parser to ignore the timezone information by setting ignoretz=True:

import dateutil

date_parser = lambda x: dateutil.parser.parse(x, ignoretz=True)
df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

print(df)

outputs

                     Temperature
DateTime                        
2016-07-01 11:05:07       21.125
2016-07-01 11:05:09       21.138
2016-07-01 11:05:10       21.156
2016-07-01 11:05:11       21.179
2016-07-01 11:05:12       21.198
2016-07-01 11:05:13       21.206
2016-07-01 11:05:14       21.225
2016-07-01 11:05:15       21.233
Community
  • 1
  • 1
Puggie
  • 3,867
  • 2
  • 35
  • 39
1

I adopted the dateutil technique earlier today but have since switched to a faster alternative:

date_parser = lambda ts: pd.to_datetime([s[:-5] for s in ts]))

Edit: s[:-5] is correct (screenshot has error)

In the screenshot below, I import ~55MB of tab-separated files. The dateutil method works, but takes orders of magnitude longer.

enter image description here

This was using pandas 0.18.1 and dateutil 2.5.3.


EDIT This lambda function will work even if Z-0000 suffix is missing...

date_parser = lambda ts: pd.to_datetime([s[:-5] if 'Z' in s else s for s in ts])
patricktokeeffe
  • 1,058
  • 1
  • 11
  • 21
-1

You can even try :

date_parser = lambda x : pd.to_datetime(x.str[:-6])
Kalana
  • 5,631
  • 7
  • 30
  • 51
  • 1
    A [virtually identical solution](https://stackoverflow.com/a/39156778/11301900) was already posted 4 years ago. – AMC Apr 11 '20 at 01:03