22

I am trying to create a dataframe from csv, and its first column is like

"2013-08-25T00:00:00-0400";
"2013-08-25T01:00:00-0400";
"2013-08-25T02:00:00-0400";
"2013-08-25T03:00:00-0400";
"2013-08-25T04:00:00-0400";

It's datetime with timezone ! I already used something like

df1 = DataFrame(pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0))

but the result was

2013-09-02 04:00:00                                                                                    
2013-09-03 04:00:00                                                                                     
2013-09-04 04:00:00                                                                                     
2013-09-05 04:00:00                                                                                      
2013-09-06 04:00:00                                                                                     
2013-09-07 04:00:00                                                                                     
2013-09-08 04:00:00

Can anyone explain me how to seperate the datetime from timezone ?

palas
  • 221
  • 1
  • 2
  • 3
  • Do you want to process just the datetime component or do you want to take into account the timezone? – EdChum Sep 20 '13 at 08:28

4 Answers4

29

Pandas parser will take into account the timezone information if it's available, and give you a naive Timestamp (naive == no timezone info), but with the timezone offset taken into account.

To keep the timezone information in you DataFrame you should first localize the Timestamps as UTC and then convert them to their timezone (which in this case is Etc/GMT+4):

>>> df = pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0)
>>> df.index[0]
>>> Timestamp('2013-08-25 04:00:00', tz=None)
>>> df.index = df.index.tz_localize('UTC').tz_convert('Etc/GMT+4')
>>> df.index[0]
Timestamp('2013-08-25 00:00:00-0400', tz='Etc/GMT+4')

If you want to completely discard the timezone information, then just specify a date_parser that will split the string and pass only the datetime portion to the parser.

>>> df = pd.read_csv(file, sep=';', parse_dates=[0], index_col=[0]
                     date_parser=lambda x: pd.to_datetime(x.rpartition('-')[0]))
>>> df.index[0]
Timestamp('2013-08-25 00:00:00', tz=None)
Viktor Kerkez
  • 45,070
  • 12
  • 104
  • 85
  • 1
    How comes a time with -4 is GMT+4 ? I thought the signs would be the same. – Elias Hasle May 31 '19 at 11:53
  • Is there no other option? I mean here you have to know in advance what timezone is in the date strings, which seems to defeat the point somewhat. I mean I guess I understand that there is no guarantee that all the dates in the input array will have the same offset, but kind of seems like they commonly would and that pandas could preserve this information. I guess stripping them out first and checking they are all the same manually is possible but just another place for me to make an error that the library could have protected me from. – Ben Farmer Feb 18 '21 at 03:42
0

If you are using pandas, you can try

df['time'] = pd.to_datetime(df['time'])
0

if you don't parse dates upon import of the csv but instead parse them in a second step with pd.to_datetime, you get the expected fixed UTC offset. Ex:

s = pd.Series(["2013-08-25T00:00:00-0400",
               "2013-08-25T01:00:00-0400",
               "2013-08-25T02:00:00-0400",
               "2013-08-25T03:00:00-0400",
               "2013-08-25T04:00:00-0400"])

pd.to_datetime(s)

0   2013-08-25 00:00:00-04:00
1   2013-08-25 01:00:00-04:00
2   2013-08-25 02:00:00-04:00
3   2013-08-25 03:00:00-04:00
4   2013-08-25 04:00:00-04:00
dtype: datetime64[ns, pytz.FixedOffset(-240)]

You can also parse to UTC directly:

pd.to_datetime(s, utc=True)

0   2013-08-25 04:00:00+00:00
1   2013-08-25 05:00:00+00:00
2   2013-08-25 06:00:00+00:00
3   2013-08-25 07:00:00+00:00
4   2013-08-25 08:00:00+00:00
dtype: datetime64[ns, UTC]

(pd.__version__ 1.3.0)

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
0

If you want also to convert all dates to utc (better for storing), you could do :

df = pd.read_csv(file, sep=';', parse_dates=[0], index_col=[0],
                 date_parser=lambda x: pd.to_datetime(x, utc=True),
                 index_col=[0])

So you can select times with:

df.loc["2013-08-25"]
alEx
  • 193
  • 6
  • 12