1

I have the following dataframe, with some offset time.

      feed_id                created_at
7191  1009408 2019-03-10 01:55:35-05:00
7192  1009408 2019-03-10 01:56:35-05:00
7193  1009408 2019-03-10 01:57:36-05:00
7194  1009408 2019-03-10 01:58:38-05:00
7195  1009408 2019-03-10 01:59:38-05:00
7196  1009408 2019-03-10 03:00:39-04:00
7197  1009408 2019-03-10 03:01:40-04:00
7198  1009408 2019-03-10 03:02:41-04:00
7199  1009408 2019-03-10 03:03:42-04:00

I got here by using:

mydf['created_at']=mydf['created_at'].dt.tz_localize('UTC').dt.tz_convert('America/Montreal')

I want the offset time to be added to the timestamp, if a date looks like 2019-03-10 01:55:35-05:00, I want it to look like 2019-03-09 20:55:35.

I saw a suggestion on this post that thought would help me but the offset being added is constant and I want to consider DST for this exercise.

Is there a way to get an "timezone-aware" timedelta or offset that takes DST into consideration?

Or is there a simpler way to achieve this?

Aquiles Páez
  • 503
  • 6
  • 18

2 Answers2

1

I had this same issue and found the answer here. You just need .dt.tz_localize(None) at the end:

mydf['created_at'] = (mydf['created_at'].dt.tz_localize('UTC')
                                        .dt.tz_convert('America/Montreal')
                                        .dt.tz_localize(None))
clued__init__
  • 181
  • 2
  • 5
0

Here is one way, the strategy being to dissect the Datetime.datetime objects in created_at and reconstitute. There are probably better ways, especially one that does not go after tzoffset._offset.

import pandas as pd
from pandas.compat import StringIO

csvdata = StringIO("""id,feed_id,created_at
7191,1009408,2019-03-10 01:55:35-05:00
7192,1009408,2019-03-10 01:56:35-05:00
7193,1009408,2019-03-10 01:57:36-05:00
7194,1009408,2019-03-10 01:58:38-05:00
7195,1009408,2019-03-10 01:59:38-05:00
7196,1009408,2019-03-10 03:00:39-04:00
7197,1009408,2019-03-10 03:01:40-04:00
7198,1009408,2019-03-10 03:02:41-04:00
7199,1009408,2019-03-10 03:03:42-04:00""")

df = pd.read_csv(csvdata, sep=",", index_col="id", parse_dates=True, infer_datetime_format=True)
df['created_at'] = pd.to_datetime(df['created_at'])
df['offset'] = df['created_at'].apply(lambda x: x.tzinfo)
df['naive'] = df['created_at'].apply(lambda x: x.replace(tzinfo=None))
# tzoffset._offset is a timedelta
df['naive and offset'] = df['naive'] + df['offset'].apply(lambda x: x._offset)
print(df)
Rich Andrews
  • 1,590
  • 8
  • 12