
I need to work with dates in my pandas DataFrame, but my conversion code has a bug. Specifically, I am importing a column of timestamps from a CSV file:

x['Created at']
0       2016-05-13 13:28:41 -0400
1       2016-05-13 05:11:18 -0400
3       2016-05-12 18:06:42 -0400
4       2016-05-12 16:06:24 -0400
5       2016-05-12 13:58:01 -0400
6       2016-05-12 03:30:27 -0400

I then convert this column to datetimes with pandas.to_datetime(df['date']), but when I do this, every time is shifted forward by 4 hours:

x.Createdat
0      2016-05-13 17:28:41
1      2016-05-13 09:11:18
3      2016-05-12 22:06:42
4      2016-05-12 20:06:24
5      2016-05-12 17:58:01
6      2016-05-12 07:30:27

I assume this is caused by the -0400 at the end of each timestamp, but I cannot figure out the best way to resolve it so that I can aggregate this data in my own time zone.
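A minimal sketch that reproduces the shift, using three of the sample strings above and assuming a reasonably recent pandas. Passing an explicit format with %z and utc=True makes the result independent of pandas version: the -0400 offset is read, and each value is normalized to UTC, so 13:28:41 at UTC-04:00 becomes 17:28:41 UTC. That is the 4-hour shift in the output above.

```python
import pandas as pd

# Three of the raw strings shown in the question (index 2 is missing there too)
raw = pd.Series(
    [
        "2016-05-13 13:28:41 -0400",
        "2016-05-13 05:11:18 -0400",
        "2016-05-12 18:06:42 -0400",
    ],
    index=[0, 1, 3],
)

# %z consumes the "-0400" offset; utc=True normalizes every value to UTC
as_utc = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S %z", utc=True)

# 13:28:41 at UTC-04:00 is the same instant as 17:28:41 UTC
print(as_utc)
```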

1 Answer

If the -0400 offset is information that you do not need or want, simply change pandas.to_datetime(df['date']) to pandas.to_datetime(df['date'].apply(lambda x: x[:-6])), which drops the last six characters (the space and the -0400) from each string and leaves the times exactly as they were written. It is not the most robust approach, but it will work.
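The same idea as a short sketch. The DataFrame here is hypothetical, built from two of the question's strings under the column name 'date'; .str[:-6] is the vectorized equivalent of .apply(lambda x: x[:-6]).

```python
import pandas as pd

# Hypothetical frame using the 'date' column name from the answer
df = pd.DataFrame({"date": ["2016-05-13 13:28:41 -0400", "2016-05-12 03:30:27 -0400"]})

# Drop the trailing " -0400" (a space plus the 5-character offset),
# leaving naive timestamps in the original wall-clock time
naive_local = pd.to_datetime(df["date"].str[:-6])
print(naive_local)
```

The slice assumes every string ends in a space followed by a 5-character offset. A "Z" suffix or an offset such as "+05:30" would break it, and rows with different offsets would silently end up on inconsistent clocks.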

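If you want to keep the -0400 information but convert the times to a different time zone, look at tz_localize (which attaches a time zone to naive timestamps) and tz_convert (which converts timezone-aware timestamps), as described in this answer: convert gmt to local timezone in pandas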
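A sketch of the convert-then-aggregate route, assuming "America/New_York" stands in for "your own time zone" (it is only an example; substitute your zone name). The offsets are parsed into UTC, converted with tz_convert, and the zone is then dropped with tz_localize(None) so the values are naive local times. If your pandas version already hands back naive UTC values (as the question's output suggests), the equivalent first step is .dt.tz_localize("UTC") before .dt.tz_convert(...).

```python
import pandas as pd

# The six sample strings from the question, under a hypothetical 'date' column
df = pd.DataFrame({"date": [
    "2016-05-13 13:28:41 -0400",
    "2016-05-13 05:11:18 -0400",
    "2016-05-12 18:06:42 -0400",
    "2016-05-12 16:06:24 -0400",
    "2016-05-12 13:58:01 -0400",
    "2016-05-12 03:30:27 -0400",
]})

# Parse the offset and normalize to timezone-aware UTC
utc = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S %z", utc=True)

# Convert to the local zone ("America/New_York" is only an example choice)
local = utc.dt.tz_convert("America/New_York")

# Optionally drop the zone so the values are naive local wall-clock times
naive_local = local.dt.tz_localize(None)

# Aggregate by local calendar date, e.g. number of rows per day
per_day = naive_local.groupby(naive_local.dt.date).size()
print(per_day)
```

Keeping the timezone-aware series (local) is generally safer if the data may span a daylight-saving change; dropping the zone is only a convenience for display or grouping.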

Another tool that may help is pytz: pytz - Converting UTC and timezone to local time
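For a single value outside pandas, a minimal pytz sketch, assuming pytz is installed and again using "America/New_York" purely as an example zone. The standard library's %z directive parses the -0400 offset, and astimezone moves the same instant into UTC or into the pytz zone.

```python
from datetime import datetime

import pytz

raw = "2016-05-13 13:28:41 -0400"

# %z parses "-0400" into a fixed UTC offset, giving a timezone-aware datetime
aware = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")

# The same instant expressed in UTC and in an example local zone
as_utc = aware.astimezone(pytz.utc)
local_tz = pytz.timezone("America/New_York")  # example zone; use your own
as_local = aware.astimezone(local_tz)

print(as_utc)
print(as_local)
```

With pytz, prefer astimezone (or local_tz.localize for naive values) over passing a pytz zone as tzinfo= to the datetime constructor, which can attach a historical offset instead of the correct one.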

Projski