
I am reading data from a Java source. I end up with the following dataframe:

df.head()

    open        timestamp
0   1.13550     2019-02-24T17:00-06:00[US/Central]
1   1.13570     2019-02-24T17:05-06:00[US/Central]
2   1.13560     2019-02-24T17:10-06:00[US/Central]
3   1.13565     2019-02-24T17:15-06:00[US/Central]
4   1.13570     2019-02-24T17:20-06:00[US/Central]

df.dtypes

open        float64
timestamp   object
dtype: object

How can I convert the timestamp column to a timezone-aware datetime in pandas? Is there such a thing in pandas?

I found this post, but it does not seem to parse the timezone; it only adds a timezone afterwards: How to read datetime with timezone in pandas

Any help or hints are welcome.

M.E.

2 Answers


You can try removing the [...] part, then passing the result to to_datetime:

pd.to_datetime(df.timestamp.str.extract(r'(.*)\[.*\]')[0])

returns:

0   2019-02-24 17:00:00-06:00
1   2019-02-24 17:05:00-06:00
2   2019-02-24 17:10:00-06:00
3   2019-02-24 17:15:00-06:00
4   2019-02-24 17:20:00-06:00
Name: 0, dtype: datetime64[ns, pytz.FixedOffset(-360)]

You can keep the timezone label by adding one more capture group in the regex pattern:

pattern = r'(?P<time>.*)\[(?P<zone>.*)\]'
new_df = df.timestamp.str.extract(pattern)

Then new_df is:

                     time        zone
0  2019-02-24T17:00-06:00  US/Central
1  2019-02-24T17:05-06:00  US/Central
2  2019-02-24T17:10-06:00  US/Central
3  2019-02-24T17:15-06:00  US/Central
4  2019-02-24T17:20-06:00  US/Central

You can then convert the time column with pd.to_datetime.
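A minimal sketch of that last step, assuming every row carries the same zone label (true for the sample in the question): parse the offset-qualified times, then use `.dt.tz_convert` with the captured zone name so the column carries `US/Central` rather than a fixed `-06:00` offset. The sample rows are copied from the question; the single-zone check is an added assumption, because one datetime64 column can hold only one timezone.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "open": [1.13550, 1.13570, 1.13560, 1.13565, 1.13570],
    "timestamp": [
        "2019-02-24T17:00-06:00[US/Central]",
        "2019-02-24T17:05-06:00[US/Central]",
        "2019-02-24T17:10-06:00[US/Central]",
        "2019-02-24T17:15-06:00[US/Central]",
        "2019-02-24T17:20-06:00[US/Central]",
    ],
})

# Raw string so "\[" reaches the regex engine unchanged
pattern = r"(?P<time>.*)\[(?P<zone>.*)\]"
parts = df["timestamp"].str.extract(pattern)

# A datetime64 column holds exactly one timezone, so require a single label
zones = parts["zone"].unique()
assert len(zones) == 1, f"expected one zone label, got {zones}"

# Parse the offset-qualified times, then convert to the named zone
df["timestamp"] = pd.to_datetime(parts["time"]).dt.tz_convert(zones[0])
print(df.dtypes)
```

`tz_convert` keeps the same instants and only changes the attached zone, so the wall times stay at 17:00, 17:05, and so on.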

Quang Hoang
  • Same comment as on WeNYoBen's answer: isn't the timezone lost by doing that? -06:00 does not uniquely define a timezone; it only defines a GMT offset. – M.E. Jul 10 '19 at 23:02
  • You didn't mention that you want to keep the time zone. But that's easy to fix. – Quang Hoang Jul 10 '19 at 23:03
  • Thanks. I was actually curious whether the timezone can be embedded as part of the datetime object, as in Python (example: https://stackoverflow.com/questions/4530069/python-how-to-get-a-value-of-datetime-today-that-is-timezone-aware) – M.E. Jul 10 '19 at 23:20
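On the question of embedding the zone in the object itself: a pandas `Timestamp` can carry a named zone, and that zone knows its daylight-saving rules, which a bare `-06:00` offset does not. A small sketch; the July timestamp is a hypothetical value chosen only to show the summer offset.

```python
import pandas as pd

# A value from the question, converted to its named zone
winter = pd.Timestamp("2019-02-24T17:00-06:00").tz_convert("US/Central")

# Hypothetical summer wall time localized to the same zone
summer = pd.Timestamp("2019-07-10 17:00").tz_localize("US/Central")

# Same zone object, different abbreviations and offsets (CST vs CDT)
print(winter, winter.tz, winter.tzname())
print(summer, summer.tz, summer.tzname())
```

Both values report `US/Central` as their zone, yet their UTC offsets differ by an hour, which is exactly the information a fixed offset cannot express.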

One way:

pd.to_datetime(df.timestamp.str.split('[').str[0])
Out[137]: 
0   2019-02-24 17:00:00-06:00
1   2019-02-24 17:05:00-06:00
2   2019-02-24 17:10:00-06:00
3   2019-02-24 17:15:00-06:00
4   2019-02-24 17:20:00-06:00
Name: timestamp, dtype: datetime64[ns, pytz.FixedOffset(-360)]
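The question's data only ever uses US/Central, but if different rows could carry different zone labels (an assumption; the second input row below is made up), a single datetime64 column cannot hold them all. One sketch that extends the split idea keeps both halves of each string and builds an object column of tz-aware Timestamps:

```python
import pandas as pd

# Hypothetical input with two different zone labels (not from the question)
raw = pd.Series([
    "2019-02-24T17:00-06:00[US/Central]",
    "2019-02-24T18:00-05:00[US/Eastern]",
])

# Split "time[zone]" into its two parts, dropping the trailing "]"
parts = raw.str.rstrip("]").str.split("[", expand=True)

# Parse each offset timestamp, then attach its own named zone
aware = pd.Series(
    [pd.Timestamp(t).tz_convert(z) for t, z in zip(parts[0], parts[1])],
    dtype=object,
)
for ts in aware:
    print(ts, ts.tz)
```

An object column gives up the vectorized `.dt` accessor. If that matters, a common alternative is to parse with `pd.to_datetime(..., utc=True)` and store the zone name in a separate column.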
BENY
  • Isn't the timezone lost by doing that? -06:00 does not uniquely define a timezone; it only defines a GMT offset. – M.E. Jul 10 '19 at 23:02
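The comment's point can be made concrete with date arithmetic across a daylight-saving change. In the sketch below, the input is a hypothetical timestamp on the evening before US DST began in 2019 (March 10). Adding one day of elapsed time to the fixed-offset value and to the named-zone value gives the same instant, but only the named zone reports the correct local wall time afterwards.

```python
import pandas as pd

s = "2019-03-09T17:00-06:00"  # hypothetical: evening before US DST starts

fixed = pd.Timestamp(s)                           # fixed UTC-06:00 offset
named = pd.Timestamp(s).tz_convert("US/Central")  # named zone with DST rules

# Add 24 hours of elapsed time to both
next_fixed = fixed + pd.Timedelta(days=1)
next_named = named + pd.Timedelta(days=1)

# Same instant, different local wall clocks
print(next_fixed)
print(next_named)
```

The fixed-offset result still claims 17:00 at -06:00, while the named zone shows 18:00 at -05:00, which is what a clock in Chicago would actually read.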