0

I have a python function that calls an API that returns a pandas dataframe. Two of the fields are datetime stamps, which pandas converted from strings.

The problem is that the API reports the datetimes in the local time zone. But the data used in the rest of the application is all UTC, so I need to convert the local timestamps to UTC timestamps.

Here is the code:

my_df = get_external_data(begin_time, end_time)
print(my_df.head())
print(my_df.dtypes)

And the example data:

      jobid    name    user               begin                 end
0     16138    bash   noman 2022-12-13 11:33:33 2022-12-13 11:34:21
1     16139    bash   noman 2022-12-13 11:34:22 2022-12-13 11:41:47
2     16140    bash   noman 2022-12-13 11:41:49 2022-12-13 11:43:33
3     16141    bash   noman 2022-12-13 11:49:36 2022-12-13 11:43:33
4     16142    bash   noman 2022-12-13 11:57:08 2022-12-13 11:43:33
jobid                int64
name        string[python]
user        string[python]
begin       datetime64[ns]
end         datetime64[ns]
dtype: object

The program will run in a variety of time zones, so the conversion must be based on the local system's time zone.

How do I convert the timestamps to UTC?

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
Lee Jenkins
  • 2,299
  • 3
  • 24
  • 39
  • You'll need to find out the local time zone of the machine you run this script on. For that, afaik, Python has no reliable way to do it cross-platfrom. You might want to have a look at [Python: Figure out local timezone](https://stackoverflow.com/q/2720319/10197418). – FObersteiner Jun 13 '23 at 21:19

1 Answers1

1
naive_dates = pandas.date_range('2020-01-03','2020-01-05',freq="20T")
localized_dates = naive_dates.tz_localize(-time.timezone)
utc_dates = localized_dates.tz_convert('utc')

so all together

df['begin'].tz_localize(-time.timezone).tz_convert('utc')

Is one way to accomplish this i guess

Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • So I don't know why, but `pandas.date_range` does not return the same kind of dataframe that is returned from the API. I tried your "all together" code, and python reported this error: "Error: index is not a valid DatetimeIndex or PeriodIndex". Then I tried adding `print(naive_dates.dtypes)` to your three-liner, and python reported an error: "AttributeError: 'DatetimeIndex' object has no attribute 'dtypes'". – Lee Jenkins Jun 13 '23 at 20:03
  • I don't think this will give correct results. `time.timezone` returns the current UTC offset without DST adjustment. You cannot simply apply this to any given date; you'll have to take into account if DST was active or not - which essentially requires time zone rules. – FObersteiner Jun 13 '23 at 21:16
  • @LeeJenkins you may have to use `df['begin'].dt.tz_localize...` – Joran Beasley Jun 13 '23 at 21:32
  • @FObersteiner good point .... i think if you use a named timezone? (maybe i dont have bandwidth to test) it should properly handle DST ... DST and time are some of my personal biggest painpoints honestly – Joran Beasley Jun 13 '23 at 21:34
  • Exactly, the safe way to go here is to use a named time zone (IANA tz database name). Obtaining that reliably can be a bit tricky, as I commented under the question. – FObersteiner Jun 14 '23 at 06:34