0

Suppose we have a dataset with a UNIX timestamp in milliseconds:

data = [
    {
      "unix_ts": 1669291200000,
      "val": 10
    },
    {
      "unix_ts": 1669291260000,
      "val": 25
    }
  ]

Which we convert to a Pandas dataframe with a Pandas timestamp (datetime) set to US/Eastern:

df = pd.DataFrame(data)
df['ET'] = pd.to_datetime(df['unix_ts'], unit='ms').dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
    unix_ts         val ET
0   1669291200000   10  2022-11-24 07:00:00-05:00
1   1669291260000   25  2022-11-24 07:01:00-05:00

We can see that the ET time is 5 hours behind the UTC unix_ts

Suppose we want a new integer column with a value that corresponds with that -5 hours difference. Naively, we could do this:

df['adjusted_ts'] = df['unix_ts'] - (3600000 * 5)
# Include column to allow us to check the result by eye.
df['Check_ET'] = pd.to_datetime(df['adjusted_ts'], unit='ms').dt.tz_localize('UTC').dt.tz_convert('US/Eastern')

Result:

    unix_ts         val ET                          adjusted_ts           Check_ET
0   1669291200000   10  2022-11-24 07:00:00-05:00   1669273200000           2022-11-24 02:00:00-05:00
1   1669291260000   25  2022-11-24 07:01:00-05:00   1669273260000           2022-11-24 02:01:00-05:00

We can see from the Check_ET column that this "works", but it won't when we get to the part of the year when US/Eastern is only 4 hours behind UTC.

It would be handy to be able to do something like this:

import numpy as np
df['smart_adjusted_ts'] = (df['ET'].astype(np.int64) // 10 ** 9) * 1000

But, sadly, that's not so 'smart', as it results in a column that's identical to the original unix_ts (Regardless of the ET column's timezone, the underlying data (NumPy) is always stored as nanoseconds since the EPOCH in UTC.):

    unix_ts         ... Check_ET                    smart_adjusted_ts
0   1669291200000   ... 2022-11-24 02:00:00-05:00   1669291200000
1   1669291260000   ... 2022-11-24 02:01:00-05:00   1669291260000

So, unless there is a special method to do this (Anyone?), my only thought is to go back to the original approach but dynamically extract the UTC offset (ideally as ints: 4 or 5) from the ET column.

The problem is, I can't find how to do that either, but I'm hoping to achieve something like this:

df['adjusted_ts'] = df['unix_ts'] - (3600000 * et_utc_abs_diff)

Please be aware that a dataset could include dates with both (4 & 5 hour) differences, so it's important to get this difference on a row-by-row basis as opposed to having a master variable set to 4 or 5.

Any ideas for an elegant solution, please?

Edit

I came up with the following, and it gets the right result, but I suspect there must be a better way using standard Pandas methods.

df['adjusted_ts'] = df['unix_ts'] - (df['ET'].astype(str).str.slice(start=21, stop=22).astype(int) * 3600000)
HapiDaze
  • 335
  • 6
  • 16
  • [UNIX time](https://en.m.wikipedia.org/wiki/Unix_time) refers to UTC - ***do not roll your own version*** if you want to avoid a lot of confusion down the road. – FObersteiner Nov 25 '22 at 18:33
  • I'm not attempting to roll my own. Another application displays a chart, and unfortunately it simply takes integers and generates datetimes in UTC. I need to provide it integer values that reflect times 4/5 hours behind, or it's going to be very confusing for users. Those columns can be called 'NOT_UNIX_TS' for all I care, but they do need to have values that are 14400000 or 18000000 behind UNIX time. – HapiDaze Nov 25 '22 at 18:54
  • Can you use the date/time representation in the chart, I mean the ISO8601 string for example? I'd prefer that for displaying data to users anyway (although my "users" are mostly scientists that are happy with UTC). If that is absolutely no option, you might [localizes the datetime Series to None](https://stackoverflow.com/a/62656878/10197418), then convert back to (now fake) Unix time. – FObersteiner Nov 25 '22 at 19:25
  • The chart is a bit of a black-box, and I'm trying to leave it alone. It has a UTC "offset" setting, but it's crude and dependent upon the timezone of the computer it is being run on. Users will change that from time-to-time, as they travel. Generating custom datetime strings would be nice, but it creates a lot of extra work and processing overhead - a lot of data is coming in fast. Thank you for the link. I'll see if I can get a solution, from that, and report back. – HapiDaze Nov 25 '22 at 19:38

1 Answers1

1

Here's a way to implement this by localizing to None, as I've described in the comments.

import pandas as pd

df = pd.DataFrame({"unix_ts": [1651363200000, 1669291260000],
                   "val": [10, 25]})

df["ET"] = pd.to_datetime(df["unix_ts"], unit='ms', utc=True).dt.tz_convert("America/New_York")
# df["ET"]
# 0   2022-04-30 20:00:00-04:00
# 1   2022-11-24 07:01:00-05:00
# Name: ET, dtype: datetime64[ns, America/New_York]

# we can remove the time zone to get naive datetime. pandas will treat this as UTC
df["ET_naive"] = df["ET"].dt.tz_localize(None)
# df
#          unix_ts  val                        ET            ET_naive
# 0  1669291200000   10 2022-11-24 07:00:00-05:00 2022-11-24 07:00:00
# 1  1669291260000   25 2022-11-24 07:01:00-05:00 2022-11-24 07:01:00

# now we can convert back to units of time since the epoch, 
# only that the epoch is now eastern time:
df["ET_epochtime"] = df["ET_naive"].astype("int64") / 1e6 # division gives milliseconds

# df["ET_epochtime"] correctly accounts for DST offset:
(df["unix_ts"]-df["ET_epochtime"])/3600000
# 0    4.0
# 1    5.0
# dtype: float64
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • I'm still unclear on how this gets me to integer values that are 4 or 5 hours behind the row's UTC time, depending on daylight saving. I effectively need a column: 'utc_ts_offset_hours_before' where we know that 'offset' represents 4 or 5, depending on DST, holding integer values 14400000 or 18000000 ms behind the row's UNIX timestamp. This is why I am hoping there is a built-in way to extract the hours offset value from column: 'ET'. My updated "solution" does this, but it's convoluted. Am I missing how to get this via your suggested solution? – HapiDaze Nov 27 '22 at 16:22
  • 1
    @HapiDaze not sure I understand; the code I posted above correctly accounts for DST active/inactive. I've updated the answer to make that clear. – FObersteiner Nov 28 '22 at 08:16