While working with historical data in pandas, I ran into a peculiar timezone change that I would like to understand so that I can handle it correctly.
import pandas as pd

s = pd.Series(
    ["1901-12-12 10:00:00", "1901-12-13 10:00:00", "1901-12-14 10:00:00", "1901-12-16 10:00:00"],
    dtype="datetime64[ns, America/New_York]",
)
s
>>>
0   1901-12-12 10:00:00-04:56
1   1901-12-13 10:00:00-04:56   ### 4 minute shift
2   1901-12-14 10:00:00-05:00
3   1901-12-16 10:00:00-05:00
dtype: datetime64[ns, America/New_York]
From what I have found, this does not seem to be a bug but is related to Local Mean Time (LMT), which also shows up in pytz:
from pytz import timezone
timezone("America/New_York")
>>>
<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD> ## LMT and 4 minute oddity
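The `LMT-1 day, 19:04:00` part of that repr looks odd at first, but it appears to be just how Python prints a negative `timedelta`. A minimal sketch, assuming the LMT offset is the UTC-04:56 shown in the Series output (the variable names are my own):

```python
from datetime import timedelta

# Assumption: New York LMT is UTC-04:56, as in the Series output above.
lmt_offset = timedelta(hours=-4, minutes=-56)
est_offset = timedelta(hours=-5)

# timedelta normalizes negative values to "negative days + positive remainder".
print(lmt_offset)               # -1 day, 19:04:00
print(repr(lmt_offset))         # datetime.timedelta(days=-1, seconds=68640)

# The difference between the two offsets is the 4 minute shift.
print(lmt_offset - est_offset)  # 0:04:00
```

So `-1 day, 19:04:00` and `-04:56` describe the same offset.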
From what I have read, LMT is a solar-time system in which local time shifts by 4 minutes per degree of longitude. It was replaced by standard time zones, whose reference (GMT) was in turn replaced by UTC in the 1960s.
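The 4 minutes per degree rule can be checked with a little arithmetic. This is only a sketch: the helper `lmt_offset` is hypothetical, and the longitude of -74.0 degrees for New York is my rough assumption, not a value taken from pandas or pytz.

```python
from datetime import timedelta

MINUTES_PER_DEGREE = 4  # 24 h * 60 min / 360 degrees of longitude

def lmt_offset(longitude_deg: float) -> timedelta:
    """Approximate LMT offset from UTC for a longitude (east is positive)."""
    return timedelta(minutes=MINUTES_PER_DEGREE * longitude_deg)

# Assumption: New York sits at roughly 74.0 degrees west.
ny = lmt_offset(-74.0)
print(ny)  # -1 day, 19:04:00, i.e. UTC-04:56
```

With that assumed longitude the result matches the `-04:56` offset in the output above, which suggests the offset is a genuine LMT value rather than a rounding error.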
I am having a hard time finding actual dates for the change from LMT to standard time, and I cannot find the date 1901-12-14 mentioned anywhere.
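One numerical coincidence may be relevant: the smallest value of a signed 32-bit Unix timestamp falls on 1901-12-13, between the second and third rows of my Series. Whether pandas or pytz actually derives its cutoff from this limit is an assumption I have not confirmed. The sketch below only shows the arithmetic.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -2**31  # smallest signed 32-bit number of seconds since the epoch

earliest = EPOCH + timedelta(seconds=INT32_MIN)
print(earliest)  # 1901-12-13 20:45:52+00:00

# The same instant on a New York wall clock, assuming the -04:56 LMT offset.
lmt = timezone(timedelta(hours=-4, minutes=-56))
print(earliest.astimezone(lmt))  # 1901-12-13 15:49:52-04:56
```

That instant lies after 1901-12-13 10:00 and before 1901-12-14 10:00 local time, which is exactly where the offset changes in my data.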
I understand that different regions may have adopted these systems at different times and that historical data can be inconsistent, but any explanation of these changes, and of why pandas uses these particular dates, would be greatly appreciated.
My sub-questions:
- Is the timezone shift on 1901-12-14 correct?
- Why was there a timezone shift?
- Are there important pitfalls when handling data around this date in pandas?
Thank you.