I'm currently working on an application that has the unfortunate requirement of working on complex localized dates and times.

As a simple example, if an event were to happen "today" in Singapore, this is fairly easy to represent: we store the date in UTC, the IANA timezone Asia/Singapore, and perhaps also the effective UTC offset at the given timestamp (e.g. +08:00) so as to not have to consult the IANA database every time we render them.

If you aren't familiar, dealing with timezones is absolutely insane. We can't just assume that Singapore is always +08:00:

  1. Daylight savings time may or may not happen, and different locales start and end DST on different calendar days, and some locales may offset by more than or less than one hour.
  2. Over time, DST and the actual UTC offset can change. Made-up examples:
    • In 1971, the start and end dates for DST changed to March 31st and October 1st, respectively, rather than February 27th and September 16th.
    • In 1933, the DST offset was changed from one hour to one hour and thirty minutes.
    • Yes, these things do absolutely happen and the IANA database covers them on a per-timezone-locale basis, which is why we need to store the UTC offset for the given datetime and the relevant timezone identifier.
  3. Where things get even worse is when actual calendars in use have changed over time, locales adopted the Gregorian/ISO calendar at different times, and as a result, they had to skip up to a couple weeks of days during the transition.
    • An actual example is Soviet Russia in 1918: until January 31st, 1918, Russia and some other nations used the Julian calendar, which by then ran thirteen days behind the Gregorian calendar, so a train ride from Russia to Europe that took only a few hours could move the current date forward by nearly two weeks. At the switch, January 31st was the last day in the Julian calendar and the very next day was February 14th in the Gregorian calendar.
    • In notation, dates from before this transition were accompanied by a specifier of old-style (O.S.) or new-style (N.S.) dates.
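For reference, the Julian-to-Gregorian arithmetic itself can be sketched in a few lines. This is a minimal sketch, not a historical rule set: it assumes the standard Julian Day Number formula, and the function name `julian_to_gregorian` is just an illustrative choice. It says nothing about which calendar a given place was actually using on a given day, which is the part I still need data for.

```python
from datetime import date


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to the (proleptic) Gregorian date.

    Uses the standard Julian Day Number (JDN) formula for Julian dates,
    then maps the JDN onto Python's proleptic Gregorian ordinal.
    """
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426.
    return date.fromordinal(jdn - 1721425)


# Last Julian day in Russia and the Gregorian date it corresponds to.
last_julian_day = julian_to_gregorian(1918, 1, 31)
print(last_julian_day.isoformat())
```

The last Julian day maps to February 13th, 1918 in the Gregorian calendar, which is why the next day was February 14th.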

Therefore, in order to properly represent dates before/after these transitions, we must store:

  1. The datetime as an RFC-3339 timestamp in UTC in the Gregorian/ISO calendar.
  2. The named timezone closest to where the specific date was relevant, e.g. Asia/Singapore: this means that we also have to collect a location with the date that we can hopefully use to select one of these named timezones.
  3. The UTC offset for the specific datetime according to the named timezone, e.g. +08:00.
  4. An optional other calendar (which I'll call a calendar "projection") which was in use in the given locale during the datetime in question, so that dates can be represented in both calendars to further clarify things and to provide better accuracy.
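To make the four fields concrete, here is a rough sketch of the record I have in mind. The class name `LocalizedMoment` and all field names are hypothetical, and the calendar projection is just a free-form label here since I do not yet have a data source for it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class LocalizedMoment:
    """Hypothetical storage record for one localized historical moment."""

    instant_utc: datetime                      # 1. the moment, in UTC
    zone_name: str                             # 2. IANA zone, e.g. "Asia/Singapore"
    utc_offset: timedelta                      # 3. offset in effect at that moment
    calendar_projection: Optional[str] = None  # 4. e.g. "julian", if one applies

    def __post_init__(self) -> None:
        # Reject naive datetimes and anything not expressed in UTC.
        if self.instant_utc.utcoffset() != timedelta(0):
            raise ValueError("instant_utc must be timezone-aware and in UTC")

    def local_wall_time(self) -> datetime:
        """Render the moment using only the stored offset (no tz database)."""
        return self.instant_utc.astimezone(timezone(self.utc_offset))


event = LocalizedMoment(
    instant_utc=datetime(2022, 9, 26, 15, 22, tzinfo=timezone.utc),
    zone_name="Asia/Singapore",
    utc_offset=timedelta(hours=8),
)
print(event.local_wall_time().isoformat())
```

The stored offset is what lets a client render the local wall time without shipping the tz database.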

As noted above, the IANA database does provide a complex database for each timezone to keep a historically accurate timeline of UTC offset changes and DST changes. In Java and other programming languages, datetime libraries use this database to perform conversions between UTC and a local time by the named timezone.

What I need, however, is hopefully a similar database which can be used to know which calendar was in use for a given timezone or locale so that I can offer a "projection" to the calendar locally in use. I could write my own system for this, incorporating data that I'm able to use to offer these projections, but as everyone knows time is very hard and I do not want to engage in a historical study of calendars to make my own set of rules.

Another problem seems to be finding the right named timezone for a given general geographical location. During and after wars, different geographical locations changed hands, became their own countries, etc. In 1917, the capital of Russia was Petrograd (subsequently Leningrad, subsequently St. Petersburg), but at some point this changed to Moscow. If I have a given general geographic area (e.g. "Kiev" or "Ukraine"), I'll need to try to associate that city with a named timezone somehow, and how do I do that? Do I do a geographical search for the nearest named timezone to an arbitrary city that is within the same latitude?

In summary:

  • Does an IANA database exist which tracks when different calendars were in use for a geographical area?
  • If I have a geographical area for a given city or country, how can I figure out which named timezone to use for it?
Naftuli Kay
  • You probably saw this thread already. https://stackoverflow.com/a/16086964/19808673 – user19808673 Sep 26 '22 at 23:22
  • Seeking such precision with older historical data is futile, in my experience. But you may find some of what you're looking for on Wikipedia and sources linked from there: https://en.wikipedia.org/wiki/Adoption_of_the_Gregorian_calendar and https://en.wikipedia.org/wiki/List_of_adoption_dates_of_the_Gregorian_calendar_by_country – Matt Johnson-Pint Sep 26 '22 at 23:45

1 Answer

A reminder to the reader about definitions:

  • An offset from UTC is merely a number of hours-minutes-seconds ahead or behind UTC. Modern protocols usually refer to positive numbers being ahead of UTC and negative numbers being behind UTC. But some protocols do the opposite, so beware.
  • A time zone is much more. A time zone is a named history of the past, present, and future changes to the offset used by the people of a particular region as decided by their politicians.

so as to not have to consult the IANA database every time we render them.

Beware: Politicians change the time zone rules. They do so with surprising frequency, and even more surprisingly little forewarning. This happens across cultures and continents, where a wide array of politicians have shown a penchant for twiddling with the time zone rules.

I recommend against pre-calculating an offset from UTC. I recommend storing a moment, a specific point on the timeline, in UTC (an offset from UTC of zero hours-minutes-seconds). For presentation to the user, or where business logic demands, dynamically adjust into a time zone.
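As a minimal sketch of adjusting dynamically, the following uses Python's standard `zoneinfo` module (Python 3.9+, and it assumes tz data is available on the system). The helper name `offset_at` is just for illustration. It also shows why Singapore cannot be assumed to always be +08:00.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def offset_at(instant_utc: datetime, zone_name: str) -> timedelta:
    """Look up the offset the tz database records for this zone at this moment."""
    return instant_utc.astimezone(ZoneInfo(zone_name)).utcoffset()


# Two moments stored in UTC, adjusted into the same time zone on demand.
in_1981 = datetime(1981, 6, 1, 12, 0, tzinfo=timezone.utc)
in_2022 = datetime(2022, 9, 26, 12, 0, tzinfo=timezone.utc)

print(offset_at(in_1981, "Asia/Singapore"))  # 7:30:00
print(offset_at(in_2022, "Asia/Singapore"))  # 8:00:00
```

Because the offset is computed at render time, a later correction to the tz data is picked up without touching the stored values.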

If you aren't familiar, dealing with timezones is absolutely insane.

Not really “insane”, but yes, amazingly tricky, counter-intuitive, and error-prone.

We can't just assume that Singapore is always +08:00

No, you cannot. As I said above, offsets and time zones are political time, defined by fickle politicians.

Daylight savings time may or may not happen, and different locales start and end DST on different calendar days, and some locales may offset by more than or less than one hour.

Yes, politicians often alter the start & stop dates for Daylight Saving Time (DST).

Yes, these things do absolutely happen

Yes, politicians invent all kinds of adjustments, sometimes quite wacky and senseless.

The newest fad is going onto DST and never stopping, an everlasting DST. So then never again will the sun be directly overhead at noon — defying the very definition of noon.

the DST offset was changed from one hour to one hour and thirty minutes.

Politicians are free to change the current offset within their time zone(s) by any amount they fancy, any amount of hours-minutes-seconds.

locales adopted the Gregorian/ISO calendar at different times

Avoid the word “locale” when talking about date-time handling. That word has a specific meaning in localization work. And many developers mistakenly think locale and time zones are related, but they are not. Time zones are tied to legal jurisdictions under the control of specific politicians; locales are not.

Do not conflate Gregorian calendar with ISO 8601 calendar. For example, ISO 8601 mandates that weeks start on a Monday. Various Gregorian calendar implementations may use a different day-of-week. And furthermore, this means different week numbers under each calendar system.
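To illustrate the week-numbering difference, here is a small sketch in Python. It compares the ISO 8601 week from `date.isocalendar()` with the Sunday-based week number from the `%U` format directive, which is only one example of a non-ISO convention.

```python
from datetime import date

d = date(2022, 1, 2)  # a Sunday

# ISO 8601: weeks start on Monday; week 1 is the week containing January 4th.
iso_year, iso_week, iso_weekday = d.isocalendar()

# %U: weeks start on Sunday; days before the first Sunday fall in week 0.
sunday_based_week = int(d.strftime("%U"))

print(iso_year, iso_week, iso_weekday)  # 2021 52 7
print(sunday_based_week)                # 1
```

The same day is week 52 of 2021 under ISO 8601 but week 1 of 2022 under the Sunday-based count.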

The fact that various time zones adopted calendar systems at different points is not a big problem as far as I know. Those changes are accounted for in the IANA database, also known as the tz database, formerly known as the Olson Database. But I may be wrong about that, as I do date-time handling only for contemporary times.

Beware: The tz database is generally expected to be accurate only from about 1970. And even within these recent decades some errors and omissions have happened.

Amazingly, government bureaucracies and academic historians neglected to gather a complete record of time zone changes. Only in the last few decades has an organized record been cobbled together.

datetime as an RFC-3339 timestamp

I recommend you avoid RFC 3339. That document is but a self-declared “profile” of the ISO 8601 standard. But RFC 3339 purposely breaks some elements of ISO 8601.

For example, one of those breaking rules is to allow a negative offset of zero, allowing -00:00 in addition to +00:00. Their logic escapes me. Under ISO 8601, a -00:00 is forbidden.

Stick with ISO 8601.
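If you do have to read RFC 3339 input, a rough sketch like the following can flag the `-00:00` case explicitly instead of silently treating it as `+00:00`. The enum and function names are hypothetical, and the labels reflect my reading of RFC 3339 section 4.3 and of ISO 8601's no-offset form, so treat them as assumptions.

```python
import re
from enum import Enum


class OffsetMeaning(Enum):
    UTC = "UTC is the reference point"
    UTC_KNOWN_LOCAL_UNKNOWN = "UTC known, local offset unknown (RFC 3339 only)"
    LOCAL_WITH_OFFSET = "local time with a known offset"
    NO_OFFSET = "no offset given"


# Matches a trailing "Z" or a trailing "+hh:mm" / "-hh:mm".
_OFFSET_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def classify_offset(timestamp: str) -> OffsetMeaning:
    """Classify the offset suffix of a timestamp string, without parsing the rest."""
    match = _OFFSET_SUFFIX.search(timestamp)
    if match is None:
        return OffsetMeaning.NO_OFFSET
    suffix = match.group(1)
    if suffix == "-00:00":
        return OffsetMeaning.UTC_KNOWN_LOCAL_UNKNOWN
    if suffix in ("Z", "+00:00"):
        return OffsetMeaning.UTC
    return OffsetMeaning.LOCAL_WITH_OFFSET


print(classify_offset("2022-09-26T23:39:00-00:00").name)
```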

The named timezone closest to where the specific date was relevant, e.g. Asia/Singapore: this means that we also have to collect a location

No, location is not relevant. Intention and context are relevant, and legal jurisdiction is relevant. Location does not necessarily indicate the relevant time zone.

For example, two business people sitting in Paris (Europe/Paris time zone) could be signing a contract whose legal terms are defined in Canada using America/Edmonton time zone. Two different dates could be in play, with “tomorrow” in Europe being simultaneously “yesterday” in the Americas.
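A quick sketch of that situation in Python, using the standard `zoneinfo` module (Python 3.9+, tz data assumed to be installed). The signing moment is an arbitrary instant I picked for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One moment on the timeline, stored in UTC.
signing = datetime(2022, 9, 26, 23, 30, tzinfo=timezone.utc)

# The same moment seen through two time zones.
in_paris = signing.astimezone(ZoneInfo("Europe/Paris"))
in_edmonton = signing.astimezone(ZoneInfo("America/Edmonton"))

print(in_paris.date().isoformat())     # 2022-09-27
print(in_edmonton.date().isoformat())  # 2022-09-26
```

Which of the two dates goes on the contract depends on the governing jurisdiction, not on where the signers are sitting.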

Another problem seems to be finding the right named timezone for a given general geographical location

Again, time zones are decided by politicians, not by geography. Where local politicians have defined a different offset than that of the greater area/jurisdiction, a sub-divided time zone name is created. See List of tz database time zones in Wikipedia (not necessarily updated to the most recent info, by the way). Take for example, just the state of Indiana in the United States:

  • America/Indiana/Indianapolis
  • America/Indiana/Knox
  • America/Indiana/Marengo
  • America/Indiana/Petersburg
  • America/Indiana/Tell_City
  • America/Indiana/Vevay
  • America/Indiana/Vincennes
  • America/Indiana/Winamac

Does an IANA database exist which tracks when different calendars were in use for a geographical area?

None that I have heard of.

If I have a geographical area for a given city or country, how can I figure out which named timezone to use for it?

I would be surprised if there were any such lookup. As I have tried to explain, time zones are jurisdictional, not geographical. For any point in space (geography), you would first have to determine what jurisdiction had control at the moment of interest to you. And then you need to map that jurisdiction to a time zone name, a mapping I have never seen published.

Lastly, I wonder if your question is really moot with regard to moments so far past in history. Do you really care about careful adjustment between time zones for a moment in 1917 Russia? I acknowledge that time zone adjustment can be vitally important in contemporary times, such as determining when a contract was signed and went into effect, as I alluded above. But for moments so far past, I cannot imagine the practical usefulness.

And as I said, we have a decent time zone history only since 1970, and even that just barely.

Basil Bourque
  • RFC 3339's `-00:00` and ISO 8601's absence of offset are not the same thing. `-00:00` is still numerically equivalent to `+00:00`, it just means that the *preferred* reference is local time but that offset is unknown. – Matt Johnson-Pint Sep 26 '22 at 23:39
  • @MattJohnson-Pint No, you are incorrect on that. RFC 3339 says *explicitly* the opposite of your claim, with the RFC saying that -00:00 differs semantically from +00:00. To quote Section 4.3 of RFC 3339: *If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.* – Basil Bourque Sep 26 '22 at 23:44
  • Semantically, but not numerically. If I leave an offset off completely, then by ISO 8601 that means the date and time values are in local time and I don't know the UTC equivalent. If I have an offset of `-00:00` then by RFC 3339, the date and time values are in UTC and I don't know the local time equivalent. – Matt Johnson-Pint Sep 26 '22 at 23:48
  • Otherwise it wouldn't say "If the time in UTC is known" – Matt Johnson-Pint Sep 26 '22 at 23:50
  • Also it goes on to say, "RFC2822 [IMAIL-UPDATE] describes a similar convention for email.". [There](https://www.rfc-editor.org/rfc/rfc2822#section-3.3) it says "Though "-0000" also indicates Universal Time..." – Matt Johnson-Pint Sep 26 '22 at 23:53
  • If you are correct (and I am starting to see that you may be), what does it even mean to say "the offset to local time is unknown" as *any* date-time with an offset from UTC of zero has no other offset known? That section of the RFC would be even crazier than my own possibly incorrect interpretation. In other words, when would a value with +00:00 ever be different than one with a -00:00? – Basil Bourque Sep 26 '22 at 23:53
  • I think it's more about distinguishing between time zones that have a local time offset `+00:00` themselves (ex, Reykjavík, or London in winter) vs `-00:00` being *only* UTC. Still, I don't see the need, as we get that with `Z`. So I agree with you - it's a bit useless. Some of this is also akin to GMT and UTC being *semantically* different, but logically identical. – Matt Johnson-Pint Sep 27 '22 at 00:34
  • Excellent points, thank you. I actually absolutely do have to account for the transition from Julian to Gregorian in Russia after the revolution, as this historical analysis tool will be used in relation to events such as these. If I didn't have to deal with this, I would absolutely not need to deal with calendars and such, but for this I _must_ be able to take "fuzzy" (eg not specific even to the second, minute, hour, day, or month) dates, and represent the same actual time period in Russia and in Europe. – Naftuli Kay Sep 27 '22 at 02:04
  • So I must be able to take dates in the Russian Julian calendar and match them with dates in the European Gregorian calendar. I'll be storing all dates in UTC, but will also store the moment-in-time UTC offset for the specific location at that moment in time, and which calendar to "project" these dates to when displaying to users, so I can show "January 1st, 1918 (Julian), January 14th, 1918 (Gregorian)" to users rather than just the Gregorian date. – Naftuli Kay Sep 27 '22 at 02:07
  • As to your point about things being redrawn all the time by politicians, is this still the case _retroactively_? I'm not familiar with whether it actually occurs that things like DST are retroactively changed in a location, and since these time periods are so far in the past, I'm not sure someone would come along and rewrite history by declaring that "DST during 1926 in Berlin USED TO BE X but NOW IS Y." – Naftuli Kay Sep 27 '22 at 02:10
  • And thanks for the ISO clarification via Gregorian for weeks. I am familiar that during the USSR they experimented with different week lengths, but thankfully I don't think I need to deal with that in my case. – Naftuli Kay Sep 27 '22 at 02:11
  • @NaftuliKay (a) No, I am not aware of retroactive redefinitions of time zones. It might be possible, such as a military occupation changing the offset of the occupied region and perhaps the locals later not wanting to recognize that change, but that is just a guess on my part; I don't know of any such case. What I do know is a problem is **time zone changes not having been recorded** in decades and centuries prior to 1970. So while the historical facts may not be changing, those historical facts may only be added to the tzdb *later*, and some may still be missing or inaccurate now. – Basil Bourque Sep 27 '22 at 02:17
  • @NaftuliKay (b) Other week structures were tried [at the Eastman Kodak Company](https://en.wikipedia.org/wiki/International_Fixed_Calendar), and [by the French Revolution](https://en.wikipedia.org/wiki/French_Republican_calendar). You can find a Java implementation of the Eastman calendar, a.k.a. International Fixed Calendar, in the [`InternationalFixedChronology`](https://www.threeten.org/threeten-extra/apidocs/org.threeten.extra/org/threeten/extra/chrono/InternationalFixedChronology.html), part of [*ThreeTen-Extra*](https://www.threeten.org/threeten-extra/) library. – Basil Bourque Sep 27 '22 at 02:20
  • @BasilBourque as for storing time offsets in my database, I am not clear on why not to do this for historical data. I can't count on clients having the tzdb per se, though I suppose there is some way I could do this in JavaScript. Is the suggestion that I don't store the offset for the historical date and render it server-side every time it is displayed so as to be able to reflect future changes to the tzdb? – Naftuli Kay Sep 27 '22 at 02:31
  • @NaftuliKay Your second paragraph talks about *today* with regard to storing the offset-adjusted value. So my comment about recording only UTC was for current and future moments. For past moments, yes you could reasonably store the UTC *and* the offset-adjusted value (and thirdly, the time zone) *if* you are certain your time zone data is correct, reliable, and unchanging. – Basil Bourque Sep 27 '22 at 02:37
  • @NaftuliKay By the way, note that the *java.time* framework, in its [`ZonedDateTime#toString`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/ZonedDateTime.html#toString()) method, has wisely extended the ISO 8601 format to append the name of the time zone in square brackets. Example: `2007-12-03T10:15:30+01:00[Europe/Paris]`. That format might be helpful in your work. – Basil Bourque Sep 27 '22 at 02:38
  • @BasilBourque I will likely be writing this in Rust, and there is thankfully the [`chrono_tz`](https://docs.rs/chrono-tz/0.6.3/chrono_tz/) crate, which builds the tzdb at compile-time. As for `java.time`, I have actually worked with it before but won't be in this case. I appreciate you sharing the format used, and I might utilize this as well, as my "fuzzy" date times (FDTs) will likely be serialized to and deserialized from simple strings to save transfer cost, as opposed to a bigger JSON object. I'll be writing a parser in [`nom`](https://docs.rs/nom/latest/nom/) most likely. – Naftuli Kay Sep 27 '22 at 18:38