I have a question concerning XML Schema's built-in type xsd:dateTime.

What are the exact semantics of an xsd:dateTime without a timezone, for example 1970-01-01T00:00:00?

I've read through a number of XML Schema spec documents but could not find out how it should be processed.

Specifically, I want to understand how to correctly convert an xsd:dateTime to a Date object (like java.util.Date or the JavaScript Date).

Side note: I am perfectly aware of Java utility classes like DatatypeConverter or DatatypeFactory; I would like to find the XML Schema spec that defines how to do this conversion.

The problem with the Date class (in Java as well as in JavaScript) is that these classes do have timezones (defaulted to the local time zone). If I'm getting an xsd:dateTime without a time zone on input, then I have to decide somehow which time zone I should assume. Otherwise I just can't convert it to a timezoned value (like Date).

Now the question is what I should assume. I see the following options here:

  • Assume something default like UTC.
  • Assume local timezone of the processor.

I don't really like the second option. It is entirely random! On my machine, if I run

System.out.println(DATATYPE_FACTORY
    .newXMLGregorianCalendar("1970-01-01T00:00:00")
    .toGregorianCalendar().getTime().getTime());

I'll get -3600000, 0, or 3600000 for GMT+1, GMT, or GMT-1 (and even more variants depending on summer time). This is so arbitrary that I'm really not getting it. Does this mean that when we have an XML document with an element like

<date-time>1970-01-01T00:00:00</date-time>

we actually have no idea exactly which time instant was meant?
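To make the dependence on the assumed zone visible, here is a minimal sketch that passes the assumption explicitly instead of relying on the JVM default. The class name `ExplicitZoneDemo` and the helper `toEpochMillis` are made up for illustration; the only library calls are the standard `javax.xml.datatype` ones. The assumed zone is applied only when the lexical value carries no time zone.

```java
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

final class ExplicitZoneDemo {

    // Hypothetical helper: epoch millis for an xsd:dateTime, using assumedZone
    // only if the lexical value has no time zone of its own.
    static long toEpochMillis(String lexical, TimeZone assumedZone) {
        XMLGregorianCalendar value;
        try {
            value = DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No DatatypeFactory implementation available", e);
        }
        if (value.getTimezone() == DatatypeConstants.FIELD_UNDEFINED) {
            // No time zone given: apply the caller's assumption, not the JVM default.
            return value.toGregorianCalendar(assumedZone, null, null).getTimeInMillis();
        }
        // A time zone was given, so the instant is unambiguous.
        return value.toGregorianCalendar().getTimeInMillis();
    }

    public static void main(String[] args) {
        String lexical = "1970-01-01T00:00:00";
        for (String id : new String[] {"GMT+01:00", "GMT", "GMT-01:00"}) {
            System.out.println(id + " -> " + toEpochMillis(lexical, TimeZone.getTimeZone(id)));
        }
    }
}
```

With the three zones above this reproduces the -3600000, 0 and 3600000 values mentioned earlier, while a value such as 1970-01-01T00:00:00Z yields 0 regardless of the assumed zone.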

The first option (assuming UTC) seems more valid to me but this is apparently not what (at least) Java tools are doing.

So could someone please give me a pointer to a spec of some kind that defines the semantics of the timezoneless xsd:dateTime?

Thank you.

Update:

Current findings are:

  • Unspecified time zone has exactly the semantics of the "unspecified" time zone, that is, you can't blindly assume UTC or local time zone of the processor or whatever. This is some local time zone, but which one - you don't really know.
  • This basically means that, strictly speaking, you can't convert an xsd:dateTime to a Date object which has a specific time zone - UNLESS an assumption about the absent time zone is somehow made.
  • As a tool provider, I can't really make an informed assumption of that kind. I have no background on data or its semantics.
  • This leads me to the conclusion that the tool user has to provide such assumption - either explicitly or implicitly.

My solution will be as follows:

  • In my library I have a so-called context object which provides the XML processing context (an analog of the JAXB JAXBContext). I will extend this object with methods like getDefaultTimezoneOffset() and setDefaultTimezoneOffset(int timezoneOffset).
  • By default, the getter will return some default value. I prefer 0 (UTC) at the moment. However, it could be the local time zone as well (like the Java tools do).
  • Library user is welcome to provide a different default time zone offset, but it is not strictly required ("implicit" assumption here)
  • When parsing xsd:dateTime to Date, if incoming value is missing a time zone, it will be assumed to be the context.getDefaultTimezoneOffset().
  • I will also note the incoming time zone (or lack thereof) in the parsed Date object. For instance in a property like originalTimezoneOffset or something like that. This will not modify the value of the Date object but will provide some additional context information (for instance when the value should be printed again).
  • When printing a Date, the library would check for the originalTimezoneOffset and if it is provided consider it when rendering the lexical value.
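
The plan above can be sketched roughly as follows. This is written in Java with `java.time` purely for illustration (the actual library targets JavaScript), and every name in it (`DateTimeContextSketch`, `Context`, `ParsedDateTime`, `parse`, `print`) is hypothetical. Offsets are assumed to be in minutes east of UTC, and only the common lexical forms accepted by `DateTimeFormatter.ISO_DATE_TIME` are handled.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

final class DateTimeContextSketch {

    // Hypothetical context object carrying the user's assumption (minutes east of UTC).
    static final class Context {
        private int defaultTimezoneOffset = 0; // assumed default: UTC
        int getDefaultTimezoneOffset() { return defaultTimezoneOffset; }
        void setDefaultTimezoneOffset(int minutes) { defaultTimezoneOffset = minutes; }
    }

    // The instant plus the offset found in the lexical value; null means none was given.
    static final class ParsedDateTime {
        final Instant instant;
        final Integer originalTimezoneOffset;
        ParsedDateTime(Instant instant, Integer originalTimezoneOffset) {
            this.instant = instant;
            this.originalTimezoneOffset = originalTimezoneOffset;
        }
    }

    static ParsedDateTime parse(String lexical, Context context) {
        // Try the timezoned form first, then fall back to the untimezoned form.
        TemporalAccessor parsed = DateTimeFormatter.ISO_DATE_TIME
                .parseBest(lexical, OffsetDateTime::from, LocalDateTime::from);
        if (parsed instanceof OffsetDateTime) {
            OffsetDateTime odt = (OffsetDateTime) parsed;
            return new ParsedDateTime(odt.toInstant(), odt.getOffset().getTotalSeconds() / 60);
        }
        ZoneOffset assumed = ZoneOffset.ofTotalSeconds(context.getDefaultTimezoneOffset() * 60);
        return new ParsedDateTime(((LocalDateTime) parsed).toInstant(assumed), null);
    }

    static String print(ParsedDateTime value, Context context) {
        if (value.originalTimezoneOffset == null) {
            // The input had no time zone, so render none; undo the assumed offset.
            ZoneOffset assumed = ZoneOffset.ofTotalSeconds(context.getDefaultTimezoneOffset() * 60);
            LocalDateTime local = LocalDateTime.ofInstant(value.instant, assumed);
            return DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(local);
        }
        ZoneOffset original = ZoneOffset.ofTotalSeconds(value.originalTimezoneOffset * 60);
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(value.instant.atOffset(original));
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.setDefaultTimezoneOffset(60); // the user's explicit assumption: UTC+01:00
        ParsedDateTime parsed = parse("1970-01-01T00:00:00", context);
        System.out.println(parsed.instant.toEpochMilli());
        System.out.println(print(parsed, context));
    }
}
```

One consequence of this design: an untimezoned value only round-trips to the same lexical form if the context's default offset is the same at print time as it was at parse time.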
lexicore
  • possible duplicate of [What is the default time zone for an XML Schema dateTime if not specified?](http://stackoverflow.com/questions/20670041/what-is-the-default-time-zone-for-an-xml-schema-datetime-if-not-specified) – Matt Johnson-Pint Mar 20 '14 at 00:57

2 Answers

Basically the timezone is absent information, and there are many ways of interpreting absent information; in the end it's up to you. Possible interpretations are:

  • the timezone is unknown

  • the timezone can be established from the context, e.g. an associated place

  • the timezone is UTC

The XPath/XQuery/XSLT family of specifications assume a context-defined timezone. The context here could be the locale of the user, or the timezone of the machine on which the software is running, or any number of other things.

In a sense it's no different from omitting the time and giving only a date. What exactly do you mean when you say you were born on 21 March 1973? What timezone are you talking about? The assumption is probably that you've left out the information because no-one is likely to care.

Michael Kay
  • I definitely understand the notion of an "unknown" time zone. My problem is that when converting to Date, I am forced to have a time zone, period. So the question is which assumption I should make, and whether I can even make one. From everything I have learned so far it seems that, as a tool provider, I can't make an assumption here; I have no background on the incoming data. This leads me to the conclusion that I have to leave this assumption to the user of the library. – lexicore Mar 20 '14 at 08:08
  • ps. Personal side-note: I appreciate you answering here, Michael. I'm a big fan of your work (especially Saxon). It inspired me to write a book on XSLT around 12 years ago (it was a bestseller in Russia). Not sure if you remember - I contacted you for some quotes or something like that back then. :) – lexicore Mar 20 '14 at 08:08

This is what I've used myself. It all starts from the dateTime spec:

"Local" or untimezoned times are presumed to be the time in the timezone of some unspecified locality as prescribed by the appropriate legal authority; currently there are no legally prescribed timezones which are durations whose magnitude is greater than 14 hours. The value of each numeric-valued property (other than timeOnTimeline) is limited to the maximum value within the interval determined by the next-higher property. For example, the day value can never be 32, and cannot even be 29 for month 02 and year 2002 (February 2002).

If that is confusing, then go to section 3.2.7.2 Order relation on dateTime

Excerpts (to meet posting criteria here):

The ordering between two dateTimes P and Q is defined by the following algorithm [...] A. Normalize P and Q. That is, if there is a timezone present, but it is not Z, convert it to Z [...]

These would be relevant:

C. Otherwise, if P contains a time zone and Q does not, compare as follows:
  1. P < Q if P < (Q with time zone +14:00)
  2. P > Q if P > (Q with time zone -14:00)
  3. P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)

D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
  1. P < Q if (P with time zone -14:00) < Q.
  2. P > Q if (P with time zone +14:00) > Q.
  3. P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)

The "magic number" 14, from 3.2.7:

[...]currently there are no legally prescribed timezones which are durations whose magnitude is greater than 14 hours.

Of course, you could run into indeterminate scenarios, that is, cases where the order cannot be ascertained:

2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z

2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z

2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z
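
Rules C and D quoted above can be sketched in a few lines. This is only an illustration using `java.time`, not code from any XSD processor: the class `PartialOrderSketch` and its `Order` enum are made-up names, the timezoned operand is assumed to be already normalized to Z (an `Instant`), and the untimezoned operand is a `LocalDateTime`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class PartialOrderSketch {

    enum Order { LESS, GREATER, INDETERMINATE }

    // Rule D: P has no time zone, Q has one (passed here already normalized to Z).
    static Order compare(LocalDateTime p, Instant q) {
        Instant pPlus14 = p.toInstant(ZoneOffset.ofHours(14));   // P with time zone +14:00
        Instant pMinus14 = p.toInstant(ZoneOffset.ofHours(-14)); // P with time zone -14:00
        if (pMinus14.isBefore(q)) return Order.LESS;
        if (pPlus14.isAfter(q)) return Order.GREATER;
        return Order.INDETERMINATE;
    }

    // Rule C: P has a time zone, Q does not.
    static Order compare(Instant p, LocalDateTime q) {
        if (p.isBefore(q.toInstant(ZoneOffset.ofHours(14)))) return Order.LESS;
        if (p.isAfter(q.toInstant(ZoneOffset.ofHours(-14)))) return Order.GREATER;
        return Order.INDETERMINATE;
    }

    public static void main(String[] args) {
        Instant q = Instant.parse("2000-01-16T12:00:00Z");
        // Same wall-clock time without a zone: the order cannot be determined.
        System.out.println(compare(LocalDateTime.parse("2000-01-16T12:00:00"), q));
        // A full day earlier: less than Q whatever the unknown zone is.
        System.out.println(compare(LocalDateTime.parse("2000-01-15T12:00:00"), q));
    }
}
```

The three pairs listed above all come out as INDETERMINATE under this sketch, because the 28-hour window spanned by -14:00 and +14:00 around the untimezoned value contains the timezoned one.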

It is really hard to tell what kind of assumption you should make. You need to chase down and understand how that value was captured and then passed on to you in XML, since both assumptions can be wrong! If this data is passed around and eventually sent back to systems in the same realm as the one that sent it, a safe practice is to make sure you always keep a "string" copy of that data.

I really don't think that the stuff you're getting is random. You just need to read a bit more on these specs. And I am not saying it is easy - it is the way it is; plus, this is not about XML or XSD, it is about timezones in general.

Petru Gardea
  • What I think is "random" is that the mentioned Java tools assume the context to be my local context. Which is as good as any other context. :) I am developing a schema-driven XML marshaller/unmarshaller for JavaScript (https://github.com/highsource/jsonix). So as a tool provider I have no idea of the context of the document whatsoever. Nevertheless, users want to convert `xsd:dateTime`s to Dates, which enforce time zones. It looks like I am forced to make an assumption. But it's becoming more and more clear to me what the assumption should be. I'll update my answer with my findings. – lexicore Mar 20 '14 at 08:01
  • @lexicore, strictly speaking, it is not `as good as any other context`; `it "may" be` is what it is. We should strive to build systems and specs that are correct, not that assume "no-one is likely to care" - more so when you're a tool provider. I've seen at least one spec in the financial world where people decided to go "against the grain" apparently, that is they didn't map a date/time using the corresponding XSD primitives, but rather string with patterns. While this reduces the validation itself (leap years can sneak through) it was preferred to the overhead required to manage "contexts". – Petru Gardea Mar 20 '14 at 12:36
  • @lexicore (cont'd) The core problem here is (again!) the "disconnect" between what the XSD spec allows (date/time without timezones) vs. the corresponding mapping in programming languages. If I would build such a tool (as you imply your intention is), I would use my own date/time type that would work without a timezone. Thus, implementing the XSD spec the way it was written, not the way your target binding can work. – Petru Gardea Mar 20 '14 at 12:42
  • I _did_ build a special type for xsd:date, xsd:dateTime, xsd:time which _does_ take the timezone (or lack thereof) into account. But users often want Date instead. So what would be your suggestion for this case? – lexicore Mar 20 '14 at 12:59
  • I would just strip the time portion and make sure that the "kind" of timezone is preserved (if unspecified, then the date is also unspecified, etc.) Serializing should also preserve the timezone "kind", so a roundtrip correctly preserves the information. – Petru Gardea Mar 20 '14 at 13:51
  • Timezone is _always_ specified on Dates (at least in JavaScript, I think in Java as well), I can't leave it unspecified. – lexicore Mar 20 '14 at 15:19
  • You just said `I did build a special type for xsd:date, xsd:dateTime, xsd:time which does take timezone (or lack thereof)` - so I am not sure I follow your last comment. I would probably "steal" from Microsoft's [DateTime.Kind](http://msdn.microsoft.com/en-us/library/system.datetime.kind(v=vs.110).aspx), and apply whatever amendments I would need... If you get bogged down into the limitations of the specific platform you're binding to, then forget about the XSD spec... – Petru Gardea Mar 20 '14 at 15:48
  • Yes, I did build a special type in my JavaScript library. Sorry, I can't forget the XSD spec, I need a solution. – lexicore Mar 20 '14 at 16:00
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/50149/discussion-between-petru-gardea-and-lexicore) – Petru Gardea Mar 20 '14 at 16:08