I'm trying to parse the output of JavaScript's new Date().toString() with Java's DateTimeFormatter, but I can't make it work.

The JS output looks like this:

  • Wed Apr 04 2018 09:56:16 GMT-0500 (SA Pacific Standard Time)
  • Wed Apr 04 2018 16:12:41 GMT+0200 (CEST)

My current formatter:

int defaultOffset = ZonedDateTime.now().getOffset().getTotalSeconds();
DateTimeFormatter dtfJs = new DateTimeFormatterBuilder()
        .appendPattern("EE MMM dd yyyy HH:mm:ss [OOOO (zzzz)]")
        .parseDefaulting(ChronoField.OFFSET_SECONDS, defaultOffset)
        .toFormatter();

If I .parse() those date strings from JS, I get the following error:

[date] could not be parsed at index 25

Index 25 for both the dates mentioned is:

  • GMT-0500 (SA Pacific Standard Time)
  • GMT+0200 (CEST)

I know the problem is with the : (colon) because if I print the current date with dtfJs, I get:

Wed Apr 04 2018 10:25:10 GMT-05:00 (Colombia Time)

So the formatter expects GMT-05:00, while the string received contains GMT-0500, and I can't find a reserved pattern letter that matches the latter.

The docs say:

Offset O: This formats the localized offset based on the number of pattern letters. One letter outputs the short form of the localized offset, which is localized offset text, such as 'GMT', with hour without leading zero, optional 2-digit minute and second if non-zero, and colon, for example 'GMT+8'. Four letters outputs the full form, which is localized offset text, such as 'GMT', with 2-digit hour and minute field, optional second field if non-zero, and colon, for example 'GMT+08:00'. Any other count of letters throws IllegalArgumentException.

Offset Z: This formats the offset based on the number of pattern letters. One, two or three letters outputs the hour and minute, without a colon, such as '+0130'. The output will be '+0000' when the offset is zero. Four letters outputs the full form of localized offset, equivalent to four letters of Offset-O. The output will be the corresponding localized offset text if the offset is zero. Five letters outputs the hour, minute, with optional second if non-zero, with colon. It outputs 'Z' if the offset is zero. Six or more letters throws IllegalArgumentException.

This means the four-letter form always includes the colon ":", so parsing GMT-0500 throws a DateTimeParseException.

Help greatly appreciated, thanks

Edit

Thanks to @mszymborski, I got past the offset and am now trying to validate the part in parentheses, e.g. "(CEST)". What would be useful here?

I tried with EE MMM dd yyyy HH:mm:ss 'GMT'Z (zz) but this only works with the second date in the list, not the first

  • GMT-0500 (SA Pacific Standard Time) ERROR
  • GMT+0200 (CEST) PASS
mszymborski
Esteban Rincon
  • You can escape the GMT part and use Z, for instance: `EE MMM dd yyyy HH:mm:ss 'GMT'Z` – mszymborski Apr 04 '18 at 15:47
  • @mszymborski Thanks, that worked. But now the problem is with the timezone-name in the parenthesis. – Esteban Rincon Apr 04 '18 at 15:54
  • Yeah, that's a tough one. – mszymborski Apr 04 '18 at 15:56
  • Interestingly enough, if you iterate over `TimeZone.getAvailableIDs()`, then from these IDs obtain TimeZone (`TimeZone.getTimeZone(id)`), and then print both the ID and the display name, it seems like SA Pacific Standard Time is not there. CEST is not present either, but Central European Time (CET) is. The difference is that CEST does not track daylight savings. I don't think this is doable using DateTimeFormatter. – mszymborski Apr 04 '18 at 16:05
  • To add further to that - someone posted this: https://github.com/nfergu/Java-Time-Zone-List/blob/master/TimeZones/src/TimeZoneList.java, it seems like you'd need to obtain some sort of a dictionary for translation between JS zones and the Java zones. – mszymborski Apr 04 '18 at 16:09
  • Hmm, but this might be error-prone in some cases – Esteban Rincon Apr 04 '18 at 16:12
  • Absolutely. I've added java-time tag, maybe someone better versed in these issues could help with explaining how these zones differ. – mszymborski Apr 04 '18 at 16:13
  • You definitely should not rely on what JS `Date.toString()` returns: it is different per application/browser/user locale. If you want to communicate via REST calls, make sure both sides are completely sure what format the data is transferred in: it's a lot easier for both parties this way. For example, use [Date.toISOString](https://developer.mozilla.org/ru/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString) – M. Prokhorov Apr 04 '18 at 16:21
  • @estebanrincon, You actually don't need time zone names at all here. You already have time zone offset, this is enough to build an unambiguous `Instant` value. Whatever the name may be, you can just ignore it. – M. Prokhorov Apr 04 '18 at 16:29
  • @M.Prokhorov True, this was actually my final solution. But, isn't the `Date.toString()` a standard for javascript ? – Esteban Rincon Apr 04 '18 at 16:30
  • @estebanrincon, no, don't think so. I vividly remember looking for a way to emulate Java's `SimpleDateFormat` in JavaScript and learning that not only is there no built-in way to do anything of the sort, but also that `Date.toString` is not part of the ECMAScript standard, which means that different browsers have different ideas on how they should implement it. Add the part where user might have `ru_RU` locale and what that means to output, and... yeah, use `toISOString`, that's standardized by a committee. – M. Prokhorov Apr 04 '18 at 16:34
  • @M.Prokhorov Seems like it is: https://www.ecma-international.org/ecma-262/8.0/index.html#sec-date.prototype.tostring – Esteban Rincon Apr 04 '18 at 16:47
  • @estebanrincon, I think this is better suited: [ToDateString](https://www.ecma-international.org/ecma-262/8.0/index.html#sec-todatestring). It specifically talks about "human-readable" form, which consequently is not the best option for machine communication. – M. Prokhorov Apr 04 '18 at 16:58
  • @M.Prokhorov but this is specifically for `Date.prototype.toUTCString ( )` – Esteban Rincon Apr 04 '18 at 17:01

1 Answer

Dates in JavaScript are a big mess. toString() is not only browser/implementation dependent, but also locale sensitive. I'm in Brazil, so my browser is set to Portuguese, and new Date().toString() gives this result:

Wed Apr 04 2018 14:14:04 GMT-0300 (Hora oficial do Brasil)

Month and day-of-week names are in English, but the timezone name is in Portuguese. What a mess!

Anyway, to parse those strings, you have to make some decisions.

Do you need to get the timezone or just the offset?

The offset GMT+0200 is used by more than one country (hence, more than one timezone uses it). Although the offset is enough to have a non-ambiguous point in time, it's not enough to know the timezone.

Even short names such as CEST are not enough, because this is also used by more than 1 country.

If you want to parse just the offset, the best way is to simply remove everything after the ( and parse it to an OffsetDateTime:

DateTimeFormatter parser = DateTimeFormatter.ofPattern("EEE MMM dd yyyy HH:mm:ss 'GMT'Z", Locale.US);

// 2018-04-04T16:12:41+02:00
OffsetDateTime.parse("Wed Apr 04 2018 16:12:41 GMT+0200", parser);
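
To make the "remove everything after the (" step concrete, here is a minimal sketch. The class name JsDateOffsetParser and its parse method are made up for illustration; it assumes the input always has the English day/month names and the GMT±hhmm offset shown in the question, with an optional parenthesized name at the end:

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class JsDateOffsetParser {
    // Same pattern as above: literal "GMT" followed by an offset without a colon
    static final DateTimeFormatter PARSER =
        DateTimeFormatter.ofPattern("EEE MMM dd yyyy HH:mm:ss 'GMT'Z", Locale.US);

    // Hypothetical helper: cuts the input at the first '(' (if any) and parses the rest
    static OffsetDateTime parse(String jsDate) {
        int paren = jsDate.indexOf('(');
        String withoutName = (paren >= 0 ? jsDate.substring(0, paren) : jsDate).trim();
        return OffsetDateTime.parse(withoutName, PARSER);
    }

    public static void main(String[] args) {
        // 2018-04-04T09:56:16-05:00
        System.out.println(parse("Wed Apr 04 2018 09:56:16 GMT-0500 (SA Pacific Standard Time)"));
        // 2018-04-04T16:12:41+02:00
        System.out.println(parse("Wed Apr 04 2018 16:12:41 GMT+0200 (CEST)"));
    }
}
```

Since the name in parentheses is discarded, it doesn't matter whether the JVM knows it or in which language the browser wrote it.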

Also note that I used a java.util.Locale. That's because the month and day of week are in English, and if you don't set a locale, it'll use the JVM default - and you can't guarantee that it'll always be English. It's better to set a locale if you know in what language the inputs are.

If you need to get the timezones, though, it's more complicated.

Names like "CEST" are ambiguous, and you need to make arbitrary choices for them. With java.time it is possible to build a set of preferred timezones to be used in case of ambiguities:

Set<ZoneId> zones = new HashSet<>();
zones.add(ZoneId.of("Europe/Berlin"));
zones.add(ZoneId.of("America/Bogota"));
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
    .appendPattern("EEE MMM dd yyyy HH:mm:ss 'GMT'Z (")
    // optional long timezone name (such as "Colombia Time" or "Pacific Standard Time")
    .optionalStart().appendZoneText(TextStyle.FULL, zones).optionalEnd()
    // optional short timezone name (such as CET or CEST)
    .optionalStart().appendZoneText(TextStyle.SHORT, zones).optionalEnd()
    // close parenthesis
    .appendLiteral(')')
    // use English locale, for month, timezones and day-of-week names
    .toFormatter(Locale.US);

With this, you can parse your inputs to a ZonedDateTime:

// 2018-04-04T16:12:41+02:00[Europe/Berlin]
ZonedDateTime.parse("Wed Apr 04 2018 16:12:41 GMT+0200 (CEST)", fmt);

// 2018-04-04T10:25:10-05:00[America/Bogota]
ZonedDateTime.parse("Wed Apr 04 2018 10:25:10 GMT-0500 (Colombia Time)", fmt);

But unfortunately, this doesn't parse the "SA Pacific Standard Time" case. That's because the timezone names are built into the JVM and "SA Pacific Standard Time" is not one of the predefined strings.

A good alternative is to use the mapping suggested by mszymborski in the comments: https://github.com/nfergu/Java-Time-Zone-List/blob/master/TimeZones/src/TimeZoneList.java

Then you manually replace the name in the string and parse it with VV pattern (instead of z), because the mapping uses IANA's names (such as Europe/Berlin, which are parsed by VV).
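
A rough sketch of that replace-and-parse idea could look like the code below. The JS_TO_IANA map and the JsDateZoneMapper class are hypothetical: the two entries are hand-picked for illustration (and "CEST" → Europe/Berlin is an arbitrary choice, as discussed above), while a real mapping would come from a fuller list such as the one linked:

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class JsDateZoneMapper {
    // Hypothetical mapping from names seen in JS output to IANA zone IDs (illustration only)
    static final Map<String, String> JS_TO_IANA = new HashMap<>();
    static {
        JS_TO_IANA.put("SA Pacific Standard Time", "America/Bogota");
        JS_TO_IANA.put("CEST", "Europe/Berlin"); // arbitrary choice among CEST countries
    }

    // VV parses IANA IDs such as Europe/Berlin
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("EEE MMM dd yyyy HH:mm:ss 'GMT'Z (VV)", Locale.US);

    static ZonedDateTime parse(String jsDate) {
        int open = jsDate.indexOf('(');
        int close = jsDate.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("No zone name in parentheses: " + jsDate);
        }
        String name = jsDate.substring(open + 1, close);
        String zoneId = JS_TO_IANA.get(name);
        if (zoneId == null) {
            throw new IllegalArgumentException("No mapping for zone name: " + name);
        }
        // Replace the name with the IANA ID, then parse as usual
        return ZonedDateTime.parse(jsDate.substring(0, open + 1) + zoneId + ")", FMT);
    }

    public static void main(String[] args) {
        // 2018-04-04T09:56:16-05:00[America/Bogota]
        System.out.println(parse("Wed Apr 04 2018 09:56:16 GMT-0500 (SA Pacific Standard Time)"));
    }
}
```

Note that this sketch doesn't check that the mapped zone actually agrees with the offset in the string, so a wrong entry in the map would go unnoticed.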


But the best alternative is to use toISOString(), which produces strings in ISO8601 format, such as 2018-04-04T17:39:17.623Z. The big advantage is that java.time classes can parse it directly (you don't need to create a custom formatter):

OffsetDateTime.parse("2018-04-04T17:39:17.623Z");
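
One thing to keep in mind is that toISOString() always produces UTC (the trailing Z), so the browser's original offset is not in the string. If you need the local time in some zone, you have to choose the zone yourself. A small sketch, where the class name IsoStringExample and the choice of America/Bogota are only for illustration:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class IsoStringExample {
    // Hypothetical helper: parses a toISOString() value and views it in the given zone
    static ZonedDateTime inZone(String iso, String zoneId) {
        Instant instant = OffsetDateTime.parse(iso).toInstant();
        return instant.atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        // 2018-04-04T12:39:17.623-05:00[America/Bogota]
        System.out.println(inZone("2018-04-04T17:39:17.623Z", "America/Bogota"));
    }
}
```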
marc_s
flok
  • I want to mention that there is no "magic" in `OffsetDateTime` being able to parse these seemingly without needing any formatter. What it uses internally is `DateTimeFormatter.ISO_OFFSET_DATE_TIME`. There are several of these constants for various ISO format variants. – M. Prokhorov Apr 05 '18 at 07:40
  • Speaking of messes, think about this: `monthEnd = new Date(2021, 1, 31);` You'll get all kinds of frustrated when you're expecting *Jan 31* but get *Mar 3*! – Clint Pachl Jun 04 '21 at 10:20