
I'm using the en_GB locale, but a similar issue may also affect other en_XX locales.

Under Java 15 the following code works: `LocalDate.parse("10-Sep-17", DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.UK));`

Under Java 16 it gives: `DateTimeParseException: Text '10-Sep-17' could not be parsed at index 3`

After spending a long time in the debugger I have traced this to this commit: 8251317: Support for CLDR version 38

This commit changes the abbreviated form of September in make/data/cldr/common/main/en_GB.xml from Sep to Sept for both the context-sensitive and standalone forms. None of the other months are touched, remaining as 3 characters.

I have verified that this is indeed a genuine change between CLDR versions 37 and 38, although I'm not sure when we Brits switched to using 4 letters for our 3-letter abbreviation for September...

Now this is annoying, as it has broken my datafile processing (although I suspect I can fix it by specifying Locale.ENGLISH rather than using the default locale in my code). What I can't decide is whether this counts as a bug that breaks my previously reliable 3-character-month match pattern, or whether it is actually meant to be a feature.
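The Locale.ENGLISH idea mentioned above can be sketched as follows. This is a minimal sketch, not a guaranteed fix: it assumes the locale data for plain English keeps "Sep" as the abbreviation for September, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class EnglishLocaleParse {
    // Pin the formatter to Locale.ENGLISH instead of relying on the default
    // (en_GB) locale. Assumption: the English locale data abbreviates
    // September as "Sep".
    static final DateTimeFormatter DD_MMM_YY =
            DateTimeFormatter.ofPattern("dd-MMM-yy", Locale.ENGLISH);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DD_MMM_YY);
    }

    public static void main(String[] args) {
        // Two-digit years in an ofPattern "yy" field are resolved relative to 2000.
        System.out.println(parse("10-Sep-17"));
    }
}
```

Because the abbreviation still comes from locale data, a future data update could change it, so this only moves the dependency from en_GB to the English data.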

The JavaDoc says:

Text: The text style is determined based on the number of pattern letters used. Less than 4 pattern letters will use the short form. ...

and later:

Number/Text: If the count of pattern letters is 3 or greater, use the Text rules above. Otherwise use the Number rules above.

My bad for never having read this carefully enough to spot that textual values are handled differently to numbers, where the number of letters in your pattern sets the width. But this leaves me wondering how you are supposed to specify a fixed number of characters when you output a month, and equally why it can't be permissive and accept the three-character form when parsing rather than throw an exception?
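To illustrate the Number/Text rule quoted from the JavaDoc, here is a small sketch that formats the same date with one to four `M` pattern letters. It uses Locale.ENGLISH purely for illustration, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MonthPatternWidths {
    // Format a fixed September date with the given pattern in Locale.ENGLISH.
    static String format(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .format(LocalDate.of(2017, 9, 10));
    }

    public static void main(String[] args) {
        // One or two letters follow the Number rules, where the letter count
        // controls zero padding. Three or more letters follow the Text rules,
        // where the count selects a text style (short, full), not a width.
        for (String pattern : new String[] {"M", "MM", "MMM", "MMMM"}) {
            System.out.println(pattern + " -> " + format(pattern));
        }
    }
}
```

With three letters the output is whatever short form the locale data supplies, so its length is not fixed by the pattern.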

At the end of the day this still feels like a regression to me. My code, which has worked reliably for years parsing dates with 3-character months, now fails without warning on all dates in September. Am I wrong to think this feels incorrect?

Basil Bourque
Tim Barrett
  • I tend to agree with you, but this question primarily calls for a design opinion, which doesn't fit SO's constraints. – chrylis -cautiouslyoptimistic- Mar 25 '21 at 23:15
  • Someone else submitted a bug for this as well: https://bugs.openjdk.java.net/browse/JDK-8256837 (which was closed as 'not an issue'). I think you've essentially answered your own question already. Maybe it could be re-framed to ask for a workaround? (that seems on-topic to me) – Jorn Vernee Mar 25 '21 at 23:20
  • It appears that you can either stick with Java 15 or edit your Sep dates to Sept before you parse them. – Gilbert Le Blanc Mar 26 '21 at 00:16
  • I would say it's an error in CLDR 38, not an error in Java. – Andreas Mar 26 '21 at 00:25
  • From the bug report: *"If you can make your code change, I'd suggest specifying Locale.US in the formatter. Still, there is a very slim chance that CLDR could change the abbreviated name to "Sept" in Locale.US though, which turns things into the same situation. Another option is to select the locale data that is compatible with the one in JDK8, by specifying `-Djava.locale.providers=COMPAT` in the system property on the Java runtime invocation. This will make sure the name for September is "Sep" in Locale.UK, but may not benefit from locale related features that are integrated after JDK8."* – Stephen C Mar 26 '21 at 00:33
  • Ultimately, the solution to parsing strings like `"10-Sep-17"` is to avoid them, to not use such localized text for exchanging data. The [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) standard formats were invented expressly to avoid such problems when exchanging date-time values textually. The *java.time* classes use these standard formats by default when parsing/generating strings. – Basil Bourque Mar 26 '21 at 03:50
  • IMHO the problem is not only related to exchanging date-time values textually. Any application where an end user can enter a date textually (for example a date of birth) as "17-Sep-1985" is potentially affected. I already see the tickets: "Last time I could enter my date of birth as 17-Sep-1985, now your application tells me this date is invalid!" – Thomas Kläger Mar 26 '21 at 07:39
  • Very well-researched question, unfortunately off-topic for here. Now you ask, I do not consider it a bug. It is documented, though maybe not that well, that formats may change between versions. There is no way to specify exactly three letters directly, but you may use [`DateTimeFormatterBuilder.appendText(TemporalField, Map)`](https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/time/format/DateTimeFormatterBuilder.html#appendText(java.time.temporal.TemporalField,java.util.Map)) for specifying precisely which month abbreviations you want. – Ole V.V. Mar 27 '21 at 09:31
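
The `DateTimeFormatterBuilder.appendText(TemporalField, Map)` suggestion in the last comment could look roughly like this. It is a minimal sketch under stated assumptions: the three-letter month table is supplied by hand rather than taken from any locale data, and the class and field names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class FixedMonthAbbreviations {
    static final DateTimeFormatter FORMATTER = build();

    static DateTimeFormatter build() {
        // Hand-written abbreviations (an assumption of this sketch), so the
        // month text no longer depends on the CLDR data bundled with the JDK.
        String[] names = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            months.put((long) (i + 1), names[i]);
        }
        return new DateTimeFormatterBuilder()
                .appendPattern("dd-")
                .appendText(ChronoField.MONTH_OF_YEAR, months)
                .appendPattern("-yy")
                .toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.parse("10-Sep-17", FORMATTER);
        System.out.println(date);
        System.out.println(FORMATTER.format(date));
    }
}
```

The same formatter is used for parsing and for output, so both directions use exactly the three-character forms in the map. It will reject "Sept" unless the parsing side is extended separately.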
