September's short form "Sep" no longer parses in Java 17 in en_GB locale

Question

This works with Java 11 but does not work with Java 17:

DateTimeFormatter format = DateTimeFormatter.ofPattern("MMM dd, yyyy")
    .withLocale(Locale.UK);
format.parse("Sep 29, 1988");

Java 17 stacktrace:

Exception in thread "main" java.time.format.DateTimeParseException: Text 'Sep 29, 1988' could not be parsed at index 0
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2052)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1880)

My Java version:

openjdk version "17" 2021-09-14 LTS
OpenJDK Runtime Environment Zulu17.28+13-CA (build 17+35-LTS)
OpenJDK 64-Bit Server VM Zulu17.28+13-CA (build 17+35-LTS, mixed mode, sharing)

What has changed?

That was it. My default locale is `en_GB`. Not sure if this change in behaviour is intentional (it probably is) but it's very inconvenient. — steven35, Sep 21 '21 at 11:24
This is why you should use [standardized date formats](https://en.wikipedia.org/wiki/ISO_8601) rather than localized strings when exchanging date-time values textually. — Basil Bourque, Sep 21 '21 at 17:22
@BasilBourque I'm parsing this from some HTML so obviously it's not my choice. — steven35, Nov 10 '21 at 11:02
Related question: https://stackoverflow.com/questions/70928852/customize-a-locale-in-java/ — Niels Basjes, Feb 02 '22 at 09:48

score 25 · Accepted Answer · edited Sep 21 '21 at 17:15

It seems that in the en_GB locale, the short form of September is now "Sept", not "Sep". All the other months keep the same three-letter abbreviations as in en_US. Kind of makes sense. As a Brit, "Sep" looks wrong to me.
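
To check which abbreviations a particular runtime actually uses, a small diagnostic along these lines can help. It is only a sketch: the class name `MonthAbbreviations` and the helper `shortName` are made up for illustration, and the output depends on the JDK and CLDR version in use, so none is shown here.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

public class MonthAbbreviations {
    // Returns the short month name the running JDK uses for the given locale.
    // This is the same text style that the "MMM" pattern letter formats and parses.
    static String shortName(Month month, Locale locale) {
        return month.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        System.out.println("Java " + Runtime.version());
        for (Month month : Month.values()) {
            System.out.printf("%-10s UK=%-6s US=%s%n",
                    month, shortName(month, Locale.UK), shortName(month, Locale.US));
        }
    }
}
```

Running it on both Java 11 and Java 17 should make any difference between the two locales visible side by side.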

This is the ticket: https://bugs.openjdk.java.net/browse/JDK-8251317

It wasn't a conscious decision by the JDK authors. The locale data used by default in Java comes from the Common Locale Data Repository (CLDR), which is a project by the Unicode Consortium. Newer versions of Java ship with newer versions of the CLDR, so you may occasionally see a change in locale behavior. In that sense, the change you encountered is a feature, not a bug.

Yours is just one of many small tweaks.

Here's the specific change in the PR which broke it for you: https://github.com/openjdk/jdk/pull/1279/files#diff-97210acd6f77c4f4979c43445d60ba1c369f058230e41177dceca697800b1fa2R116
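
If the input is known to use a fixed set of English abbreviations, one possible way to stop depending on the bundled locale data is to give the formatter its own month texts through `DateTimeFormatterBuilder.appendText`. This is a sketch under that assumption, not something taken from the ticket or the PR; `FixedMonthParser`, `NAMES` and `parse` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FixedMonthParser {
    // Assumption: the source text always uses these exact abbreviations.
    private static final String[] NAMES = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // Builds a formatter whose month texts come from NAMES rather than from locale data.
    static DateTimeFormatter formatter() {
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            months.put((long) (i + 1), NAMES[i]);
        }
        return new DateTimeFormatterBuilder()
                .appendText(ChronoField.MONTH_OF_YEAR, months)
                .appendPattern(" dd, yyyy")
                .toFormatter(Locale.ROOT);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, formatter());
    }

    public static void main(String[] args) {
        System.out.println(parse("Sep 29, 1988")); // prints 1988-09-29
    }
}
```

This version accepts only "Sep"; input that contains "Sept" would still need separate handling.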

edited Sep 21 '21 at 17:15 by Basil Bourque

answered Sep 21 '21 at 11:40 by Michael

All of our production servers have GB locale and the code is full of date parsing. I'd vote for consistency over "what looks right" but I guess that's just my opinion. I appreciate the links. – steven35 Sep 21 '21 at 11:53
@steven35: the problem with consistency in this is that locales can never improve if we value consistency above all else. Basically parsing free-form text dates without very precise specifications (which tend to use numbers) is a risky thing to do no matter what library you use. – Joachim Sauer Sep 21 '21 at 11:58
@steven35 You say you want consistency but locale-related stuff is constantly evolving. Currencies, countries, language, etc, etc, are all fluid. Keeping the data static might make it consistent *between Java versions*, but it becomes inconsistent with reality. They need to update it at some point. – Michael Sep 21 '21 at 12:03
@Michael if data were saved in a sane format, they wouldn't need an update – 9ilsdx 9rvj 0lo Sep 21 '21 at 12:29
Agree with @9ilsdx9rvj0lo here – I don't expect, for example, the format `uuuu-MM-dd'T'HH:mm:ss` to change for a while. – MC Emperor Sep 21 '21 at 12:48
@Michael it's not always an option – steven35 Nov 09 '21 at 18:03
@steven35 name just one comprehensible reason not to use a format like ISO 8601 for persistence. – Holger Nov 10 '21 at 08:16
@Holger Who said this format was for persistence? It's parsed from a HTML page not a database. Do you know that dates are not always displayed in ISO format? – steven35 Nov 10 '21 at 10:59
@steven35 an HTML page that you are trying to parse *is* persistent data. And obviously the wrong approach for what you are trying to do. Yes, “dates are not always displayed in ISO format”. But since you are trying to *parse* that page, that’s irrelevant. – Holger Nov 10 '21 at 11:46
@Holger have to agree with Steven on this. There are obviously valid reasons to parse dates in all forms and they're not always normalized for a computer. Yes, scraping a website designed for humans is always going to be somewhat brittle, but that doesn't make it wrong or necessarily suboptimal. If you have no control over the source of your data then sometimes you have to deal with data that's presented in a way that's different than you'd ideally like. You can't just whine to the producer to change it. That's something most people learn in their first year being a professional developer. – Michael Nov 10 '21 at 11:54
@Michael if you know what you are doing, you know that you can’t expect external data to match exactly the pattern provided by the locale implementation of your local system. Otherwise, you end up with software that breaks when a date string contains “Sept” instead of “Sep” or vice versa. – Holger Nov 10 '21 at 12:18
@Holger Yes, so the probable solution to that would be to specify a Locale which matches the source. Your parser will still be subject to the source data changing format (when scraping something designed for humans, that's unavoidable), but at least *your code* is portable. Omitting a Locale is a common mistake that's easily made because the API doesn't force you to specify one. The solution is **not** necessarily anything to do with changing the format (e.g. to ISO), because there are perfectly valid scenarios when you simply can't. – Michael Nov 10 '21 at 12:35
Well, obviously, specifying the locale wasn’t sufficient in the OP’s case. And [you said yourself](https://stackoverflow.com/questions/69267710/69268271?noredirect=1#comment122430101_69268271) why such an approach isn’t sufficient. – Holger Nov 10 '21 at 12:43
@Holger Suppose OP is in the UK where we use 'Sept', and the source website they are trying to scrape is American where they use 'Sep'. In that case, the bug is that they were relying on their default Locale for parsing which did not match the source Locale. What I was saying in that comment is that, because language is fluid, of course you can't expect a solution like this to work forever. But tasks like scraping websites are not solutions you should *ever* expect to work forever. They are *inherently* brittle. That does not invalidate them. Sometimes scraping a website is the best you can do – Michael Nov 10 '21 at 12:58
@Holger Have you seriously never parsed anything with a computer that wasn't *specifically designed* to be parsed by a computer? – Michael Nov 10 '21 at 13:02
@Michael you are saying “you can't expect a solution like this to work forever” and I’m saying “you can't even expect a solution like this to work a second time¹”. Not so much different in the context of an OP assuming that this worked forever. The problem is not that the OP had to parse data not designed to be parsed by a computer. The problem is the approach chosen for the task. — ¹ because what happened with updating to Java 17 could have happened with any other tiny change in the environment too (some Java implementations use the operating system’s locale data, for example). – Holger Nov 10 '21 at 13:28
@Holger "*The problem is not that the OP had to parse data not designed to be parsed by a computer*" That is quite literally the problem. If you can't see that then we are just wasting each other's time. "*The problem is the approach chosen for the task*" Name a solution for parsing a human-readable string that doesn't suffer from the same issues. There isn't one, but I'll wait. – Michael Nov 10 '21 at 13:44
@Michael this question and the accepted answer have the potential to help a lot of people in the UK as the adoption of Java 17 grows but Holger managed to turn the comment section into an irrelevant generic lecture on how to persist dates in a database which most people are already familiar with. – steven35 Nov 10 '21 at 14:02

score 1 · Answer 2 · answered Aug 05 '22 at 16:26

Aside from the arguments over whether parsing date/times from text (from external legacy sources) is a good thing, or whether standards should be allowed to evolve at the cost of backward compatibility, a practical fix is to switch `Locale.UK` to `Locale.US` when parsing strings such as `Sep 29, 1988` or `30-Sep-2020`.
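
A minimal sketch of that fix, assuming the input really does use the US-style abbreviations. The class and constant names are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class UsLocaleParsing {
    // Pin the locale explicitly so parsing does not follow the JVM's default locale.
    static final DateTimeFormatter MONTH_FIRST =
            DateTimeFormatter.ofPattern("MMM dd, yyyy", Locale.US);
    static final DateTimeFormatter DAY_FIRST =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.US);

    public static void main(String[] args) {
        System.out.println(LocalDate.parse("Sep 29, 1988", MONTH_FIRST)); // prints 1988-09-29
        System.out.println(LocalDate.parse("30-Sep-2020", DAY_FIRST));    // prints 2020-09-30
    }
}
```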
