
I've got the following German date: So, 18 Jul 2021 15:24:00 +0200

I'm unable to parse it using Java Time:

DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss Z", Locale.GERMANY)
  .parse("So, 18 Jul 2021 15:24:00 +0200", Instant::from)

as it throws: Text 'So, 18 Jul 2021 15:24:00 +0200' could not be parsed at index 0

If I change the string to the form the formatter expects, parsing works:

-So, 18 Jul 2021 15:24:00 +0200
+So., 18 Juli 2021 15:24:00 +0200

Is there any magic pattern to parse the above date?


I have the same problem with dates in other locales:

  • LocalDateTime.parse("ven, 16/07/2021 - 09:49", DateTimeFormatter.ofPattern("EE, dd/MM/yyyy - HH:mm", Locale("fr")))
    • ven must be ven.
  • LocalDateTime.parse("vr, 23 apr 2021 17:04:00", DateTimeFormatter.ofPattern("EE, dd MM yyyy HH:mm:ss", Locale("nl")))
    • apr must be 04 (in order to use MM)
Niklas
  • `DateTimeFormatter` is used to parse valid dates and doesn't exist to clean up bad input. Clean up your input before it gets to the formatter – g00se Jul 31 '21 at 10:05
  • I agree and you don’t need the day name since you have a complete date. – Joakim Danielson Jul 31 '21 at 11:21
  • @JoakimDanielson Validation is very often a good thing. – Ole V.V. Jul 31 '21 at 19:28
  • Probably related: [JDK dateformatter parsing DayOfWeek in German locale, java8 vs java9](https://stackoverflow.com/questions/46244724/jdk-dateformatter-parsing-dayofweek-in-german-locale-java8-vs-java9) – Ole V.V. Jul 31 '21 at 19:29
  • "`apr` must be `07`"? Shouldn't that be `04` instead? – MC Emperor Jul 31 '21 at 22:18

2 Answers


The modern Date-Time API is strict about matching the pattern, so a single plain pattern that parses every variant of such strings is hard to come by. However, `DateTimeFormatter` supports optional sections, written in square brackets. The following demo uses `E, d [MMMM][MMM][M] u H:m:s Z`, which has three optional sections for the month: full name, abbreviation and number.

Demo:

import java.time.DateTimeException;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.stream.Stream;


public class Main {
    public static void main(String[] args) {
        Stream.of(
                "So., 18 Juli 2021 15:24:00 +0200",
                "ven., 16 avr. 2021 15:24:00 +0200",
                "vr, 16 apr. 2021 15:24:00 +0200",
                "vr, 16 07 2021 15:24:00 +0200"
        ).forEach(s -> {
            Stream.of(
                    Locale.GERMANY,
                    Locale.FRANCE,
                    new Locale("nl", "NL")
            ).forEach( locale -> {
                try {
                    System.out.println("Parsed '" + s + "' using the locale, " + locale + " => " + parseToInstant(s, locale));
                } catch (DateTimeException e) {
                    // This string does not match this locale; skip it and try the next locale.
                }
            });
        });
    }

    static Instant parseToInstant(String strDateTime, Locale locale) {
        return DateTimeFormatter.ofPattern("E, d [MMMM][MMM][M] u H:m:s Z").withLocale(locale).parse(strDateTime,
                Instant::from);
    }
}

Output:

Parsed 'So., 18 Juli 2021 15:24:00 +0200' using the locale, de_DE => 2021-07-18T13:24:00Z
Parsed 'ven., 16 avr. 2021 15:24:00 +0200' using the locale, fr_FR => 2021-04-16T13:24:00Z
Parsed 'vr, 16 apr. 2021 15:24:00 +0200' using the locale, nl_NL => 2021-04-16T13:24:00Z
Parsed 'vr, 16 07 2021 15:24:00 +0200' using the locale, nl_NL => 2021-07-16T13:24:00Z


Learn more about the Date-Time patterns from DateTimeFormatterBuilder.
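The demo writes the optional sections as square brackets inside the pattern string. As a rough sketch of the same idea with the builder, `optionalStart()` and `optionalEnd()` mark the sections explicitly. The class name `OptionalSectionsDemo` and the helper `parse` are made up for illustration, and `Locale.ENGLISH` is used here only to keep the month names predictable.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.TextStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

public class OptionalSectionsDemo {
    // Builder counterpart of the pattern "d [MMMM][MMM][M] u":
    // each optional section is tried in turn and skipped if it does not match.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("d ")
            .optionalStart().appendText(ChronoField.MONTH_OF_YEAR, TextStyle.FULL).optionalEnd()
            .optionalStart().appendText(ChronoField.MONTH_OF_YEAR, TextStyle.SHORT).optionalEnd()
            .optionalStart().appendValue(ChronoField.MONTH_OF_YEAR).optionalEnd()
            .appendPattern(" u")
            .toFormatter(Locale.ENGLISH);

    // Hypothetical helper: parses a date whose month is a full name, an abbreviation or a number.
    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("18 July 2021"));
        System.out.println(parse("18 Jul 2021"));
        System.out.println(parse("18 7 2021"));
    }
}
```

Note that optional sections only choose between spellings the locale data already knows; they do not by themselves accept an abbreviation that lacks its dot.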

Arvind Kumar Avinash

Specify your own abbreviations for the days of the week

According to CLDR, German day-of-week abbreviations are written with a dot. To have Java parse a string where the abbreviations lack the dot, there are two obvious solutions:

  1. Don’t use CLDR. Java’s own abbreviations from Java 8 and before did not have the dots and are still available in newer Java versions.
  2. Specify your own abbreviations.

Since you had similar problems with French, where Java’s own abbreviations have dots too, I suspect that solution 1 would be insufficient for you. So let’s delve into solution 2. My code below takes CLDR’s abbreviations, e.g., So., and removes the trailing dots from them, so you get for example So as in your string.

    // Needs imports from java.time, java.time.format, java.time.temporal (ChronoField),
    // java.util (Arrays, Locale, Map) and java.util.stream (Collectors).
    Locale loc = Locale.GERMANY;
    // Day-of-week number -> abbreviation with any trailing dot removed, e.g. 7 -> "So"
    Map<Long,String> dowsWithoutDots = Arrays.stream(DayOfWeek.values())
            .collect(Collectors.toMap(dow -> Long.valueOf(dow.getValue()),
                    dow -> dow.getDisplayName(TextStyle.SHORT, loc).replaceFirst("\\.$", "")));
    // Month number -> first three characters of the abbreviation, e.g. 7 -> "Jul"
    Map<Long,String> monthsWithoutDots = Arrays.stream(Month.values())
            .collect(Collectors.toMap(m -> Long.valueOf(m.getValue()),
                    m -> m.getDisplayName(TextStyle.SHORT, loc).substring(0, 3)));
    DateTimeFormatter germanWithoutDots = new DateTimeFormatterBuilder()
            .appendText(ChronoField.DAY_OF_WEEK, dowsWithoutDots)
            .appendPattern(", dd ")
            .appendText(ChronoField.MONTH_OF_YEAR, monthsWithoutDots)
            .appendPattern(" yyyy HH:mm:ss Z")
            .toFormatter(loc);

    System.out.println(germanWithoutDots.parse("So, 18 Jul 2021 15:24:00 +0200", Instant::from));

Output from the snippet is:

2021-07-18T13:24:00Z

For the month abbreviations removing the final dot did not work since, as you have observed, CLDR’s abbreviation is Juli where you have got Jul. So instead of removing the dot I abbreviate to three characters. You should test that it works for all months (including Mai).

I have not tried the same for French and Dutch, but it should work.
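Because the snippet above derives its tables from the locale data shipped with the JDK, the result can vary between Java versions. A possible variation, sketched below, is to hard-code the abbreviations you expect in the input. The class name `FixedAbbreviationsDemo`, the helper names and the exact spellings in the two arrays are assumptions for illustration; adjust them to what your data source really emits.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FixedAbbreviationsDemo {
    // Assumed spellings of the input, Monday first and January first.
    private static final String[] DAYS = {"Mo", "Di", "Mi", "Do", "Fr", "Sa", "So"};
    private static final String[] MONTHS = {"Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
            "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"};

    // Maps 1..n to the given names, as appendText(TemporalField, Map) expects.
    static Map<Long, String> numbered(String[] names) {
        Map<Long, String> map = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            map.put((long) (i + 1), names[i]);
        }
        return map;
    }

    // The texts come from the maps above, so the locale plays no role in matching them.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendText(ChronoField.DAY_OF_WEEK, numbered(DAYS))
            .appendPattern(", dd ")
            .appendText(ChronoField.MONTH_OF_YEAR, numbered(MONTHS))
            .appendPattern(" yyyy HH:mm:ss Z")
            .toFormatter(Locale.ROOT);

    // Hypothetical helper: parses the fixed-abbreviation format into an Instant.
    static Instant parse(String text) {
        return FORMATTER.parse(text, Instant::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("So, 18 Jul 2021 15:24:00 +0200"));
    }
}
```

Explicit tables also avoid a pitfall of truncating to three characters in other languages: as far as I know, CLDR abbreviates June and July in French as juin and juil., which would both become jui, so for French it is probably safer to remove only the dot or to list the names by hand.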

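In case you want to try your luck with solution 1, circumventing CLDR completely, see [JDK dateformatter parsing DayOfWeek in German locale, java8 vs java9](https://stackoverflow.com/questions/46244724/jdk-dateformatter-parsing-dayofweek-in-german-locale-java8-vs-java9).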

Ole V.V.