Sorry that this question is basic and that I can't show what I've tried, but right now I'm having trouble getting my head around Java's DateTimeFormatter and LocalDateTime.

The code that's not working, but obviously was working before some change I don't know about (I just had this code dropped in my lap):

public LocalDateTime getDateForIception() {
    String tid = driver.findElement(By.cssSelector("div.hendelse-tid.hb-tekst--ingenBryting"))
        .getText().replaceAll("(?<=[A-Za-z]{3})[.a-z]{1,2}", "");
    if(tid.split("\\.")[0].length() == 1) {
        tid = "0" + tid;
    }

    return DatoUtils.parseDatoLocalDateTime(tid,  "dd. MMM yyyy HH:mm");
}

I'm not entirely sure what the point of the character replacement is, but in this case the if() isn't executed and the "tid" variable is unchanged. I've kept it here for reference.

public static LocalDateTime parseDatoLocalDateTime(String datoString, String pattern) {
    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern(pattern)
        .toFormatter(Locale.forLanguageTag("no"));
    return LocalDateTime.parse(datoString, formatter);
}

I suspect there's been some change in the format that's read from the page, so that the parsing fails. But the error message makes little sense to me:

java.time.format.DateTimeParseException: Text '15. jun 2022 19:51' could not be parsed at index 4

Ideas or solutions are greatly appreciated.

Ole V.V.
  • If I use that formatter to format a new `LocalDateTime` instance, I get `jun.` instead of `jun`. If I change your input to use `jun.` it works. – Rob Spoor Jun 15 '22 at 18:11
  • Which Java version are you coding for? Asking because localized month abbreviations may differ between Java versions. See for example [JDK dateformatter parsing DayOfWeek in German locale, java8 vs java9](https://stackoverflow.com/questions/46244724/jdk-dateformatter-parsing-dayofweek-in-german-locale-java8-vs-java9). – Ole V.V. Jun 16 '22 at 17:40
  • For Norwegian my Java 8 with default settings expects `jan`, `feb` etc., without dots. My Java 9, 11 and 17 expect `jan.`, `feb.` … with dots. – Ole V.V. Jun 16 '22 at 17:52
  • `.replaceAll("(?<=[A-Za-z]{3})[.a-z]{1,2}", "")` removes any 1 or 2 chars that are dot or letter coming after 3 letters. So turns `jan.` into `jan` and `sept.` into `sep`. Somehow this seems to cause your problem. What happens if you leave it out? – Ole V.V. Jun 16 '22 at 18:33
  • If you can determine what month abbreviations you are getting, I should say that the good solution is to build a matching `DateTimeFormatter` using the two-arg `DateTimeFormatterBuilder.appendText()`. See for example [my answer here](https://stackoverflow.com/a/50412644/5772882) and [this one](https://stackoverflow.com/a/52374919/5772882). – Ole V.V. Jun 16 '22 at 18:52
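
The two-arg `appendText()` suggestion in the last comment could look roughly like the sketch below. It assumes the page always shows three-letter Norwegian abbreviations without dots (`jan`, `feb`, ... `des`), which is a guess based on the single `jun` sample in the question. The class name `FixedMonthParser` is made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FixedMonthParser {

    // Assumption: the page uses these dot-less three-letter abbreviations.
    private static final String[] ABBREVIATIONS = {
        "jan", "feb", "mar", "apr", "mai", "jun",
        "jul", "aug", "sep", "okt", "nov", "des"
    };

    private static final DateTimeFormatter FORMATTER = buildFormatter();

    private static DateTimeFormatter buildFormatter() {
        // Map month numbers 1..12 to the exact text we expect to read.
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < ABBREVIATIONS.length; i++) {
            months.put((long) (i + 1), ABBREVIATIONS[i]);
        }
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern("d. ")
                // The month text comes from our own map, not from the JDK's locale data.
                .appendText(ChronoField.MONTH_OF_YEAR, months)
                .appendPattern(" yyyy HH:mm")
                .toFormatter(Locale.ROOT);
    }

    public static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("15. jun 2022 19:51"));
    }
}
```

Because the month names are supplied explicitly, the result should not depend on which locale data a given Java version ships with. If the page turns out to use other abbreviations, only the array needs to change.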

1 Answer

In the formatter builder, the Norwegian locale is set by this line:

.toFormatter(Locale.forLanguageTag("no"));

You can either switch the locale to English with the language tag en, or supply Norwegian month abbreviations in the form this locale expects, with a trailing dot: jan., feb., mar., apr., and so on. mai takes no dot because it is already the full month name.


EDIT: After additional research, I found that you can parse Norwegian month abbreviations without the trailing dot. To do that, use the standalone month format (LLL instead of MMM).

So your code will look like this:

public LocalDateTime getDateForIception() {
    String tid = driver.findElement(By.cssSelector("div.hendelse-tid.hb-tekst--ingenBryting"))
        .getText().replaceAll("(?<=[A-Za-z]{3})[.a-z]{1,2}", "");
    if(tid.split("\\.")[0].length() == 1) {
        tid = "0" + tid;
    }

    return DatoUtils.parseDatoLocalDateTime(tid,  "dd. LLL yyyy HH:mm");
}

public static LocalDateTime parseDatoLocalDateTime(String datoString, String pattern) {
    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern(pattern)
        .toFormatter(Locale.forLanguageTag("no"));
    return LocalDateTime.parse(datoString, formatter);
}
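
Since the expected month text differs between Java versions (see the comments on the question), a quick way to check what your own JDK wants is to format a known date with both patterns. This is only a diagnostic sketch. The class name `MonthStyleProbe` is made up, and no output is shown because it depends on the Java version and its locale data.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class MonthStyleProbe {

    // Same builder setup as parseDatoLocalDateTime in the question.
    static DateTimeFormatter build(String pattern) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(pattern)
                .toFormatter(Locale.forLanguageTag("no"));
    }

    // Formats a fixed June date so the month text this JDK produces becomes visible.
    static String sample(String pattern) {
        return build(pattern).format(LocalDateTime.of(2022, 6, 15, 19, 51));
    }

    public static void main(String[] args) {
        System.out.println("MMM -> " + sample("dd. MMM yyyy HH:mm"));
        System.out.println("LLL -> " + sample("dd. LLL yyyy HH:mm"));
    }
}
```

Whatever each line prints is the form that the same pattern will accept when parsing, so comparing it with the text on the page shows which pattern letter fits.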
geobreze
  • Isn't that what I'm doing? If you look at the error message: '15. jun 2022 19:51' couldn't be parsed. That string is in Norwegian format. –  Jun 16 '22 at 07:13
  • Yeah, it couldn't be parsed because `15. jun 2022 19:51` is a datetime formatted as `dd. MMM yyyy HH:mm` in English (because of "jun"), while `15. jun. 2022 19:51` (note the dot after "jun") is the same format in Norwegian. Java treats "jun." as the abbreviation of Norwegian "juni" and "jun" (no dot) as the abbreviation of English "June". – geobreze Jun 16 '22 at 07:45
  • Hm, I THINK I understand. The problem is that I cannot do anything about the "jun" string on the page, so it seems I have a "crash" between the locales. It works if I change the Locale for the parsing to "en". But I'm wondering if that'll cause problems later. Thanks for the solution, though. –  Jun 16 '22 at 10:21
  • I think that you can have issues with "mai", "oktober" and "desember" months since their abbreviations don't match the English ones – geobreze Jun 16 '22 at 10:26
  • @Hfrav, I've edited my answer, so there should be no problem with the months mentioned – geobreze Jun 16 '22 at 10:51
  • Ah, even better - fantastic :-) –  Jun 16 '22 at 10:54
  • Also, I've noticed that you're adding a leading zero when the day has a length of 1. That's not necessary, since you can use `d` in the pattern instead of `dd`. – geobreze Jun 16 '22 at 11:03
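
A minimal sketch of the `d` versus `dd` point from the last comment. It uses the English locale (which the asker reported as working) purely so that the month text is predictable. The class name `SingleLetterDay` is made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class SingleLetterDay {

    // A single 'd' accepts both one-digit and two-digit days,
    // so no manual zero padding of the input is needed.
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("d. MMM yyyy HH:mm", Locale.ENGLISH);

    public static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("5. Jun 2022 19:51"));   // 2022-06-05T19:51
        System.out.println(parse("15. Jun 2022 19:51"));  // 2022-06-15T19:51
    }
}
```

With this pattern the `if` block that prepends "0" in the question's method could presumably be dropped, as long as the rest of the input keeps the same shape.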