
I don't speak Russian, so I'm having trouble validating whether the months are spelled correctly, etc. To be honest, I'm not even sure that my input is in Russian (Russian is the language detected by Google Translate).

I have some Kotlin code that makes a best-effort attempt to parse dates specified in various formats and languages. I'm struggling with parsing Russian dates, however. Here's the relevant part of my code:

sequenceOf(
  "ru-RU", // Russian
  "sr", // Serbian
).forEach {
  val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
    .withLocale(Locale.forLanguageTag(it))
  try {
    return listOf(LocalDate.parse(dateString, format))
  } catch (e: Exception) {
    //Ignore and move on
  }
}

This code correctly parses "27 апр. 2018" and "24 мая. 2013", but fails on "28 фев. 2019".

What's special about "28 фев. 2019" and/or how can I parse this value correctly?

If you provide answers in Java, I can translate it to Kotlin fairly easily.


EDIT: Here's an SSCCE in Kotlin:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.*

println("System.getProperty - " + System.getProperty("java.version"))
println("Runtime.version - " + Runtime.version())

val dateString = "28 фев. 2019"

sequenceOf(
    "ru-RU", // Russian
    "sr", // Serbian
).forEach {
    val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
        .withLocale(Locale.forLanguageTag(it))
    try {
        println("Parse successful - " + LocalDate.parse(dateString, format))
    } catch (e: Exception) {
        println("Parse failed - " + e)
    }
}

Output on my system:

System.getProperty - 17.0.4.1
Runtime.version - 17.0.4.1+7-b469.62
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
  • Would you mind clarifying what is wrong with `28 фев. 2019`? In my env (Java 8) everything works smoothly. – Andrey B. Panfilov Nov 14 '22 at 23:59
  • please share the error, or the output that you are getting – sidgate Nov 15 '22 at 02:39
  • If you format that date in Russian and Serbian, what are the results? Also which is your Java version, and have you tried other Java versions too? There may be a difference. – Ole V.V. Nov 15 '22 at 07:07
  • My Java 18 expects the abbreviation to be `февр.` with a dot in Russian, so the dot in your format is redundant. In Serbian it expects `феб`, so with `б` instead of `в` last. – Ole V.V. Nov 15 '22 at 08:48
  • @AndreyB.Panfilov I've updated the post with an SSCCE. – Nebu Pookins Nov 16 '22 at 08:50
  • @sidgate I've updated the post with an SSCCE. – Nebu Pookins Nov 16 '22 at 08:50
  • @OleV.V. I've updated the post with an SSCCE. The value `"28 фев. 2019"` comes from user input, so I can't change it. What I can do is change the date format to try to accept more formats. – Nebu Pookins Nov 16 '22 at 08:52
  • Do you know whether February *always* comes as `фев.`? Or asked the other way around, can you risk sometimes getting `февр.` or `фев` (without dot) or something else? – Ole V.V. Nov 16 '22 at 09:46
  • You may want to use a `DateTimeFormatterBuilder` and its overloaded [`appendText(TemporalField, Map)` method](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/time/format/DateTimeFormatterBuilder.html#appendText(java.time.temporal.TemporalField,java.util.Map)). With this you can explicitly specify which abbreviations the user uses for each month. Only one abbreviation per month of the year, though. – Ole V.V. Nov 16 '22 at 09:57
  • Example in [my answer here](https://stackoverflow.com/a/50412644/5772882). – Ole V.V. Nov 16 '22 at 15:00
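
The `appendText(TemporalField, Map)` suggestion from the comments above could look roughly like the following in Java. This is only a sketch: the class name `CustomMonthAbbreviations` and the helper `parse` are hypothetical, and the twelve abbreviations are an assumption (three letters plus a dot, matching the inputs quoted in the question). As the comment notes, the map accepts exactly one spelling per month.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CustomMonthAbbreviations {

    // Assumed spellings: one per month, modelled on the question's inputs
    // ("апр.", "мая.", "фев."); adjust them to whatever your users really type.
    static final Map<Long, String> MONTHS = new HashMap<>();
    static {
        MONTHS.put(1L, "янв.");
        MONTHS.put(2L, "фев.");
        MONTHS.put(3L, "мар.");
        MONTHS.put(4L, "апр.");
        MONTHS.put(5L, "мая.");
        MONTHS.put(6L, "июн.");
        MONTHS.put(7L, "июл.");
        MONTHS.put(8L, "авг.");
        MONTHS.put(9L, "сен.");
        MONTHS.put(10L, "окт.");
        MONTHS.put(11L, "ноя.");
        MONTHS.put(12L, "дек.");
    }

    // Day, then the month text looked up in MONTHS instead of locale data, then year.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("d ")
            .appendText(ChronoField.MONTH_OF_YEAR, MONTHS)
            .appendPattern(" uuuu")
            .toFormatter(Locale.forLanguageTag("ru-RU"));

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("28 фев. 2019")); // 2019-02-28
        System.out.println(parse("27 апр. 2018")); // 2018-04-27
        System.out.println(parse("24 мая. 2013")); // 2013-05-24
    }
}
```

Because the month text comes from the map rather than from the JDK's locale data, the result should not depend on which Java version is running. The trade-off is that `февр.` would now be rejected.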

2 Answers


Your input seems to use the wrong abbreviation. The correct abbreviation is февр. (including the trailing dot). Check this page and this page for more information.

A workaround would be to replace the input with the correct abbreviation before you parse it.

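import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class Main {
    public static void main(String[] args) {
        String input = "28 фев. 2019";
        // Swap the unsupported abbreviation for the one the formatter expects
        input = input.replace("фев.", "февр.");

        // No literal dot after MMM: the abbreviation itself already ends with one
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("d MMM uuuu",
                Locale.forLanguageTag("ru-RU"));

        System.out.println(LocalDate.parse(input, dtf));
        System.out.println(LocalDate.of(2019, 2, 28).format(dtf));
    }
}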

Output:

2019-02-28
28 февр. 2019
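
To see which abbreviation your own JVM expects for every month, not just February, something like the sketch below may help. The class name `ExpectedMonthAbbreviations` and the helper `shortMonthNames` are made up for illustration. `TextStyle.SHORT` is the style behind the `MMM` pattern letters, so the printed names are the ones the formatter will accept when parsing.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpectedMonthAbbreviations {

    // Collects the short month names (the MMM style) that this JVM uses
    // for the given locale, January first.
    static List<String> shortMonthNames(Locale locale) {
        List<String> names = new ArrayList<>();
        for (Month month : Month.values()) {
            names.add(month.getDisplayName(TextStyle.SHORT, locale));
        }
        return names;
    }

    public static void main(String[] args) {
        // The two language tags tried in the question
        for (String tag : new String[] {"ru-RU", "sr"}) {
            System.out.println(tag + ": " + shortMonthNames(Locale.forLanguageTag(tag)));
        }
    }
}
```

The output depends on the Java version and the locale data it ships with, which would explain why the same input parses on one JVM and fails on another.
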
Arvind Kumar Avinash

Since you are parsing user input, I believe the only option is to normalize that input before parsing it; appealing to standards is not an option here.

In Russian we use the genitive form of month names in dates (M(M)+ vs. L(L)+ in Java's DateTimeFormatter patterns), and short forms are normally produced by the rules below (please do not confuse these with programming standards, conventions, habits, tricks, UI/UX guides, etc.):

  • A . (dot) marks the short form of a word. Compare мая. with мая: the first looks ridiculous because мая is already the full genitive form of May. Another case is июн. vs. июня: both have the same length, but июня is the full genitive form of June.
  • Successive consonants are typically kept if they are followed by a vowel in the full form (there are some exceptions for double consonants). This seems to be your case: фев. vs. февр.
  • A short form should not end in a vowel, й, ь or ъ.

Based on that, and taking into account possible user mistakes, typos, common sense and programming habits, you may encounter the following "short genitive forms" of month names in the wild:

  • January: янв, янв.
  • February: фев, февр, фев., февр.
  • March: мар, марта, мар., март.
  • April: апр, апр.
  • May: мая, мая.
  • June: июн, июня, июн.
  • July: июл, июля, июл.
  • August: авг, авг.
  • September: сен, сент, сен., сент.
  • October: окт, окт.
  • November: ноя, нояб, ноя., нояб.
  • December: дек, дек.
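
One possible way to normalize such input is to skip the locale data entirely and look the month word up in a table built from the list above. The following is a rough sketch, not a definitive implementation: `LenientRussianDateParser`, `register` and `parse` are hypothetical names, and the "day month year" layout with a four-digit year is an assumption taken from the examples in the question. Spellings are stored without the trailing dot, so the dot becomes optional for every variant (which also means `март` without a dot is accepted).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientRussianDateParser {

    // Month spellings from the list above, stored without the trailing dot.
    private static final Map<String, Integer> MONTHS = new HashMap<>();
    static {
        register(1, "янв");
        register(2, "фев", "февр");
        register(3, "мар", "март", "марта");
        register(4, "апр");
        register(5, "мая");
        register(6, "июн", "июня");
        register(7, "июл", "июля");
        register(8, "авг");
        register(9, "сен", "сент");
        register(10, "окт");
        register(11, "ноя", "нояб");
        register(12, "дек");
    }

    private static void register(int month, String... spellings) {
        for (String spelling : spellings) {
            MONTHS.put(spelling, month);
        }
    }

    // Assumed layout: day, month word with an optional dot, four-digit year.
    private static final Pattern DATE =
            Pattern.compile("(\\d{1,2})\\s+(\\p{L}+)\\.?\\s+(\\d{4})");

    static LocalDate parse(String text) {
        Matcher m = DATE.matcher(text.trim());
        if (!m.matches()) {
            throw new DateTimeParseException("Unrecognized date layout", text, 0);
        }
        Integer month = MONTHS.get(m.group(2).toLowerCase(Locale.ROOT));
        if (month == null) {
            throw new DateTimeParseException("Unknown month name", text, m.start(2));
        }
        return LocalDate.of(Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(parse("28 фев. 2019"));  // 2019-02-28
        System.out.println(parse("28 февр 2019"));  // 2019-02-28
        System.out.println(parse("24 мая. 2013"));  // 2013-05-24
    }
}
```

Adding a newly observed spelling is then a one-line change to the table, and the result does not vary between Java versions.
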
Andrey B. Panfilov
  • Interesting. You are including `мар` without dot but not `март` without dot. And `мая.` with dot even though I don’t think it’s an abbreviation? (And not sure to what extent you are answering the question, but the information is still relevant.) – Ole V.V. Nov 16 '22 at 17:55
  • `март` is the nominative form of `March`; it is hardly likely that someone would use it as a "short genitive form". There are also [some recommendations](http://new.gramota.ru/spravka/buro/search-answer?s=242637) not to create a short form of it; the same applies to `май`, `июнь` and `июль`, and Java's `LLL` (CLDR, I believe) follows those recommendations. The asker is actually supposed to write a set of transformations like `s/\sмар.+\s/.03./` to mitigate possible ambiguities in user input. – Andrey B. Panfilov Nov 16 '22 at 18:20
  • It seems to me that [this is a solution with enough flexibility to cover all your cases](https://ideone.com/96VjJj). – Ole V.V. Nov 16 '22 at 20:07