
I'm trying to write a class that can parse multi-format and multi-locale strings into DateTime.

Multi-format means that a date might be in dd/MM/yyyy, MMM dd yyyy, ... (up to 10 formats).

Multi-locale means that a date might be 29 Dec 2015, 29 Dez 2015, dice 29 2015 ... (up to 10 locales, like en, gr, it, jp).

Using the answer to "Using Joda Date & Time API to parse multiple formats", I wrote:

import java.util.Locale
import org.joda.time.format.{DateTimeFormat, DateTimeFormatterBuilder, DateTimeParser}

val locales = List(
  Locale.ENGLISH,
  Locale.GERMAN,
  ...
)

val patterns = List(
  "yyyy/MM/dd",
  "yyyy-MM-dd",
  "MMMM dd, yyyy",
  "dd MMMM yyyy",
  "dd MMM yyyy"
)

// One parser per (pattern, locale) combination
val parsers: Array[DateTimeParser] = patterns.flatMap(patt =>
  locales.map(locale => DateTimeFormat.forPattern(patt).withLocale(locale).getParser)
).toArray
val birthDateFormatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter

but it doesn't work:

birthDateFormatter.parseDateTime("29 Dec 2015") // ok
birthDateFormatter.parseDateTime("29 Dez 2015") // exception below

java.lang.IllegalArgumentException: Invalid format: "29 Dez 2015" is malformed at "Dez 2015"

I found that all the parsers (List[DateTimeParser]) had "lost" their locales after being appended into birthDateFormatter (a DateTimeFormatter), and birthDateFormatter has only one locale: en.

I can write:

val birthDateFormatter = locales.map(new DateTimeFormatterBuilder().append(null, parsers).toFormatter.withLocale(_))

and use it like:

birthDateFormatter.map(_.parseDateTime(stringDate))

but this throws lots of exceptions, which is terrible.

How can I parse multi-format and multi-locale strings using Joda-Time? Or how can I do it some other way?

sheh
  • This has inspired me to implement [my own multi-format parser](https://github.com/MenoData/Time4J/issues/426). Although it is not Joda-Time, it can process multiple formats AND locales without even throwing and catching exceptions internally. Maybe you can study the source code and the JUnit test case and learn from them. However, I assume that the implementation is not transferable to Joda-Time 1:1; maybe you will find a way to do so. See also the [tutorial page](http://time4j.net/tutorial/format.html#x3) of my library. – Meno Hochschild Jan 01 '16 at 08:54
  • @MenoHochschild, it's a good solution! I'm going to try Time4J later. I think you should add it as an answer here; the solution might be helpful to someone. – sheh Jan 10 '16 at 09:00
  • Well, I will hold off on an answer until I have managed to release the next version of Time4J (v3.14/v4.11), which will contain the mentioned and already implemented `MultiFormatParser`. I hope to release it next weekend. I will also try to find some time to investigate the performance in more detail. – Meno Hochschild Jan 11 '16 at 04:47
  • Sorry for the late answer (see below), but I only wanted to offer an alternative to Joda-Time if it is really quicker and can help you with your performance problem. – Meno Hochschild Feb 15 '16 at 04:30

2 Answers


That was interesting to investigate. This is a test suite that helped me (in Java, but I hope you'll get the idea):

import java.util.*;
import java.util.stream.Collectors;

import org.joda.time.DateTime;
import org.joda.time.format.*;
import org.junit.Test;

import static org.assertj.core.api.Assertions.*;

public class JodaTimeLocaleTest {

    @Test // fails on an English-locale machine: parseDateTime throws IllegalArgumentException
    public void testTwoLocales() {
        List<Locale> locales = Arrays.asList(Locale.FRENCH, Locale.GERMAN);
        DateTimeParser[] parsers = locales.stream()
                .map(locale -> DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser())
                .collect(Collectors.toList())
                .toArray(new DateTimeParser[0]);
        DateTimeFormatter formatter = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();

        DateTime dateTime1 = formatter.parseDateTime("29 déc. 2015");
        DateTime dateTime2 = formatter.parseDateTime("29 Dez 2015");

        assertThat(dateTime1).isEqualTo(new DateTime("2015-12-29T00:00:00"));
        assertThat(dateTime2).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testFrench() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.FRENCH);

        DateTime dateTime = formatter.parseDateTime("29 déc. 2015");

        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }

    @Test // passes
    public void testGerman() {
        DateTimeFormatter formatter = DateTimeFormat.forPattern("dd MMM yyyy").withLocale(Locale.GERMAN);

        DateTime dateTime = formatter.parseDateTime("29 Dez 2015");

        assertThat(dateTime).isEqualTo(new DateTime("2015-12-29T00:00:00"));
    }
}

First of all, your first example

birthDateFormatter.parseDateTime("29 Dec 2015")

passes only because your machine's default locale is English. If it were different, this case would have failed too. That's why I'm using French and German on a machine with an English locale. In my case, neither date can be parsed.
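For comparison, java.time (Java 8+, not Joda-Time) behaves the same way: `DateTimeFormatter.ofPattern(String)` takes the JVM's default FORMAT locale, so a test that only feeds English input can pass by accident. A minimal sketch; `DefaultLocaleDemo` is a hypothetical name, and full month names are used because abbreviated ones can differ between JDK locale-data versions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo class: where a java.time formatter gets its locale from.
final class DefaultLocaleDemo {

    static final String PATTERN = "d MMMM uuuu";

    // ofPattern(String) without a locale uses Locale.getDefault(Locale.Category.FORMAT),
    // so month-name parsing silently depends on the machine's settings.
    static Locale implicitLocale() {
        return DateTimeFormatter.ofPattern(PATTERN).getLocale();
    }

    // ofPattern(String, Locale) pins the locale explicitly.
    static boolean parses(String text, Locale locale) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(PATTERN, locale));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(implicitLocale()); // the JVM's default FORMAT locale
        System.out.println(parses("29 December 2015", Locale.ENGLISH)); // true
        System.out.println(parses("29 Dezember 2015", Locale.ENGLISH)); // false
        System.out.println(parses("29 Dezember 2015", Locale.GERMAN));  // true
    }
}
```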

It turns out that the locale is not stored in the parser, but in the formatter only. So when you do

DateTimeFormat.forPattern("dd MMM yyyy").withLocale(locale).getParser()

the locale is set on the formatter, but is then lost when creating the parser:

// DateTimeFormatter#withLocale:
public DateTimeFormatter withLocale(Locale locale) {
    if (locale == getLocale() || (locale != null && locale.equals(getLocale()))) {
        return this;
    }
    // Notice how locale does not affect the parser
    return new DateTimeFormatter(iPrinter, iParser, locale,
            iOffsetParsed, iChrono, iZone, iPivotYear, iDefaultYear);
}

Next, when you create a new formatter

new DateTimeFormatterBuilder().append(null, parsers).toFormatter()

it's created with the system's default locale (unless you override it with withLocale()), and that locale is used during parsing:

// DateTimeFormatter#parseDateTime
public DateTime parseDateTime(String text) {
    InternalParser parser = requireParser();

    Chronology chrono = selectChronology(null);
    // Notice how the formatter's locale is used
    DateTimeParserBucket bucket = new DateTimeParserBucket(0, chrono, iLocale, iPivotYear, iDefaultYear);
    int newPos = parser.parseInto(bucket, text, 0);
    // ... snipped
}

So although you can combine multiple parsers to support multiple formats, only a single locale can be used per formatter instance.
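java.time (Java 8+) has the same limitation: a single `DateTimeFormatter` can accept several formats through optional sections such as `[uuuu-MM-dd][d MMMM uuuu]`, but it carries exactly one locale. A small illustration; `OptionalSectionsDemo` and the pattern are hypothetical choices, and full month names are used because abbreviated names can differ between JDK locale-data providers.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical demo: one java.time formatter that accepts two formats
// via optional sections, but (like a Joda-Time formatter) only one locale.
final class OptionalSectionsDemo {

    // Each "[...]" optional section is tried in turn and skipped if it does not match.
    static final String PATTERN = "[uuuu-MM-dd][d MMMM uuuu]";

    static boolean parses(String text, Locale locale) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(PATTERN, locale);
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2015-12-29", Locale.GERMAN));        // true
        System.out.println(parses("29 Dezember 2015", Locale.GERMAN));  // true
        System.out.println(parses("29 December 2015", Locale.GERMAN));  // false: German month names only
        System.out.println(parses("29 December 2015", Locale.ENGLISH)); // true
    }
}
```

Supporting a second locale still means building a second formatter.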

Adam Michalik

Answer to question 1 (How can I parse multi-format and multi-locale strings using joda-time?):

No, this is not possible the way you want; see also the good answer by @Adam Michalik. So the only way is to write a list of multiple Joda formatters and try each one on a given input, possibly catching exceptions. You have already found the right workaround, so I won't describe the details here.
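A sketch of that workaround: one formatter per (pattern, locale) pair, built once and tried in order until one succeeds. It uses java.time instead of Joda-Time so that it runs on a plain JDK 8+; with Joda-Time the shape is the same, except that parseDateTime signals failure with IllegalArgumentException. `MultiLocaleDateParser` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical class: tries one formatter per (pattern, locale) pair, in order,
// and returns the first successful result.
final class MultiLocaleDateParser {

    private final List<DateTimeFormatter> formatters;

    MultiLocaleDateParser(List<String> patterns, List<Locale> locales) {
        List<DateTimeFormatter> list = new ArrayList<>();
        for (String pattern : patterns) {
            for (Locale locale : locales) {
                // Each formatter keeps its own locale.
                list.add(DateTimeFormatter.ofPattern(pattern, locale));
            }
        }
        this.formatters = Collections.unmodifiableList(list);
    }

    // Returns the first successful parse, or Optional.empty() if no formatter matches.
    Optional<LocalDate> parse(String text) {
        for (DateTimeFormatter formatter : formatters) {
            try {
                return Optional.of(LocalDate.parse(text, formatter));
            } catch (DateTimeParseException e) {
                // Not this pattern/locale; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        MultiLocaleDateParser parser = new MultiLocaleDateParser(
                Arrays.asList("uuuu-MM-dd", "d MMMM uuuu", "MMMM d, uuuu"),
                Arrays.asList(Locale.ENGLISH, Locale.GERMAN, Locale.FRENCH));
        System.out.println(parser.parse("29 Dezember 2015"));  // Optional[2015-12-29]
        System.out.println(parser.parse("December 29, 2015")); // Optional[2015-12-29]
        System.out.println(parser.parse("not a date"));        // Optional.empty
    }
}
```

Since every miss costs a thrown exception, the list order matters: the most common patterns and locales should come first.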

Answer to question 2 (How can I do it any other way?):

My library Time4J has had a new MultiFormatParser class since v4.11. However, I discovered some general performance issues with its format engine (mainly due to Java's autoboxing), so I decided to hold off on this answer until release v4.12, in which I improved the performance. According to my first benchmarks, Time4J 4.12 seems to be quicker than Joda-Time (v2.9.1) because internal exceptions are strongly reduced. So I think you can give the latest version of Time4J a try and then report back whether it works for you.

private static final MultiFormatParser<PlainDate> TIME4J;

static {
    ChronoFormatter<PlainDate> f1 = 
      ChronoFormatter.ofDatePattern("dd.MM.uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f2 = 
      ChronoFormatter.ofDatePattern("MM/dd/uuuu", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f3 = 
      ChronoFormatter.ofDatePattern("uuuu-MM-dd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f4 = 
      ChronoFormatter.ofDatePattern("uuuuMMdd", PatternType.CLDR, Locale.ROOT);
    ChronoFormatter<PlainDate> f5 = 
      ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.GERMAN);
    ChronoFormatter<PlainDate> f6 = 
      ChronoFormatter.ofDatePattern("d. MMMM uuuu", PatternType.CLDR, Locale.FRENCH);
    ChronoFormatter<PlainDate> f7 = 
      ChronoFormatter.ofDatePattern("MMMM d, uuuu", PatternType.CLDR, Locale.US);
    TIME4J = MultiFormatParser.of(f1, f2, f3, f4, f5, f6, f7);
}

...

static List<PlainDate> parse(List<String> input) {
    ParseLog plog = new ParseLog();
    int n = input.size();
    List<PlainDate> result = new ArrayList<>(n);

    for (int i = 0; i < n; i++){
        String s = input.get(i);
        plog.reset();
        PlainDate date = TIME4J.parse(s, plog);
        if (!plog.isError()) {
            result.add(date);
        } else {
            // log or report error
        }
    }
    return result;
}
  • Every single parser within MultiFormatParser keeps its own locale.
  • The order of the parser components matters for performance: put the patterns and locales that are most common in your input first.
  • I strongly recommend using a static constant for the MultiFormatParser because a) it is immutable and b) constructing formatters is expensive in every library (and Time4J is no exception in this respect).
  • For interoperability with Joda-Time, you can consider this conversion: LocalDate joda = new LocalDate(plainDate.getYear(), plainDate.getMonth(), plainDate.getDayOfMonth()); but keep in mind that every conversion has some extra cost. On the other hand, Joda-Time offers fewer features than Time4J, so the latter can handle all date, time and time-zone tasks on its own, too.
  • I am not a Scala guy, but I assume that the following Scala code might compile: val parser = MultiFormatParser.of(patterns.flatMap(patt => locales.map(locale => ChronoFormatter.ofDatePattern(patt, PatternType.CLDR, locale))).toArray: _*)
  • By the way, Joda-Time's performance is not bad; it was a tough task for me to beat it in Time4J v4.12. Parsing such different patterns and locales is always a complex task. What surprised me: according to my own experiments, the new time library built into Java 8 (package java.time) performs worst (obviously due to internal exception handling).
  • If you are not on a Java 8 platform, you can use Time4J v3.15 (the backport for Java 6 platforms).
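With plain java.time, a caller can at least avoid exceptions on the no-match path: `DateTimeFormatter.toFormat(LocalDate::from)` adapts a formatter to `java.text.Format`, whose `parseObject(String, ParsePosition)` reports failure by returning null rather than throwing. A minimal sketch, assuming Java 8+; `NoThrowDateParser` is a hypothetical name, it is not Time4J's MultiFormatParser, and any speed-up over the try/catch variant is an assumption that would need measuring.

```java
import java.text.Format;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical class: tries one format per (pattern, locale) pair, using the
// java.text.Format ParsePosition protocol instead of catching DateTimeParseException.
final class NoThrowDateParser {

    private final List<Format> formats = new ArrayList<>();

    NoThrowDateParser(List<String> patterns, List<Locale> locales) {
        for (String pattern : patterns) {
            for (Locale locale : locales) {
                // toFormat(LocalDate::from) makes parseObject return a LocalDate, or null on error.
                formats.add(DateTimeFormatter.ofPattern(pattern, locale).toFormat(LocalDate::from));
            }
        }
    }

    Optional<LocalDate> parse(String text) {
        for (Format format : formats) {
            ParsePosition pos = new ParsePosition(0);
            Object result = format.parseObject(text, pos);
            // parseObject may stop before the end of the text, so require full consumption.
            if (result != null && pos.getIndex() == text.length()) {
                return Optional.of((LocalDate) result);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        NoThrowDateParser parser = new NoThrowDateParser(
                Arrays.asList("uuuu-MM-dd", "d MMMM uuuu", "MMMM d, uuuu"),
                Arrays.asList(Locale.ENGLISH, Locale.GERMAN));
        System.out.println(parser.parse("29 Dezember 2015")); // Optional[2015-12-29]
        System.out.println(parser.parse("2015-12-29xyz"));    // Optional.empty
    }
}
```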
Meno Hochschild