-1

I need to read a CSV file which has different time format in one timestamp column. It can be anything from below mentioned 5 formats. I need to match the fetched date and parse accordingly on each row. How can I validate and parse it?

public static final String DEFAULT_DATE_FORMAT_PATTERN = "yyyy-MM-dd";
public static final String DEFAULT_DATE_TIME_FORMAT_PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";
public static final String DATE_TIME_MINUTES_ONLY_FORMAT_PATTERN = "yyyy-MM-dd HH:mm";
public static final String DATE_TIME_WITHOUT_MILLIS_FORMAT_PATTERN = "yyyy-MM-dd HH:mm:ss";
// Epoch in milli
halfer
  • 19,824
  • 17
  • 99
  • 186
Tanu Garg
  • 3,007
  • 4
  • 21
  • 29
  • Yes but here I am not creating the format. I first need to find the pattern from the input which would be any of among 5 and then need to parse it. – Tanu Garg Oct 22 '20 at 06:42
  • yes @GhostCat this data is curated from different different sources so has multiple types. – Tanu Garg Oct 22 '20 at 06:46
  • 1
    There are so many posts on SO for the same/similar requirement e.g. https://stackoverflow.com/questions/4024544/how-to-parse-dates-in-multiple-formats-using-simpledateformat - https://stackoverflow.com/questions/44600420/datetimeformatter-accepting-multiple-dates-and-converting-to-one-java-time-libr - https://stackoverflow.com/questions/51587242/how-to-parse-different-date-formats-from-string-in-java-faq – Arvind Kumar Avinash Oct 22 '20 at 06:56
  • 2
    Thanks @ArvindKumarAvinash. Those links helped. – Tanu Garg Oct 22 '20 at 12:21

2 Answers2

4

What you need is a formatter with optional parts. A pattern can contain square brackets to denote an optional part, for example HH:mm[:ss]. The formatter then is required to parse HH:mm, and tries to parse the following text as :ss, or skips it if that fails. yyyy-MM-dd[ HH:mm[:ss[.SSS]]] would then be the pattern.

There is only one issue here – when you try to parse a string with the pattern yyyy-MM-dd (so without time part) using LocalDateTime::parse, it will throw a DateTimeFormatException with the message Unable to obtain LocalDateTime from TemporalAccessor. Apparently, at least one time part must be available to succeed.

Luckily, we can use a DateTimeFormatterBuilder to build a pattern, instructing the formatter to use some defaults if information is missing from the parsed text. Here it is:

DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendPattern("yyyy-MM-dd[ HH:mm[:ss[.SSS]]]")
    .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
    .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
    .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
    .toFormatter();
LocalDateTime dateTime = LocalDateTime.parse(input, formatter);

Tests:

String[] inputs = {
    "2020-10-22", // OK
    "2020-10-22 14:55", // OK
    "2020-10-22T14:55", // Fails: incorrect format
    "2020-10-22 14:55:23",
    "2020-10-22 14:55:23.9", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.91", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.917", // OK
    "2020-10-22 14:55:23.9174", // Fails: incorrect fraction of second
    "2020-10-22 14:55:23.917428511" // Fails: incorrect fraction of second 
};

And what about epoch in milli?

Well, this cannot be parsed directly by the DateTimeFormatter. But what's more: an epoch in milli has an implicit timezone: UTC. The other patterns lack a timezone. So an epoch is a fundamentally different piece of information. One thing you could do is assume a timezone for the inputs missing one.

However, if you nevertheless want to parse the instant, you could try to parse it as a long using Long::parseLong, and if it fails, then try to parse with the formatter. Alternatively, you could use a regular expression (like -?\d+ or something) to try to match the instant, and if it does, then parse as instant, and if it fails, then try to parse with the abovementioned formatter.

MC Emperor
  • 22,334
  • 15
  • 80
  • 130
1

The brute force approach:

  • simply try your 4 formats, one after the other to parse the incoming string
  • if parsing throws an exception, try the next one
  • if parsing passes, well, that format just matched

Of course, if we are talking about larger tables, that is quite inefficient. Possible optimisations:

  • obviously, the different patterns have subtle differences, so you could use indexOf() checks first. Like: if the value to be parsed contains no ':' char, then it can only be the first pattern.
  • you can look at your data manually to figure the actual distribution of patterns that are used. then you adapt the order of patterns to try to the likelihood of the pattern being used in your data

Alternatively: you could define your own regex. The only thing that makes it slightly ugly is the fact that your input uses month names, not month number. But I think it shouldn't be too hard to write up a single regex that covers all your cases.

GhostCat
  • 137,827
  • 25
  • 176
  • 248