
I am involved in a project that requires me to parse strings into dates. The dates that we get are not in any single well-defined format.

For example: variable spacing between date fields, both single-digit and multi-digit date fields, and missing fields (no time at all, or milliseconds or a zone offset that are only sometimes present), etc.

By date fields I mean: day, month, year, hour, minutes, seconds, milliseconds, zone offset, time zone, etc.

Some sample inputs:

"2014    :11 :01 00 :49" 
"2015-08-25T00:02:40Z" 
"2016/6/2 19:16:29" 
"2017:10:27 18 :08: 9" 
"2016-04-29T16:10:48 .80+00:00"
"2017:02:11 9:26:16 a. m."
"2017-12-16T08:04:17####"

I decided to use DateTimeFormatterBuilder to create a formatter with multiple date patterns.
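The question doesn't show the formatter itself, so here is a minimal sketch of what a multi-pattern formatter built with DateTimeFormatterBuilder.appendOptional could look like. The three patterns are illustrative picks that match a few of the sample inputs (one with its odd spacing cleaned up), not the asker's actual code. When parsing, each optional section is tried at the current position and rolled back if it fails, so the whole input must be consumed by one of them or a DateTimeParseException is thrown.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class MultiPatternFormatter {

    // Illustrative patterns only; a real list would need many more layouts.
    // A section that fails to match is rolled back before the next one is tried.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendOptional(DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss[X]"))
            .appendOptional(DateTimeFormatter.ofPattern("uuuu/M/d H:mm:ss"))
            .appendOptional(DateTimeFormatter.ofPattern("uuuu:MM:dd H:mm:ss"))
            .toFormatter();

    static LocalDateTime parse(String text) {
        // Throws DateTimeParseException if no section consumes the whole input.
        return LocalDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("2015-08-25T00:02:40Z")); // 2015-08-25T00:02:40
        System.out.println(parse("2016/6/2 19:16:29"));    // 2016-06-02T19:16:29
        System.out.println(parse("2017:10:27 18:08:09"));  // 2017-10-27T18:08:09
    }
}
```

Inputs with irregular spacing, such as "2014    :11 :01 00 :49", are rejected by all three sections, which is exactly why this approach gets unwieldy for fuzzy input.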

I was wondering if there is an easier way, or a library that does similar fuzzy matching/parsing of strings to dates.

userboi
John Doe
  • Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). And more importantly, please read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). – lexicore Apr 12 '18 at 11:03
  • Why your question is not quite good: you are asking for a library recommendation. That is directly off-topic, and your question will be closed very soon. – lexicore Apr 12 '18 at 11:06
  • I don’t think any library specifically targeted at fuzzy date-time formats exists. It sounds like you have already found your best bet; if not, just keep searching Stack Overflow and the Internet. – Ole V.V. Apr 12 '18 at 11:07
  • It looks like the order of fields is always year, month, day-of-month, hour, minute, then optional second, optional fraction of second, optional AM/PM marker (if not 24-hour format) and optional offset (Z, or plus or minus hours and minutes). That shouldn’t be too bad. Check for the presence of the letters z, a, p and m. Use a regular expression to take out as many numeric fields as possible. Put the pieces together. Finally, check that there is a plus or minus before any offset (except Z) and that any T comes between the day and the hours. – Ole V.V. Apr 12 '18 at 11:12
  • Possible duplicate of [Java string to date conversion](https://stackoverflow.com/questions/4216745/java-string-to-date-conversion) – Basavaraj Bhusani Apr 12 '18 at 11:36
  • Not really, @BasavarajBhusani. That question isn’t concerned at all with the fuzzy formats that are at the center of this question. – Ole V.V. Apr 12 '18 at 12:49

3 Answers


While there's no clear-cut way to parse such vague and random input formats, you could use a regex to extract the date itself, if not also the time in hours and minutes.

You could import the necessary classes with import java.util.regex.*; and try this in your main() method:

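```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "2014    :11 :01 00 :49"; // Or whatever the input is
        String regex = "(\\d+)";                  // one or more digits
        Matcher m = Pattern.compile(regex).matcher(input);

        int year = 0, month = 0, day = 0;
        // Each call to find() moves on to the next run of digits.
        if (m.find()) {
            year = Integer.parseInt(m.group(1));
        }
        if (m.find()) {
            month = Integer.parseInt(m.group(1));
        }
        if (m.find()) {
            day = Integer.parseInt(m.group(1));
        }
        System.out.println(year + ":" + month + ":" + day);
    }
}
```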

Here, the regex (\\d+) matches the next run of one or more digits each time m.find() is called.

This prints 2014:11:1, which you could then parse.
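As a sketch of that last step, the same find() loop can collect every run of digits into a list and hand the values to java.time. This assumes the numbers always appear in the order year, month, day, then optionally hour, minute and second (true for all the sample inputs); the class and method names are hypothetical. Extra numbers such as a fraction or offset digits are simply ignored, AM/PM markers are not handled, and fewer than three numbers throws an IndexOutOfBoundsException.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRuns {

    // Collects every run of digits, in order of appearance.
    static List<Integer> numbers(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    // Assumes the order year, month, day[, hour[, minute[, second]]].
    static LocalDateTime toDateTime(String input) {
        List<Integer> n = numbers(input);
        LocalDate date = LocalDate.of(n.get(0), n.get(1), n.get(2)); // validates the date
        int hour = n.size() > 3 ? n.get(3) : 0;
        int minute = n.size() > 4 ? n.get(4) : 0;
        int second = n.size() > 5 ? n.get(5) : 0;
        return date.atTime(hour, minute, second);
    }

    public static void main(String[] args) {
        System.out.println(toDateTime("2014    :11 :01 00 :49")); // 2014-11-01T00:49
    }
}
```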

However, with such random input formats, there seems to be no way to extract the date reliably in every case.

userboi
  • I will give this approach a try. Let me see to what extent I can leverage this versus my current approach. Thanks for the answer. – John Doe Apr 12 '18 at 12:11

You can split the string using any non-digit characters as separators:

```java
String[] parts = input.split("\\D+");
```
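To make the field counting concrete, here is what that split produces for two of the sample inputs (the wrapper class and method name are just for illustration). Note that the digits of an offset such as +00:00 land among the other fields, which is why offsets need separate handling. String.split drops trailing empty strings, so a suffix like "####" is harmless, but a leading non-digit (none of the samples have one) would produce an empty first element.

```java
import java.util.Arrays;

public class SplitDemo {

    // Any run of non-digit characters acts as a single separator.
    static String[] fields(String input) {
        return input.split("\\D+");
    }

    public static void main(String[] args) {
        // The offset digits end up among the other fields: 9 parts.
        System.out.println(Arrays.toString(fields("2016-04-29T16:10:48 .80+00:00")));
        // prints [2016, 04, 29, 16, 10, 48, 80, 00, 00]

        // Trailing separators leave no empty strings behind: 6 parts.
        System.out.println(Arrays.toString(fields("2017-12-16T08:04:17####")));
        // prints [2017, 12, 16, 08, 04, 17]
    }
}
```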

Based on the number of items (parts.length), you can tell how many fields there are: if the length is 3, you know there are only year, month and day, and so on.

Then take some extra steps to check for AM/PM markers and offsets (Z, +01:00, -03:00, etc.).

To validate all the fields, I'd try to create the corresponding date/time objects, which makes sure all the fields are valid. For example:

```java
// Assumes int year, month, day, hour, min, sec, ms and String offsetString
// have already been extracted from the input, and java.time.* is imported.

// the of() factory methods take nanoseconds (not milliseconds);
// note that a fraction like ".80" means 800 ms, not 80
int nanos = ms * 1_000_000;

// only day, month and year: try to create a LocalDate
LocalDate.of(year, month, day);

// only time fields and no offset: try to create a LocalTime
LocalTime.of(hour, min, sec, nanos);

// date and time fields but no offset: try to create a LocalDateTime
LocalDateTime.of(year, month, day, hour, min, sec, nanos);

// *** Don't forget to adjust the hour value when AM/PM is found ***

// when an offset is found, try to create an OffsetDateTime;
// ZoneOffset.of accepts strings like "Z", "+01:00" or "-03:00"
ZoneOffset offset = ZoneOffset.of(offsetString);
OffsetDateTime.of(year, month, day, hour, min, sec, nanos, offset);
```

If the values are invalid (day zero, February 29th in a non-leap year, etc.), the methods above throw a DateTimeException.
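Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. It assumes fields always come in the order year, month, day, hour, minute, second, fraction (as in every sample input), that an offset only ever appears at the very end with two-digit hours, and that an AM/PM marker looks like "am", "PM" or "a. m."; both regexes are hypothetical heuristics, not a complete grammar. It returns a LocalDateTime, or an OffsetDateTime when an offset was found, and it works with nanoseconds directly instead of milliseconds.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.Temporal;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitParser {

    // Hypothetical heuristic: offset at the very end, "Z" or "+hh:mm" / "-hhmm".
    private static final Pattern OFFSET = Pattern.compile("(Z|[+-]\\d{2}:?\\d{2})$");

    // Hypothetical heuristic: AM/PM marker such as "am", "PM" or "a. m.".
    private static final Pattern AM_PM =
            Pattern.compile("([ap])\\s*\\.?\\s*m\\.?", Pattern.CASE_INSENSITIVE);

    static Temporal parse(String input) {
        String text = input.trim();

        // 1. Remove the offset first so its digits don't mix with the other fields.
        ZoneOffset offset = null;
        Matcher om = OFFSET.matcher(text);
        if (om.find()) {
            offset = ZoneOffset.of(om.group(1));
            text = text.substring(0, om.start());
        }

        // 2. Remember an AM/PM marker; split() below treats its letters as separators.
        Matcher am = AM_PM.matcher(text);
        String marker = am.find() ? am.group(1).toLowerCase(Locale.ROOT) : null;

        // 3. Split on runs of non-digits (fewer than 3 parts throws an exception).
        String[] p = text.split("\\D+");
        int year = Integer.parseInt(p[0]);
        int month = Integer.parseInt(p[1]);
        int day = Integer.parseInt(p[2]);
        int hour = p.length > 3 ? Integer.parseInt(p[3]) : 0;
        int min = p.length > 4 ? Integer.parseInt(p[4]) : 0;
        int sec = p.length > 5 ? Integer.parseInt(p[5]) : 0;
        // ".80" means 0.80 s, so right-pad the fraction digits to nine (nanoseconds).
        int nanos = p.length > 6 ? Integer.parseInt((p[6] + "000000000").substring(0, 9)) : 0;

        // 4. Convert a 12-hour clock reading to 24-hour.
        if ("p".equals(marker) && hour < 12) {
            hour += 12;
        } else if ("a".equals(marker) && hour == 12) {
            hour = 0;
        }

        // 5. The of() methods throw DateTimeException for out-of-range values.
        if (offset == null) {
            return LocalDateTime.of(year, month, day, hour, min, sec, nanos);
        }
        return OffsetDateTime.of(year, month, day, hour, min, sec, nanos, offset);
    }

    public static void main(String[] args) {
        String[] samples = {
            "2014    :11 :01 00 :49", "2015-08-25T00:02:40Z", "2016/6/2 19:16:29",
            "2017:10:27 18 :08: 9", "2016-04-29T16:10:48 .80+00:00",
            "2017:02:11 9:26:16 a. m.", "2017-12-16T08:04:17####"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Returning Temporal keeps the distinction between "no offset given" and "offset given" visible to the caller instead of silently assuming a zone.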

jonpebs

I once had a task where I needed to convert a String to a Date when the date format was not known in advance. In general, the task was to take a String and, if it is a date, convert it to a Date. I wrote such code but didn't publish it as an open-source library. However, I wrote an article with a detailed description of the idea. Here is the link to the article: Java 8 java.time package: parsing any string to date

In short, the idea is to put all the date formats that you want to support into a property file, and then try those formats one by one to see if your String fits. The order of the formats is significant, as a String can sometimes fit more than one format (US and European), so you need to decide which ones are preferable and place them before the others. In any case, have a look at the article if you choose (or are forced, for lack of a library) to write your own code.
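A minimal sketch of the try-each-format idea, assuming a hard-coded list in place of the property file described above (the patterns and class name are hypothetical, and the article's own code may differ). It also shows why order matters: "01/02/2018" fits both US and European layouts, so whichever pattern comes first decides the result.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

public class OrderedFormats {

    // Hypothetical list; the article's approach would load these from a property file.
    // US format first, so ambiguous input like "01/02/2018" is read as January 2.
    static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("MM/dd/uuuu"),
            DateTimeFormatter.ofPattern("dd/MM/uuuu"),
            DateTimeFormatter.ofPattern("uuuu-MM-dd"));

    static LocalDate parse(String text) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return LocalDate.parse(text, f);
            } catch (DateTimeParseException e) {
                // not this format (or an invalid value such as month 25); try the next
            }
        }
        throw new IllegalArgumentException("No supported format matches: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("01/02/2018")); // 2018-01-02 (US order wins)
        System.out.println(parse("25/12/2018")); // 2018-12-25 (only the European order fits)
        System.out.println(parse("2018-03-15")); // 2018-03-15
    }
}
```

Swapping the first two patterns would turn "01/02/2018" into February 1, which is the ordering decision the answer describes.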

Michael Gantman
  • Our approach is very similar. I did not try the pattern mentioned in the article; I'll check that. I am currently having some trouble with zone offsets, e.g. I am able to parse +08:00 but not +8:00, as I can't find a pattern that works for this. Maybe a bit of regex match-and-replace will work here. – John Doe Apr 13 '18 at 06:38
  • At first glance, the "+8:00" format is not supported, as opposed to "+08:00", but I am not totally sure that is true. – Michael Gantman Apr 15 '18 at 07:51
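For what it's worth, ZoneOffset.of does reject "+8:00": it accepts a single-digit hour only in the bare "+h" form. A minimal sketch of the regex match-and-replace idea from the comment above pads a one-digit offset hour with a zero before parsing. The method name is hypothetical, and the regex assumes the offset sits at the very end of the string in the "+h:mm" shape.

```java
import java.time.ZoneOffset;

public class OffsetFix {

    // Turns a one-digit offset hour at the end of the text into two digits:
    // "+8:00" -> "+08:00", "-3:30" -> "-03:30"; "+08:00" is left unchanged.
    static String normalizeOffset(String s) {
        return s.replaceAll("(?<sign>[+-])(?<hour>\\d)(?=:\\d{2}$)", "${sign}0${hour}");
    }

    public static void main(String[] args) {
        System.out.println(ZoneOffset.of(normalizeOffset("+8:00")));        // +08:00
        System.out.println(normalizeOffset("2016-04-29T16:10:48+8:00"));    // 2016-04-29T16:10:48+08:00
    }
}
```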