I have data that uses year and day of year (1-365/366) plus a time of day, such as 2018-338T14:02:57.47583, rather than year-month-day.
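For reference, java.time already understands this ordinal (year plus day-of-year) layout: the built-in DateTimeFormatter.ISO_ORDINAL_DATE parses the date part, and the pattern letter D is day-of-year. The sketch below assumes Java 8+ java.time; the class name OrdinalDateDemo is made up for illustration, and the fixed five-digit fraction matches the example timestamp only.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class OrdinalDateDemo {
    // "DDD" is the three-digit day of year; "SSSSS" is exactly five fraction digits.
    static final DateTimeFormatter ORDINAL_DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-DDD'T'HH:mm:ss.SSSSS");

    public static void main(String[] args) {
        // Built-in ISO 8601 ordinal-date formatter: day 338 of 2018 is December 4.
        LocalDate date = LocalDate.parse("2018-338", DateTimeFormatter.ISO_ORDINAL_DATE);
        System.out.println(date);

        // The example timestamp resolves to a normal LocalDateTime.
        LocalDateTime ts = LocalDateTime.parse("2018-338T14:02:57.47583", ORDINAL_DATE_TIME);
        System.out.println(ts);
    }
}
```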
I am trying to write a function that takes a timestamp, runs it through a series of regexes, and returns a pattern I can use (via DateTimeFormatter.ofPattern) to parse that timestamp with LocalDateTime.parse. The function below does this, except for one issue.
```java
import com.google.common.collect.ImmutableMap; // Guava

// Returns a DateTimeFormatter pattern for the first regex that matches the input.
// ImmutableMap iterates in insertion order, so entry order decides which regex wins.
public static String getInputFormat(String rawFirst) {
    // Strip any character that cannot appear in a timestamp.
    String first = rawFirst.replaceAll("[^-.:/ a-zA-Z0-9]", "");
    ImmutableMap<String, String> regexFormatMap = new ImmutableMap.Builder<String, String>()
        // Lowercase yyyy: uppercase YYYY means week-based year in DateTimeFormatter.
        .put("(-)?[0-9]{4}(Z?)", "yyyy")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(T)(([0-1][0-9])|(2[0-3])):[0-5][0-9](Z?)", "yyyy-DDD'T'HH:mm")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(T)(([0-1][0-9])|(2[0-3])):[0-5][0-9]:(([0-5][0-9])|60)(\\.([0-9]{1,6}))?(Z?)", "yyyy-DDD'T'HH:mm:ss.SSSSS")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(T)(([0-1][0-9])|(2[0-4]))(Z?)", "yyyy-DDD'T'HH")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(T)24((:00)|(:00:00))?(Z?)", "yyyy-DDD'T'HH:mm:ss")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(T)24:00:00(\\.([0]{1,6}))(Z?)", "yyyy-DDD'T'HH:mm:ss")
        .put("(-)?[0-9]{4}-((00[1-9])|(0[1-9][0-9])|([1-2][0-9][0-9])|(3(([0-5][0-9])|(6[0-6]))))(Z?)", "yyyy-DDD")
        .build();
    for (String regex : regexFormatMap.keySet()) {
        System.out.println(first.matches(regex) + ": " + first + " fits " + regex);
        if (first.matches(regex)) {
            System.out.println("Returning pattern " + regexFormatMap.get(regex));
            return regexFormatMap.get(regex);
        }
    }
    System.out.println("did not match pattern, returning default pattern used with test data, eventually this should just fail.");
    return "yyyy-DDD'T'HH:mm:ss.SSSSS";
}
```
I can't figure out how to handle an arbitrary number of subsecond digits, e.g.:
"2018-338T14:02:57.47583"
"2018-338T14:02:57.475835"
"2018-338T14:02:57.4758352"
"2018-338T14:02:57.47583529"
etc.
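To make the failure concrete: each S letter in an ofPattern pattern is one fixed digit, and the formatter parses strictly by default, so "SSSSS" accepts exactly five fraction digits. The sketch below (hypothetical class FixedFractionDemo, assuming Java 8+) checks a few widths against the function's catch-all pattern; only the five-digit input parses.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class FixedFractionDemo {
    // The catch-all pattern returned by getInputFormat: exactly five fraction digits.
    static final DateTimeFormatter FIVE_DIGITS =
            DateTimeFormatter.ofPattern("yyyy-DDD'T'HH:mm:ss.SSSSS");

    // Returns true if the timestamp parses with the fixed five-digit pattern.
    static boolean parsesWithFiveDigits(String s) {
        try {
            LocalDateTime.parse(s, FIVE_DIGITS);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "2018-338T14:02:57.4758",     // too few digits
            "2018-338T14:02:57.47583",    // exactly five
            "2018-338T14:02:57.475835",   // one digit left unparsed
            "2018-338T14:02:57.47583529", // three digits left unparsed
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parsesWithFiveDigits(s));
        }
    }
}
```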
I want this to be as general as possible, ideally without checking for each digit count separately.
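One possible approach, sketched here rather than taken from the question: replace the fixed S letters with DateTimeFormatterBuilder.appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true), which parses an optional decimal point followed by zero to nine fraction digits. A single formatter then covers every width. The class name VariableFractionDemo is hypothetical; this assumes Java 8+.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class VariableFractionDemo {
    // Seconds, then an optional '.' followed by 0-9 fraction digits (minWidth 0 makes it optional).
    static final DateTimeFormatter ORDINAL_WITH_FRACTION = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-DDD'T'HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    public static void main(String[] args) {
        String[] samples = {
            "2018-338T14:02:57",
            "2018-338T14:02:57.47583",
            "2018-338T14:02:57.475835",
            "2018-338T14:02:57.4758352",
            "2018-338T14:02:57.47583529",
        };
        for (String s : samples) {
            LocalDateTime t = LocalDateTime.parse(s, ORDINAL_WITH_FRACTION);
            System.out.println(s + " -> nanos " + t.getNano());
        }
    }
}
```

The built-in DateTimeFormatter.ISO_LOCAL_TIME uses the same kind of 0-9 digit fraction element for its seconds fraction.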
One option would be to make the output pattern always use nine subsecond digits (SSSSSSSSS) and pad the input string to match, but deciding whether to pad, and by how much, is getting clunky. I want the function to handle a wide variety of strings and to be extensible simply by adding entries to the regex map, rather than by adding complexity and special cases elsewhere.
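If the map's values are DateTimeFormatter objects instead of pattern strings, an entry can carry the variable-width fraction itself, so no padding is needed and extending the parser still means adding one map entry. This is a minimal sketch under several assumptions: the return type changes from String to DateTimeFormatter, the regexes are simplified (the stricter ones from the question can be substituted), and the names OrdinalTimestampParser, withOptionalFraction, and formatterFor are hypothetical. A plain LinkedHashMap stands in for Guava's ImmutableMap; both iterate in insertion order.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class OrdinalTimestampParser {
    // Hypothetical helper: the given pattern plus an optional '.' and 0-9 fraction digits.
    static DateTimeFormatter withOptionalFraction(String pattern) {
        return new DateTimeFormatterBuilder()
                .appendPattern(pattern)
                .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
                .toFormatter();
    }

    // Regex -> formatter; insertion order is preserved, so the first matching regex wins.
    // Supporting a new layout means adding another put(...) here.
    static final Map<Pattern, DateTimeFormatter> FORMATS = new LinkedHashMap<>();
    static {
        String date = "[0-9]{4}-[0-9]{3}"; // simplified year + day-of-year
        FORMATS.put(Pattern.compile(date + "T[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]{1,9})?"),
                withOptionalFraction("yyyy-DDD'T'HH:mm:ss"));
        FORMATS.put(Pattern.compile(date + "T[0-9]{2}:[0-9]{2}"),
                DateTimeFormatter.ofPattern("yyyy-DDD'T'HH:mm"));
    }

    // Returns the formatter for the first regex that matches the whole input.
    static DateTimeFormatter formatterFor(String raw) {
        for (Map.Entry<Pattern, DateTimeFormatter> e : FORMATS.entrySet()) {
            if (e.getKey().matcher(raw).matches()) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("No format matches: " + raw);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2018-338T14:02", "2018-338T14:02:57.4758352"}) {
            System.out.println(LocalDateTime.parse(s, formatterFor(s)));
        }
    }
}
```

Returning a formatter instead of a string means callers pass the result straight to LocalDateTime.parse rather than calling DateTimeFormatter.ofPattern themselves.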
Maybe I can't get everything I want here, but I'd love a solution if you can think of one. Thanks!