
I want help parsing String data out of a .csv file as dynamically and flexibly as possible, meaning the user can enter dates or datetimes in a bunch of different formats (e.g. I want to handle dd-MMM-yyyy but also yyyy-MM-dd, and more if possible), and I should be able to parse them without throwing exceptions or crashing. The current format for the date/datetime fields of the .csv files is dd-MMM-yyyy, so something like 30-Apr-2020. The time is optional and can be added (the pattern uses [ ] bracket notation for optional sections), giving something like 30-Apr-2020 23:59:59. I have already set up the parsing of the date/datetime columns as follows:

DateTimeFormatter dtf = new DateTimeFormatterBuilder()
        .appendPattern("dd-MMM-yyyy[[ ]['T']HH:mm:ss]")
        .optionalStart()
        .appendFraction(ChronoField.MICRO_OF_SECOND, 1, 6, true)
        .optionalEnd()
        .toFormatter();

TemporalAccessor temporalAccessor = dtf.parseBest(dateString, LocalDateTime::from, LocalDate::from);
if (temporalAccessor instanceof LocalDateTime) {
    // process here
} else if (temporalAccessor instanceof LocalDate) {
    // process here
}

So, basically, by making the pattern flexible, i.e. "dd-MMM-yyyy[[ ]['T']HH:mm:ss]", I can check whether the returned TemporalAccessor is a date or a date-time and do further processing as needed. This lets me process several kinds of input without the app throwing an exception and failing. So I can consume:

01-Sep-2020 // just date
01-Sep-2099 18:59:59 // datetime
01-Apr-2033 18:59:59.123 // datetime with milliseconds
01-Aug-2057 23:59:59.123456 // datetime with up to 6 fractional-second digits (microseconds)
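
The dispatch described above can be made concrete. The following is a minimal sketch, not the code actually used in the app: it normalises both outcomes of parseBest to a LocalDateTime by treating a bare date as midnight. The names ParseBestSketch and toDateTime are hypothetical, and Locale.ENGLISH is an assumption added so that month abbreviations do not depend on the JVM's default locale.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

class ParseBestSketch {
    // Same pattern as in the question; the locale is pinned (an assumption).
    static final DateTimeFormatter DTF = new DateTimeFormatterBuilder()
            .appendPattern("dd-MMM-yyyy[[ ]['T']HH:mm:ss]")
            .optionalStart()
            .appendFraction(ChronoField.MICRO_OF_SECOND, 1, 6, true)
            .optionalEnd()
            .toFormatter(Locale.ENGLISH);

    // Hypothetical helper: returns a LocalDateTime for both kinds of input.
    static LocalDateTime toDateTime(String dateString) {
        // parseBest tries LocalDateTime first and falls back to LocalDate.
        TemporalAccessor parsed = DTF.parseBest(dateString, LocalDateTime::from, LocalDate::from);
        if (parsed instanceof LocalDateTime) {
            return (LocalDateTime) parsed;
        }
        // Only a date was present, so assume the start of that day.
        return ((LocalDate) parsed).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(toDateTime("30-Apr-2020"));          // 2020-04-30T00:00
        System.out.println(toDateTime("30-Apr-2020 23:59:59")); // 2020-04-30T23:59:59
    }
}
```

Input that matches neither shape still raises a DateTimeParseException in this sketch.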

However, if the user's .csv contains a date like 2020-05-30, which I believe is the ISO standard format, parsing fails. Also, something bad I just noticed: parsing with .parseBest() fails because the formatter is case-sensitive on the month, so 01-MAY-1999 fails while 01-May-1999 passes.

How can I handle as many different formats as possible without parsing failures? As I said, I don't generate the .csv files myself (the Data Engineers do), so I want this app to be as robust and flexible as possible and to parse and correctly format this data so that it can be consumed and written to the database. I thought my approach here was decent, so I'm hoping a huge rewrite is not needed.

Arvind Kumar Avinash
ennth

1 Answer


You can use DateTimeFormatterBuilder#parseDefaulting to default the optional fields as shown in the example below:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

public class Main {
    public static void main(String[] args) {
        DateTimeFormatter dtfInput = new DateTimeFormatterBuilder()
                .parseCaseInsensitive() // For case-insensitive parsing
                .appendPattern("[d-M-uuuu[ H[:m[:s]]]]")
                .appendPattern("[uuuu-M-d[ H[:m[:s]]]]")
                .appendPattern("[uuuu/M/d[ H[:m[:s]]]]")
                .appendPattern("[d/M/uuuu[ H[:m[:s]]]]")
                .appendPattern("[d-MMM-uuuu[ H[:m[:s[.SSSSSS]]]]]")
                .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
                .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
                .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
                .parseDefaulting(ChronoField.NANO_OF_SECOND, 0)
                .toFormatter(Locale.ENGLISH);

        String[] arr = {
                "10-5-2020",
                "2020-5-10",
                "10/5/2020",
                "2020/5/10",
                "10-5-2020 10:20:30",
                "10-5-2020 10",
                "10-5-2020 10:20",
                "10/5/2020 10:20",
                "01-May-1999",
                "01-MAY-1999",
                "01-Aug-2057 23:59:59.123456"
        };

        for (String dt : arr) {
            System.out.println(LocalDateTime.parse(dt, dtfInput));
        }
    }
}

Output:

2020-05-10T00:00
2020-05-10T00:00
2020-05-10T00:00
2020-05-10T00:00
2020-05-10T10:20:30
2020-05-10T10:00
2020-05-10T10:20
2020-05-10T10:20
1999-05-01T00:00
1999-05-01T00:00
2057-08-01T23:59:59.123456
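
Two of the question's requirements are not covered by the example above: parsing without exceptions, and fractions of varying length (the `.SSSSSS` pattern matches exactly six fraction digits, so `18:59:59.123` would be rejected). The following is a minimal sketch of one way to handle both, assuming that an empty Optional is an acceptable signal for unparseable input. SafeDateParser and tryParse are hypothetical names, only two of the date layouts are included to keep it short, and appendFraction with a width of 1 to 9 is my substitution for the fixed-width pattern.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.Optional;

class SafeDateParser {
    static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()                     // accepts MAY, May, may
            .appendPattern("[uuuu-M-d][d-MMM-uuuu]")    // alternative date layouts
            .optionalStart()                            // optional time part
            .appendPattern("[ ]['T']H:m:s")
            .optionalStart()                            // optional fraction, 1 to 9 digits
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
            .optionalEnd()
            .optionalEnd()
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
            .parseDefaulting(ChronoField.NANO_OF_SECOND, 0)
            .toFormatter(Locale.ENGLISH);

    // Hypothetical helper: never throws, returns Optional.empty() on bad input.
    static Optional<LocalDateTime> tryParse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(text.trim(), FLEXIBLE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("2020-05-30"));               // Optional[2020-05-30T00:00]
        System.out.println(tryParse("01-MAY-1999"));              // Optional[1999-05-01T00:00]
        System.out.println(tryParse("01-Apr-2033 18:59:59.123")); // Optional[2033-04-01T18:59:59.123]
        System.out.println(tryParse("not a date"));               // Optional.empty
    }
}
```

Returning an Optional lets the caller decide per row whether to skip, log, or reject a bad value instead of aborting the whole import.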
Arvind Kumar Avinash
  • This looks like what I needed. Why are the entire patterns optional, though (they are all enclosed in brackets)? Shouldn't the date be outside the brackets, as it's always mandatory? Also, is it even possible to have the [MM/dd/uuuu[ H:m:s]] format, as used in the USA? I tried it and it throws an exception, because the compiler is getting confused with the other variant, i.e. [d/M/u[ H:m:s]]. – ennth Feb 08 '21 at 04:50
  • @ennth - `Why are the entire patterns optional, though (they are all enclosed in brackets)? Shouldn't the date be outside the brackets, as it's always mandatory?` - It's because there are multiple alternative date patterns as well: any given input matches at most one of them, so each one has to be optional. – Arvind Kumar Avinash Feb 08 '21 at 07:48
  • @ennth - `Also, is it even possible to have the [MM/dd/uuuu[ H:m:s]] format, as used in the USA? I tried it and it throws an exception, because the compiler is getting confused with the other variant, i.e. [d/M/u[ H:m:s]]` - It's because the runtime engine, like a human being, cannot determine which of `dd/MM/uuuu` and `MM/dd/uuuu` to choose for a date like `05/10/2020`. – Arvind Kumar Avinash Feb 08 '21 at 08:37
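
The day-first versus month-first ambiguity raised in the last comment cannot be resolved from the text alone, but a fixed priority can be imposed. The following is a minimal sketch of that workaround, which is not part of the answer: separate formatters are tried in order and the first one that yields a valid date wins. OrderedDateParser, CANDIDATES and tryParse are hypothetical names, and preferring day-first is an arbitrary assumption; the order should be swapped if the data is known to be US-style.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

class OrderedDateParser {
    // Assumed priority: day-first wins whenever both readings are valid dates.
    static final List<DateTimeFormatter> CANDIDATES = List.of(
            DateTimeFormatter.ofPattern("d/M/uuuu"),
            DateTimeFormatter.ofPattern("M/d/uuuu"));

    // Hypothetical helper: returns the first successful parse, or empty if none.
    static Optional<LocalDate> tryParse(String text) {
        for (DateTimeFormatter formatter : CANDIDATES) {
            try {
                return Optional.of(LocalDate.parse(text, formatter));
            } catch (DateTimeParseException e) {
                // This layout does not fit; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(tryParse("05/10/2020")); // Optional[2020-10-05], day-first reading wins
        System.out.println(tryParse("12/25/2020")); // Optional[2020-12-25], only month-first is valid
        System.out.println(tryParse("13/13/2020")); // Optional.empty
    }
}
```

This only helps for values where exactly one reading is a valid date. A value such as 05/10/2020 is still interpreted according to the chosen priority, so the safest fix remains agreeing on a single layout with whoever produces the files.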