
I have a requirement to parse external CSV files and read their name attributes. I am using the OpenCSV library for this; the test code is below. It works well with valid CSV files, but if one of the rows is invalid, I have no way to handle the error. The example CSV below contains such a case: the backslash before the closing double quote (\") is treated as an escape and breaks parsing in Java. Can we handle this inline or at the file level, for example by replacing \" with "?

    @Test
    public void csvTest() throws Exception { // in OpenCSV 5.x readNext() also throws CsvValidationException
        String fileName = "ERROR.csv";
        File file = new File("D:\\csvFiles\\" + fileName);
        if (file.exists()) {
            // try-with-resources closes the reader even if parsing fails
            try (CSVReader csvReader = new CSVReader(new FileReader(file))) {
                String[] nextLine;
                int row = 0;
                while ((nextLine = csvReader.readNext()) != null) {
                    row++;
                    if (nextLine.length > 0) {
                        System.out.println("ROW: " + row + " " + String.join(",", nextLine));
                    }
                }
            }
        }
    }

ERROR.csv

id,name,address,phone
"1","Bob","New Jersey","9999999999"
"2","Smith","Sydney ///\","9999999999"

Note: When we open this CSV file in Excel, it renders perfectly. Is it only the Java side that treats the row as erroneous, because the double quote is preceded by a backslash (\") and is therefore read as escaped?


MWiesner
JavaCodeNet
  • The best plan is to have it notify you of the exceptions so you can fix them by hand. Trying to fix the stupidities that people put into CSV files automatically requires some advanced AI. – Tim Roberts Jan 09 '23 at 19:42
  • So you want to get rid of escape characters in loaded CSV data? Consider loading the file into a string and using `String.replace` to strip the extra escape characters. From there, you can split the string into lines using `split` and iterate over the lines as you otherwise would. https://stackoverflow.com/questions/12423071/how-to-remove-escape-characters-from-a-string-in-java – Jax Jan 09 '23 at 20:36
  • `CSVParser` should allow you to read one line first and then parse it – g00se Jan 09 '23 at 21:51
  • Thank you @g00se. If you have it handy, could you share an example or a documentation reference for this? If parsing fails, I would also like a fallback that adjusts the line and parses it again. – JavaCodeNet Jan 10 '23 at 13:06
  • I don't, I'm afraid. But look at the Javadoc. It should be simple enough – g00se Jan 10 '23 at 15:04
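
The pre-processing idea from the comments could look roughly like the sketch below. It uses only the JDK and doubles a lone backslash that sits directly before a closing quote, so that a parser using \ as its escape character reads it as a literal backslash. The class name `CsvBackslashFixer` and the method `fixLine` are hypothetical, and the sketch assumes that fields are always quoted, that the separator is a comma, and that `\",` never legitimately means an escaped quote followed by a comma inside a field.

```java
import java.util.regex.Pattern;

public class CsvBackslashFixer {

    // One backslash (not preceded by another backslash) that is directly
    // followed by a quote which closes the field, i.e. a quote followed by
    // a comma or by the end of the line.
    private static final Pattern LONE_BACKSLASH_BEFORE_CLOSING_QUOTE =
            Pattern.compile("(?<!\\\\)\\\\(?=\"(?:,|$))");

    // Doubles the offending backslash; every other character is left untouched.
    public static String fixLine(String line) {
        // In a regex replacement string, "\\\\\\\\" yields two literal backslashes.
        return LONE_BACKSLASH_BEFORE_CLOSING_QUOTE.matcher(line).replaceAll("\\\\\\\\");
    }

    public static void main(String[] args) {
        String bad = "\"2\",\"Smith\",\"Sydney ///\\\",\"9999999999\"";
        System.out.println(fixLine(bad));
        // prints "2","Smith","Sydney ///\\","9999999999"
    }
}
```

Each line read from the file would be passed through `fixLine` before it is handed to the CSV parser. A plain `line.replace("\\\"", "\"")` is simpler, but it drops the backslash from the data and also rewrites escaped quotes in the middle of a field.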

1 Answer


A customized CSVReader instance works for me; see code below:

CSVParserBuilder pb = new CSVParserBuilder();
CSVParser p = pb.withIgnoreLeadingWhiteSpace(true)
        .withEscapeChar('%')
        .withSeparator(',')
        .build();
CSVReaderBuilder rb = new CSVReaderBuilder(new FileReader(file));
rb.withCSVParser(p);
CSVReader csvReader = rb.build();

String[] nextLine;
int row = 0;
while ((nextLine = csvReader.readNext()) != null) {
  row++;
  if (nextLine.length > 0) {
    System.out.println("ROW: " + row + " " + String.join(",", nextLine));
  }
}

Note: I set a different escape character with .withEscapeChar('%'). OpenCSV's default escape character is the backslash, which is why \" in your data is read as an escaped quote rather than as the end of the field. You can choose any character other than \ that you know never occurs in your data.

With this customized CSVParser, the configured CSVReader instance works fine with the CSV data from your question.

It produces

ROW: 1 id,name,address,phone
ROW: 2 1,Bob,New Jersey,9999999999
ROW: 3 2,Smith,Sydney ///\,9999999999

as (expected) output without any errors.

I used OpenCSV version 5.7.x.
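
If no free escape character can be guaranteed, another option may be OpenCSV's `RFC4180Parser`, which follows RFC 4180 and does not treat the backslash as an escape character. The following is a minimal sketch of that alternative, not the approach used above; the class name `Rfc4180Example` and the method `parse` are hypothetical, and the CSV is read from a string only to keep the example self-contained.

```java
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;

import java.io.StringReader;
import java.util.List;

public class Rfc4180Example {

    // Parses the given CSV text with an RFC 4180 parser, in which quotes
    // are escaped only by doubling them ("") and \ is ordinary data.
    public static List<String[]> parse(String csv) throws Exception {
        RFC4180Parser parser = new RFC4180ParserBuilder().build();
        try (CSVReader reader = new CSVReaderBuilder(new StringReader(csv))
                .withCSVParser(parser)
                .build()) {
            return reader.readAll();
        }
    }

    public static void main(String[] args) throws Exception {
        String csv = "id,name,address,phone\n"
                + "\"1\",\"Bob\",\"New Jersey\",\"9999999999\"\n"
                + "\"2\",\"Smith\",\"Sydney ///\\\",\"9999999999\"\n";
        int row = 0;
        for (String[] line : parse(csv)) {
            row++;
            System.out.println("ROW: " + row + " " + String.join(",", line));
        }
    }
}
```

For a real file, the `StringReader` would be swapped for a `FileReader`. Note that a file which does rely on backslash escapes elsewhere would be read differently by this parser.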

MWiesner