14

I am using OpenCSV's CSVReader to read comma-separated values from a file, and I'm not sure how to trim leading and trailing spaces from each field. Sure, I could call String.trim() on every value, but it would be cleaner not to. The documentation does not mention an option for this.

syb0rg
user1377000
  • How is it not "cleaner" to use `String.trim()`? – syb0rg Mar 15 '13 at 18:01
  • Because I have to write one extra line. Also, it has to create an entire new object, so a bit less efficient. – user1377000 Mar 15 '13 at 18:03
  • 3
    You can't spare **1** extra line in your source code for a function that you know how to use? – syb0rg Mar 15 '13 at 18:05
  • 2
    I agree that it would be nice if the CSVReader object had an option for this. There is an 'ignoreLeadingWhiteSpace' option on the constructor but I guess it only affects spaces outside of the quotes? – Leo Lansford Jul 30 '13 at 16:09
  • I think a CSV library should put the content into the cells and read cell contents **exactly** as they are. Its responsibility ends there. It is the developer's responsibility to write the right content and to transform the content that is read. I'm sure you could create a class wrapping CSVReader that would trim() all fields, and then the code doing your business logic would be cleaner. – ppeterka Sep 13 '13 at 08:17
  • For those who are reading a .csv with CsvToBeanBuilder: I trimmed the string in the setter method of the CSV-bound property. – Imam Bux Aug 15 '21 at 20:57
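
The setter-based approach from the last comment can be sketched roughly as follows. This is a minimal illustration in plain Java: `Person`, `name`, and `setName` are hypothetical, the OpenCSV binding annotation that would normally sit on the field is omitted so the snippet compiles without the library, and it assumes the bean-populating code calls the setter rather than writing the field directly.

```java
// Hypothetical bean; in real code the field would carry the CSV binding annotation.
class Person {
    private String name;

    public String getName() {
        return name;
    }

    // Trim on assignment so every consumer of the bean sees the cleaned value.
    public void setName(String name) {
        this.name = (name == null) ? null : name.trim();
    }
}
```

The trade-off is that the trimming lives in the bean rather than in the reader, so it has to be repeated for every field that needs it.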

4 Answers

4

If you are working with bean mapping and OpenCSV, I personally prefer to extend the MappingStrategy, as it handles the final assignment of values to their fields. Imagine your fields are tab-separated: then you might have a hard time extending the CSVReader. Extending the mapping strategy also requires less code.

In the following example I am using a ColumnPositionMappingStrategy, but yours can be any other MappingStrategy, since populateNewBean is declared in the parent abstract class.

private <T> MappingStrategy<T> createMappingStrategy() {
    return new ColumnPositionMappingStrategy<T>() {
        @Override
        public T populateNewBean(String[] line) throws CsvDataTypeMismatchException, CsvConstraintViolationException,
                CsvRequiredFieldEmptyException, CsvValidationException {
            Arrays.setAll(line, (i) -> line[i].trim());
            return super.populateNewBean(line);
        }
    };
}

As you can see, every field in the line is trimmed before the bean is populated.
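
To see what the `Arrays.setAll` call does in isolation, here is a small stand-alone sketch. `FieldTrimmer` and `trimAll` are hypothetical names, and the null check is an addition that the code above does not have; it only matters if the array can contain null entries.

```java
import java.util.Arrays;

class FieldTrimmer {
    // Trims every non-null element in place and returns the same array for convenience.
    static String[] trimAll(String[] fields) {
        Arrays.setAll(fields, i -> fields[i] == null ? null : fields[i].trim());
        return fields;
    }
}
```

Calling `FieldTrimmer.trimAll(line)` before `super.populateNewBean(line)` would be one way to plug this into the strategy, under the same assumptions.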

Youness
3

Can you switch to SuperCSV? It has an option to ignore surrounding spaces on its CsvPreference.Builder. It's a far superior library, IMO. If that preference doesn't do what you want, you could always extend the Tokenizer class and override readColumns. Otherwise, it looks like OpenCSV isn't very granular and would require you to extend CSVReader and override readNext. This might work:

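class MyReader extends au.com.bytecode.opencsv.CSVReader {
    // CSVReader has no no-argument constructor, so one must be declared.
    MyReader(Reader reader) {
        super(reader);
    }

    @Override public String[] readNext() throws IOException {
        String[] result = super.readNext();
        if (result == null) return null; // end of input
        for (int i = 0; i < result.length; i++) result[i] = result[i].trim();
        return result;
    }
}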
ngreen
  • 2
    Note that SuperCSV was most recently updated in 2015. OpenCSV is currently maintained. – Andrew Apr 27 '20 at 17:14
  • That's unfortunate. I haven't kept up with the OpenCSV API changes, so I don't know how much improved it is. Certainly `java.time` support is a big deal. The fact that OpenCSV is willing to make breaking changes on major releases is a good sign. – ngreen Apr 28 '20 at 17:14
1

Using ngreen's idea I came up with the following working solution:

public class CSVReaderExtended extends CSVReader {

    // Matches any run of characters other than ASCII letters and digits.
    private static final String EXP_ALPHA_AND_DIGITS = "[^a-zA-Z0-9]+";

    public CSVReaderExtended(Reader reader) {
        super(reader);
    }

    @Override
    public String[] readNext() throws IOException {
        String[] result = super.readNext();
        if (result == null)
            return null;

        for (int index = 0; index < result.length; index++) {
            result[index] = result[index].replaceAll(EXP_ALPHA_AND_DIGITS, "");
        }
        return result;
    }
}
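
Note that this regular expression deletes every character that is not an ASCII letter or digit, including spaces and punctuation inside a field, so it goes well beyond trimming. The comparison below is a plain-Java sketch of the difference; `CleanupComparison` and its method names are hypothetical.

```java
class CleanupComparison {
    // Same pattern as EXP_ALPHA_AND_DIGITS in the answer above.
    private static final String NON_ALPHANUMERIC = "[^a-zA-Z0-9]+";

    // Removes everything except ASCII letters and digits, wherever it occurs.
    static String stripNonAlphanumeric(String field) {
        return field.replaceAll(NON_ALPHANUMERIC, "");
    }

    // Removes only leading and trailing whitespace.
    static String trimOnly(String field) {
        return field.trim();
    }
}
```

For the input `"  New York, NY "`, `stripNonAlphanumeric` yields `NewYorkNY` while `trimOnly` yields `New York, NY`. If only surrounding spaces should go, `trim()` is the closer match to the question.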
0

I ended up extending CSVParser to do this:

CSVParser parser = new CSVParser() {
    @Override
    protected String[] parseLine(String nextLine, boolean multi) throws IOException {
        String[] line = super.parseLine(nextLine, multi);
        Arrays.setAll(line, i -> line[i].trim());
        return line;
    }
};
MappingStrategy<R> mappingStrategy = new HeaderColumnNameMappingStrategy<>();
mappingStrategy.setType(rowType);
Reader reader = new FileReader("c:/path/to/file.csv");
CSVReader csvReader = new CSVReaderBuilder(reader).withCSVParser(parser).build();
CsvToBean<R> csvToBean = new CsvToBeanBuilder<R>(csvReader).withMappingStrategy(mappingStrategy).build();
lance-java