10

I'm trying to write a large number of rows (~2 million) from a database to a CSV file using SuperCSV. I need to perform validation on each cell as it is written, and the built-in CellProcessors do very nicely. I want to capture all the exceptions that are thrown by the CellProcessors so that I can go back to the source data and make changes.

The problem is that when there are multiple errors in a single row (e.g. The first value is out of range, the second value is null but shouldn't be), only the first CellProcessor will execute, and so I'll only see one of the errors. I want to process the whole file in a single pass, and have a complete set of exceptions at the end of it.

This is the kind of approach I'm trying:

for (Row row : rows) {
    try {
        csvBeanWriter.write(row, HEADER_MAPPINGS, CELL_PROCESSORS);
    } catch (SuperCsvCellProcessorException e) {
        log(e);
    }
}

How can I achieve this? Thanks!

EDIT: Here is the code I wrote that's similar to Hound Dog's, in case it helps anyone:

import java.util.List;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.util.CsvContext;

public class ExceptionCapturingCellProcessor extends CellProcessorAdaptor {

    private final List<Exception> exceptions;

    private final CellProcessor current;

    public ExceptionCapturingCellProcessor(CellProcessor current, CellProcessor next, List<Exception> exceptions) {
        super(next);
        this.exceptions = exceptions;
        this.current = current;
    }

    @Override
    public Object execute(Object value, CsvContext context) {
        // Check input is not null
        try {
            validateInputNotNull(value, context);
        } catch (SuperCsvCellProcessorException e) {
            exceptions.add(e);
        }

        // Execute wrapped CellProcessor
        try {
            current.execute(value, context);
        } catch (SuperCsvCellProcessorException e) {
            exceptions.add(e);
        }

        return next.execute(value, context);
    }
}
Conan
  • 2,288
  • 1
  • 28
  • 42
  • 1
    Just be aware that you couldn't use this processor with `ConvertNullTo` - you don't need to validate the input's not null. And Super CSV convention is to put `next` as the last parameter in the constructor (as it reads better in a long chain). – James Bassett Dec 03 '12 at 20:21

1 Answers1

11

I'd recommend writing a custom CellProcessor to achieve this. The following processor can be placed at the start of each CellProcessor chain - it will simply delegate to the processor chained after it, and will suppress any cell processing exceptions.

package example;

import java.util.ArrayList;
import java.util.List;
import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.util.CsvContext;

public class SuppressException extends CellProcessorAdaptor {

    public static List<SuperCsvCellProcessorException> SUPPRESSED_EXCEPTIONS = 
            new ArrayList<SuperCsvCellProcessorException>();

    public SuppressException(CellProcessor next) {
        super(next);
    }

    public Object execute(Object value, CsvContext context) {
        try {
            // attempt to execute the next processor
            return next.execute(value, context);

        } catch (SuperCsvCellProcessorException e) {
            // save the exception
            SUPPRESSED_EXCEPTIONS.add(e);

            // and suppress it (null is written as "")
            return null;
        }
    }
}

And here it is in action:

package example;

import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCsvCellProcessorException;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

public class TestSuppressExceptions {

    private static final CellProcessor[] PROCESSORS = {
            new SuppressException(new StrMinMax(0, 4)),
            new SuppressException(new NotNull()) };

    private static final String[] HEADER = { "name", "age" };

    public static void main(String[] args) throws Exception {

        final StringWriter stringWriter = new StringWriter();
        ICsvBeanWriter beanWriter = null;
        try {
            beanWriter = new CsvBeanWriter(stringWriter,
                    CsvPreference.STANDARD_PREFERENCE);

            beanWriter.writeHeader(HEADER);

            // set up the data
            Person valid = new Person("Rick", 43);
            Person nullAge = new Person("Lori", null);
            Person totallyInvalid = new Person("Shane", null);
            Person valid2 = new Person("Carl", 12);
            List<Person> people = Arrays.asList(valid, nullAge, totallyInvalid,
                    valid2);

            for (Person person : people) {
                beanWriter.write(person, HEADER, PROCESSORS);

                if (!SuppressException.SUPPRESSED_EXCEPTIONS.isEmpty()) {
                    System.out.println("Suppressed exceptions for row "
                                        + beanWriter.getRowNumber() + ":");
                    for (SuperCsvCellProcessorException e :
                                        SuppressException.SUPPRESSED_EXCEPTIONS) {
                        System.out.println(e);
                    }
                    // clear ready for next row
                    SuppressException.SUPPRESSED_EXCEPTIONS.clear();
                }

            }

        } finally {
            beanWriter.close();
        }

        // CSV will have empty columns for invalid data
        System.out.println(stringWriter);

    }

}

Here's the suppressed exceptions output (row 4 has two exceptions, one for each column):

Suppressed exceptions for row 3:
org.supercsv.exception.SuperCsvConstraintViolationException: null value 
encountered processor=org.supercsv.cellprocessor.constraint.NotNull
context={lineNo=3, rowNo=3, columnNo=2, rowSource=[Lori, null]}
Suppressed exceptions for row 4:
org.supercsv.exception.SuperCsvConstraintViolationException: the length (5) 
of value 'Shane' does not lie between the min (0) and max (4) values (inclusive)
processor=org.supercsv.cellprocessor.constraint.StrMinMax
context={lineNo=4, rowNo=4, columnNo=2, rowSource=[Shane, null]}
org.supercsv.exception.SuperCsvConstraintViolationException: null value 
encountered processor=org.supercsv.cellprocessor.constraint.NotNull
context={lineNo=4, rowNo=4, columnNo=2, rowSource=[Shane, null]}

And the CSV output

name,age
Rick,43
Lori,
,
Carl,12

Notice how the invalid values were written as "" because the SuppressException processor returned null for those values (not that you'd use the CSV output anyway, as it's not valid!).

James Bassett
  • 9,458
  • 4
  • 35
  • 68
  • 4
    If anyone would find something like this useful, I'll add this in an upcoming release so you don't have to write your own :) – James Bassett Dec 01 '12 at 10:27
  • 1
    sweet! but the row is still written. I think dedicated hooks for exception handling may be a better approach – Carlo V. Dango Dec 01 '12 at 15:19
  • That's really helpful! I'd actually looked into doing the same things myself, but gave up because I wanted still to reject the row that has errors, rather than leaving an empty row in there. I was hoping to hack in something that could keep tabs on the exceptions, and throw a wrapper exception if any were found, but each CellProcessor doesn't have visibility of the others, so I couldn't see how to do it. – Conan Dec 03 '12 at 15:33
  • I've added the code I wrote to the question, in case it helps. – Conan Dec 03 '12 at 15:38
  • 1
    And yes, please add it to a release - the person who recommended SuperCSV to me also came across this! – Conan Dec 03 '12 at 15:40
  • Ok @Conan I'll see what I can do :) – James Bassett Dec 03 '12 at 21:44
  • 1
    Thank you for this. I made it a static map so that it could save the column number of the internal error: catch (SuperCsvCellProcessorException e) { // save the exception CELL_EXCEPTIONS.put(e.getCsvContext().getColumnNumber(), e); – Akber Choudhry Jun 13 '13 at 00:42
  • 1
    @AkberChoudhry The alternative would be to replace the `CsvContext` inside the exception with a copy before saving it (the context is reused, so as you probably found out, all your exceptions will reference the same context if you don't do this). – James Bassett Jun 13 '13 at 01:18