I have a requirement to parse very large Excel files using Apache POI with limited memory. After some searching, I learned that POI provides a SAX-based parser that handles large files efficiently without consuming a lot of memory.

Apache POI SAX Parser example

// Inner class: 'output' and 'minColumns' are fields of the enclosing class (not shown)
private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
        for (int i=0; i<number; i++) {
            for (int j=0; j<minColumns; j++) {
                output.append(',');
            }
            output.append('\n');
        }
    }

    @Override
    public void startRow(int rowNum) {
        // If there were gaps, output the missing rows
        outputMissingRows(rowNum-currentRow-1);
        // Prepare for this row
        firstCellOfRow = true;
        currentRow = rowNum;
        currentCol = -1;
    }

    @Override
    public void endRow(int rowNum) {
        // Ensure the minimum number of columns
        for (int i=currentCol; i<minColumns; i++) {
            output.append(',');
        }
        output.append('\n');
    }

    @Override
    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }

        // gracefully handle missing CellRef here in a similar way as XSSFCell does
        if(cellReference == null) {
            cellReference = new CellAddress(currentRow, currentCol).formatAsString();
        }

        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i=0; i<missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;

        // Number or string?
        try {
            Double.parseDouble(formattedValue);
            output.append(formattedValue);
        } catch (NumberFormatException e) {
            output.append('"');
            output.append(formattedValue);
            output.append('"');
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Skip, no headers or footers in CSV
    }
}

In the example linked above, the method 'cell' only receives the formatted value; however, I need access to the actual (unformatted) value of the cell.

Arul

1 Answer

The current implementation of the streaming interface does not provide this. To achieve it, you will need to copy the code of the underlying XSSFSheetXMLHandler and adjust it so that the cell content is not formatted.
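
To illustrate what "not formatted" means here, the following is a minimal sketch of the idea, not the actual XSSFSheetXMLHandler code. It uses only the JDK's SAX parser to read the raw text of each `<v>` element in a worksheet part. The class name `RawCellValueSketch`, the handler `RawValueHandler`, and the inline `SAMPLE` XML are hypothetical and hand-written for illustration. In a real setup, the input would presumably be a sheet stream obtained from POI's XSSFReader rather than a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RawCellValueSketch {

    // Hand-written, simplified stand-in for a worksheet XML part (assumption).
    static final String SAMPLE =
        "<worksheet><sheetData>"
      + "<row r=\"1\">"
      + "<c r=\"A1\" s=\"3\"><v>1234.5678</v></c>"  // styled number
      + "<c r=\"B1\" t=\"s\"><v>0</v></c>"          // shared-string index
      + "<c r=\"C1\" t=\"b\"><v>1</v></c>"          // boolean
      + "</row></sheetData></worksheet>";

    /** Collects "ref=rawValue" for every cell without applying any number format. */
    static class RawValueHandler extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private final StringBuilder value = new StringBuilder();
        private String cellRef;
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(qName)) {
                // New cell: remember its reference and reset the value buffer
                cellRef = attrs.getValue("r");
                value.setLength(0);
            } else if ("v".equals(qName)) {
                inValue = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append instead of assigning
            if (inValue) {
                value.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                inValue = false;
            } else if ("c".equals(qName)) {
                // Emit the stored text as-is; no DataFormatter is involved
                cells.add(cellRef + "=" + value);
            }
        }
    }

    static List<String> readRawValues(String xml) throws Exception {
        RawValueHandler handler = new RawValueHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.cells;
    }

    public static void main(String[] args) throws Exception {
        readRawValues(SAMPLE).forEach(System.out::println);
    }
}
```

Note that for cells with `t="s"` the raw `<v>` text is only an index into the shared strings table, so an adjusted handler would still need to resolve those, while leaving numeric values untouched.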

centic