
I have a large xlsx file (74 MB). I have found a way to read it in; here is my source code so far.

import java.io.File;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;


private static void readXLSX(String path) throws IOException {
    File myFile = new File(path);
    FileInputStream fis = new FileInputStream(myFile);

    // Load the workbook instance for the XLSX file
    XSSFWorkbook myWorkBook = new XSSFWorkbook(fis);

    // Get the first sheet from the XLSX workbook
    XSSFSheet mySheet = myWorkBook.getSheetAt(0);

    // Get an iterator over all rows in the current sheet
    Iterator<Row> rowIterator = mySheet.iterator();

    // Traverse each row of the XLSX file
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();

        // For each row, iterate over each column
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {

            Cell cell = cellIterator.next();

            switch (cell.getCellType()) {
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue() + "\t");
                break;
            case Cell.CELL_TYPE_NUMERIC:
                System.out.print(cell.getNumericCellValue() + "\t");
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue() + "\t");
                break;
            default:
                break;
            }
        }
        System.out.println();
    }

    // Release resources when done
    myWorkBook.close();
    fis.close();
}
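One variation I found in the POI documentation: opening the workbook from a File instead of a FileInputStream is said to lower the memory footprint, because the underlying zip package can then be read on demand instead of being buffered completely. A minimal sketch (`big.xlsx` is a placeholder name); note that this alone may not be enough for a 74 MB file, since the full in-memory DOM model is still built:

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class FileBasedOpen {
    public static void main(String[] args) throws IOException, InvalidFormatException {
        // Passing a File lets POI read the zip package lazily,
        // instead of copying the whole InputStream into memory first.
        try (XSSFWorkbook wb = new XSSFWorkbook(new File("big.xlsx"))) {
            System.out.println("Sheets: " + wb.getNumberOfSheets());
        }
    }
}
```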

The problem is that my 8 GB of RAM doesn't seem to be sufficient, even with swapping and an extended JVM heap.

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Do you have any idea why this code is so inefficient? Or do you have an idea how to read the file sequentially and buffer the temporary rows in a less memory-consuming way? Thanks in advance

Jürgen K.
  • https://stackoverflow.com/questions/4897766/processing-large-xlsx-file-in-java – jsheeran Jun 27 '17 at 08:33
  • You shouldn't print anything from the xlsx file to the console; I had this problem too. When you print something from an xlsx file you never know its size, and this can cost a lot of memory. Processing the information shouldn't be a problem – Casper Jun 27 '17 at 08:38
  • @Casper: I cut all the print outs, still the same problem – Jürgen K. Jun 27 '17 at 09:02
  • @JürgenK. Weird. I also use org.apache.poi.ss.usermodel, but instead of XSSFSheet and XSSFWorkbook I just use the Sheet and Workbook classes without the XSSF prefix. I have never had any trouble with big files, except when I print something to the console – Casper Jun 27 '17 at 09:33

1 Answer


The plain (DOM-based) XSSF model in POI loads the entire workbook into memory and is known to cause memory issues with large files. Use POI's streaming alternatives instead; this will ensure you won't run out of memory.

In short:

For writing large files, use SXSSFWorkbook instead of XSSFWorkbook. Note that SXSSF is a write-only streaming API; for reading, the streaming alternative is the SAX-based event API (XSSFReader).

API details here
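For reading, here is a minimal sketch of the SAX-based approach, modelled on POI's XLSX2CSV example. It streams the first sheet and receives one callback per cell, so rows are never materialised in memory (the file path and handler behaviour are placeholders; exact constructor signatures may vary slightly across POI versions):

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingReader {

    // Receives one SAX callback per cell instead of building a workbook model
    static class EchoHandler implements SheetContentsHandler {
        @Override
        public void startRow(int rowNum) { }

        @Override
        public void endRow(int rowNum) {
            System.out.println();
        }

        @Override
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            System.out.print(formattedValue + "\t");
        }

        @Override
        public void headerFooter(String text, boolean isHeader, String tagName) { }
    }

    public static void readXLSX(String path) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(path)) {
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            XSSFReader reader = new XSSFReader(pkg);
            StylesTable styles = reader.getStylesTable();

            // The handler relies on namespace URIs, so the parser must be namespace-aware
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setContentHandler(
                new XSSFSheetXMLHandler(styles, strings, new EchoHandler(), false));

            // Stream the first sheet only; the iterator yields one InputStream per sheet
            try (InputStream sheet = reader.getSheetsData().next()) {
                parser.parse(new InputSource(sheet));
            }
        }
    }
}
```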

Johnson Abraham