
I use the following code to read very large Excel files, up to 500k rows with up to 100 columns:

public List<ExcelLine> getExcelLines(Path path, int batchSize) {

    log.info("Reading excel file");

    try (Workbook workbook = WorkbookFactory.create(path.toFile())) {
        Sheet sheet = workbook.getSheetAt(0);
        ...
    }
}

The idea was to read in batches to avoid an OutOfMemoryError, and it works for small files. However, when I call the method with a large file, I see the log.info statement, but then the application hangs: a breakpoint inside the try-with-resources block is never reached.
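For reference, the batching part of the idea can be sketched independently of POI. This is a minimal generic helper (the names are illustrative, not from my actual code) that feeds any Iterator to a handler in fixed-size batches, so only one batch is held in memory at a time:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchReader {

    // Collect items from the iterator into lists of at most batchSize
    // and hand each full list to the handler before building the next one.
    static <T> void processInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> handler) {
        List<T> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch);   // final, possibly smaller batch
        }
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        processInBatches(Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator(), 3,
                b -> batchSizes.add(b.size()));
        System.out.println(batchSizes);   // [3, 3, 1]
    }
}
```

Note that this only helps if the iterator itself is lazy; WorkbookFactory.create materializes the whole workbook first, which is why batching alone does not prevent the memory problem.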

Does anybody have an idea what to do in such a case?

Alexey

1 Answer


The most popular suggestion didn't work for me: that project hasn't been updated for two years and conflicts with the latest Apache POI releases. I found another repo, essentially a fork of the original that is kept up to date: https://github.com/pjfanning/excel-streaming-reader. This is the code that works for me:

try (InputStream inputStream = Files.newInputStream(file.getFileLocation());
     Workbook workbook = StreamingReader.builder()
             .setAvoidTempFiles(false)
             .bufferSize(4096)
             .open(inputStream)) {
    ...
}
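To tie this back to the batch idea from the question, here is a fuller sketch of what the read loop can look like. It is an illustration under assumptions, not my exact production code: StreamingReader is from the pjfanning fork linked above, and the handle method and string-only cell handling are hypothetical placeholders.

```java
import com.github.pjfanning.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class StreamingExcelReader {

    // Streams the first sheet row by row; only a window of rows is kept
    // in memory, which is what avoids the OutOfMemoryError on 500k-row files.
    public static void readInBatches(Path path, int batchSize) throws Exception {
        try (InputStream in = Files.newInputStream(path);
             Workbook workbook = StreamingReader.builder()
                     .rowCacheSize(batchSize)   // rows buffered in memory at once
                     .bufferSize(4096)          // bytes buffered from the stream
                     .open(in)) {

            Sheet sheet = workbook.getSheetAt(0);
            List<List<String>> batch = new ArrayList<>(batchSize);

            for (Row row : sheet) {
                List<String> cells = new ArrayList<>();
                for (Cell cell : row) {
                    cells.add(cell.getStringCellValue());
                }
                batch.add(cells);
                if (batch.size() == batchSize) {
                    handle(batch);   // hypothetical downstream processing
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                handle(batch);       // final, possibly smaller batch
            }
        }
    }

    private static void handle(List<List<String>> batch) {
        // placeholder: persist or transform the batch here
    }
}
```

The key difference from the original code is that the streaming workbook never loads the whole file, so the for-each over the sheet is lazy and the batch list is the only per-batch allocation.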
Alexey