2

what is the fastest and less memory intensive way to read a portion of very large xlsx file?

Currently I have this code:

FileInputStream fis = null;
fis = new FileInputStream("D:/verylargefile.xlsx");

XSSFWorkbook workbook = new XSSFWorkbook(fis);
XSSFSheet sheet = workbook.getSheetAt(0);

int r = sheet.getPhysicalNumberOfRows();    
int c = sheet.getRow(1).getLastCellNum(); 

for (int row = 1; rows < r;row++){
   for (int cell = 1; cell < c;cell++){
       int cellvalue = (int)sheet.getRow(row).getCell(cell).getNumericCellValue()
       //do some simple math op with that cell or several cells

   }
}

So I need to do very large number of those simple math operations (for example average of every 5 cells in every row or something simillar) and very fast, with a small part of a very large xlsx file at once. With code above, I am getting heap space error with 10mb xlsx file and 1gb ram dedicated to java vm (-Xms1000M).

Thank you

Robert
  • 39,162
  • 17
  • 99
  • 152
  • I even don't need it to be xlss file, cause I am only taking integers from it, so if i can just save large.xlsx to csv or txt and work with that, if someone could write this code with csv readig commands. –  Aug 08 '16 at 13:10
  • depends on what you will be calculating in the commented section you have there if you plan on doing something with several cells – Javant Aug 08 '16 at 13:30
  • Lets say I will be looking for a row, in which the first 5 cells have the largest average. So i need to read first 5 cells in a row, do the average, compare it to saved previous highest average and either save it or throw it away, next row. i don't need to modify xlsx, in the end i need 1 number, in this case the largest possible average of first5 cells in each row –  Aug 08 '16 at 13:39
  • If its the first 5 cell, then any imperative loop should do the job. Just make sure you properly initialize your data where you have the least amount of redundant calls. If that's to slow for you and you're really type of memory or time, you are probably better off switching to python or another language. – Javant Aug 08 '16 at 13:47
  • 1
    Where does the heap space error occur? On loading the file or while processing the data? – Robert Aug 08 '16 at 14:19
  • Can you use a excel jdbc driver? – farrellmr Aug 08 '16 at 14:20
  • Can you post a stacktrace? –  Aug 08 '16 at 14:21
  • 1 - My problem is the initialization, I get heap space error without any loops and math, simply by reading the file, even with 10mb xlsx file and i need it to be much larger. 2 - The initialization should be done in such way, that when i want to do average from all the cells in row r, starting collumn s ending collumn e. and do this 1mil times, it won't read that 10mb xlsxfile from scratch (or 200mb) –  Aug 08 '16 at 14:21
  • 1
    http://stackoverflow.com/questions/4897766/processing-large-xlsx-file-in-java ? –  Aug 08 '16 at 14:35
  • 1
    Use the [low memory streaming reader](http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api) approach? – Gagravarr Aug 08 '16 at 22:42

0 Answers0