
Before posting I searched, but I did not find a solution.

I have a large Excel file (.xls or .xlsx) of more than 10 MB. Reading a small Excel file works fine, but with a large one I get an out-of-memory/heap-space error. Some suggest increasing the heap size, but I don't think that is a good solution. I am uploading the Excel file and reading it like this:

    byte b[] = file.getBytes();   // loads the whole upload into memory
    InputStream ips = new ByteArrayInputStream(b);
    Workbook workbook = WorkbookFactory.create(ips);
    Sheet sheet = workbook.getSheetAt(0);
    // ============
    int i = 0;
    List<String> colName = new ArrayList<>();
    List<Map<Integer, Object>> rows = new ArrayList<>();
    Map<Integer, Object> lhm = null;

    Iterator<Row> rowIterator = sheet.iterator();
    while (rowIterator.hasNext())
    {
        lhm = new LinkedHashMap<>();
        Row row = rowIterator.next();
        // For each row, iterate through all the columns
        Iterator<Cell> cellIterator = row.cellIterator();

        while (cellIterator.hasNext())
        {
            Cell cell = cellIterator.next();
            // Check the cell type and format accordingly
            switch (cell.getCellType())
            {
            case Cell.CELL_TYPE_NUMERIC:
                if (DateUtil.isCellDateFormatted(cell))
                {
                    lhm.put(cell.getColumnIndex(), Utils.getDateStringFromString(cell.getDateCellValue().toString(), "yyyy-MM-dd"));
                } else
                {
                    lhm.put(cell.getColumnIndex(), String.valueOf(cell.getNumericCellValue()));
                }
                break;
            case Cell.CELL_TYPE_STRING:
                if (i == 0)
                {
                    // first row holds the column headers
                    colName.add(cell.getStringCellValue());
                } else
                {
                    lhm.put(cell.getColumnIndex(), cell.getStringCellValue());
                }
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                lhm.put(cell.getColumnIndex(), String.valueOf(cell.getBooleanCellValue()));
                break;
            }
        }
        if (i > 0)
        {
            rows.add(lhm);   // header row was already consumed into colName
        }
        i++;                 // without this, every row is treated as the header
    }

This code does not work for large Excel files. What would be the solution for both .xls and .xlsx files? I am using the Apache POI API.

Joe
  • Can you work with an .XLSB? They are typically 25-35% of the size of an .XLSX. – Jeeped May 10 '16 at 07:11
  • @Jeeped: I am not familiar with .XLSB. So it will be helpful to provide the solution on that. Give me clue for that. – Joe May 10 '16 at 07:46
  • Have you looked at http://stackoverflow.com/questions/4897766/processing-large-xlsx-file-in-java – GdR May 10 '16 at 07:46
  • Try using FileInputStream instead of converting the file into bytes. FileInputStream reads the file as a stream, so it is processed part by part. – padippist May 10 '16 at 08:04
  • You are loading everything into memory; don't do that. Store it on disk (preferably streaming) and process it afterwards, or process it directly as a stream instead of loading it all into memory. – M. Deinum May 10 '16 at 08:13
  • @padippist,@M.Deinum: I am doing this: InputStream input=file.getInputStream(); POIFSFileSystem fs = new POIFSFileSystem( input ); HSSFWorkbook wb = new HSSFWorkbook(fs); HSSFSheet sheet = wb.getSheetAt(0); But same issue. – Joe May 10 '16 at 09:40
  • https://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream – Axel Richter May 10 '16 at 10:14
  • Let me try it and get back to you – padippist May 10 '16 at 15:58
  • Also I am doing it this way: // file is MultipartFile File convFile = new File( file.getOriginalFilename()); file.transferTo(convFile); InputStream ips = new BufferedInputStream( new FileInputStream(convFile)); Workbook workbook = WorkbookFactory.create(ips); Sheet sheet = workbook.getSheetAt(0); Is this the right way? – Joe May 11 '16 at 05:06
  • `Workbook workbook = WorkbookFactory.create(convFile);` – Axel Richter May 12 '16 at 08:35
  • @AxelRichter: how will it help? Please elaborate a bit. – Joe May 12 '16 at 09:14
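The advice in the comments above (write the upload to disk instead of calling `getBytes()`, then open the workbook file-backed) can be sketched with the JDK alone. The class name `UploadToDisk` and the method name are illustrative; the POI call at the end is shown only as a comment since it needs the POI jar on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class UploadToDisk {
    // Copies the upload stream to a temp file without ever holding the
    // whole payload in memory (unlike file.getBytes()).
    static Path spoolToTempFile(InputStream upload) throws IOException {
        Path tmp = Files.createTempFile("upload-", ".xlsx");
        Files.copy(upload, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = "not really an xlsx".getBytes();
        Path tmp = spoolToTempFile(new ByteArrayInputStream(fake));
        System.out.println(Files.size(tmp)); // size on disk matches the upload
        // With POI available, open it file-backed rather than via a stream:
        // Workbook wb = WorkbookFactory.create(tmp.toFile());
        Files.deleteIfExists(tmp);
    }
}
```

Opening from a `File` lets POI use a memory-mapped/random-access path instead of buffering the entire stream, which is what Axel Richter's `WorkbookFactory.create(convFile)` comment is pointing at.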

1 Answer


If the file can become really huge and may always exceed your available memory, you can take a look at the streaming APIs in Apache POI, e.g. the event API for the .xls/HSSF format: https://poi.apache.org/spreadsheet/how-to.html#event_api

It comes with a ready-to-run example.

For .xlsx/XSSF-formatted files there is a similar SAX-based approach that exposes the workbook data in an even more convenient way; see https://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api
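The core idea of that SAX approach: an .xlsx sheet is just XML, and a SAX handler receives it one element at a time, so memory use stays flat regardless of row count. Here is a minimal, POI-free sketch using only the JDK's SAX parser on a hand-written worksheet fragment; with POI you would feed the handler the real sheet stream from `XSSFReader` instead (the class name `SheetSaxSketch` is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetSaxSketch {
    // Collects the text of every <v> (cell value) element as SAX events
    // arrive, instead of building a whole Workbook object tree.
    static List<String> readValues(String sheetXml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inValue;
            private final StringBuilder buf = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("v".equals(qName)) { inValue = true; buf.setLength(0); }
            }
            @Override
            public void characters(char[] ch, int start, int len) {
                if (inValue) buf.append(ch, start, len);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(qName)) { inValue = false; values.add(buf.toString()); }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(sheetXml.getBytes("UTF-8")), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>10</v></c><c r=\"B1\"><v>20</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(readValues(xml)); // [10, 20]
    }
}
```

In the same vein, collecting values per row inside `startElement`/`endElement` of the `row` element would give you the `List<Map<Integer, String>>` shape asked about in the comments below, one row at a time.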

centic
  • Also I am doing it this way: // file is MultipartFile File convFile = new File( file.getOriginalFilename()); file.transferTo(convFile); InputStream ips = new BufferedInputStream( new FileInputStream(convFile)); Workbook workbook = WorkbookFactory.create(ips); Sheet sheet = workbook.getSheetAt(0); Is this the right way? – Joe May 11 '16 at 05:08
  • This way you still read the contents of the document fully into memory via POI. It might be sufficient if the documents don't become too large. Very large documents still might exceed the available memory, where only the Event-based approach will prevent memory shortage situations. – centic May 11 '16 at 07:10
  • With this event-based POI approach, structuring the data in the required format is getting tough. How can I get it? I want the data as a List<Map<Integer, String>> (map key: column index, map value: the corresponding cell value of each row). – Joe May 13 '16 at 07:07