1

I have a scenario in my project where user uploads XLS file with 1Million record and I need to conver this xls file to csv file and then sql server job will process the csv file.

I have a process to convert xls file to csv - but when I deploy this code in PCF it starts throwing heap memory erorr. I am also getting the same error in my local environment.

Code Snapshot:

String inputFileName = "UserInput.xls";

FileInputStream input_document = new FileInputStream(new File(inputFileName));
Workbook my_xls_workbook = StreamingReader.builder().open(input_document);//this line throws out of memory error.
Sheet sheet = my_xls_workbook.getSheetAt(0);

...... remaing code reads the sheet object and perform conversion.

trincot
  • 317,000
  • 35
  • 244
  • 286
Sachin s
  • 13
  • 3
  • 1
    StreamingReader is not in the Java standard lib... please provide the full class name (or the imports) required to understand your code. – Renato Nov 22 '19 at 20:43
  • Try using `open(File)` instead of `open(InputStream)` – Andreas Nov 22 '19 at 20:43
  • @Andreas that won't help, it's the same code running... see the source : https://github.com/monitorjbl/excel-streaming-reader/blob/master/src/main/java/com/monitorjbl/xlsx/StreamingReader.java – Renato Nov 22 '19 at 20:46
  • Ok then, try using POI's [`SXSSF`](https://poi.apache.org/components/spreadsheet/index.html) directly, for a good ***streaming*** read of the data. – Andreas Nov 22 '19 at 20:48
  • Yep... this library OP is using seems to load the whole thing into memory, so the name is terrible as it implies you can do streaming parsing (which wouldn't load anything into memory other than the next element) but you really can't from what I can see in the source. – Renato Nov 22 '19 at 20:50

3 Answers3

2

The method you're using to load the file into a Workbook seems to be eager , i.e. it will just read the whole document into memory, parse it, and then return the result if it didn't run out of memory first.

However, on the README page of the project, they show what you should be doing instead to avoid that:

import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
Workbook workbook = StreamingReader.builder()
    .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
    .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
    .open(is);            // InputStream or File for XLSX file (required)

After this, iterate over the workbooks, but do not use the get(0) method as that seems to require loading all workbooks into memory at once.

for (Sheet sheet : workbook){
    System.out.println(sheet.getSheetName());
    for (Row r : sheet) {
        for (Cell c : r) {
            System.out.println(c.getStringCellValue());
        }
    }
}

Have you tried this? If you did, you should file a bug because clearly it shouldn't use up all memory available as that's the whole point of the library existing.

Renato
  • 12,940
  • 3
  • 54
  • 85
  • Hello Renato - I just tried your code and it worked for me. My issue is resolved. You are right - I was loading the entire file in the memory. Using cache size I was able to resolve this issue. – Sachin s Nov 22 '19 at 21:07
  • @Bharat that code was posted at the exact same time as me and it's from the README. Also, the user is a new user and I gave more information about why things are going bad for him. – Renato Nov 22 '19 at 21:09
  • I explained how the way he uses the library can trigger it to load data into memory at once, something he doesn't seem to be aware of. Also, the other answer is not enough: you must iterate via iterators as shown above. Using get(0) as the user was doing was still going to load all data in memory. Please be kind and remove the downvote. – Renato Nov 22 '19 at 21:15
0

It is always better to attach to the question a bit of context (e.g. how much memory you have, how much memory you give to the JVM, the error you get)

Having said that, my shot in the dark is that you should give more memory to JVM, something like

java .... -Xms4g -Xmx4g ...

See this question [1] for more details about those parameters.


[1] What are the -Xms and -Xmx parameters when starting JVM?

Andrea
  • 2,714
  • 3
  • 27
  • 38
0

use easypoi do

try {
        FileOutputStream fos    = new FileOutputStream("D:/home/excel/ExcelToCsv.test.csv");
        ImportParams     params = new ImportParams();
        params.setTitleRows(1);
        CsvExportParams csvExportParams = new CsvExportParams();
        IWriter ce = CsvExportUtil.exportCsv(csvExportParams, MsgClient.class, fos);
        ExcelImportUtil.importExcelBySax(
                new FileInputStream(
                        new File(FileUtilTest.getWebRootPath("import/BigDataExport.xlsx"))),
                MsgClient.class, params, new IReadHandler<MsgClient>() {

                    private List<MsgClient> list = new ArrayList<>();

                    @Override
                    public void handler(MsgClient o) {
                        list.add(o);
                        if (list.size() == 10000) {
                            ce.write(list);
                            list.clear();
                        }
                    }

                    @Override
                    public void doAfterAll() {
                        System.out.println("succcess--------------------------------");
                    }
                });
    } catch (Exception e) {

    }

address https://github.com/jueyue/easypoi

JueYue
  • 1