0

I am trying to read data from an excel file. The excel file is without any macros, it is just a normal excel file.The size of the file is approximately 7mb, and the file has 5 sheets. I have to read each sheet and insert data into the database. My requirement is to read the excel file, no writing back to the same excel file or modifying the excel file. I have tried different ways of reading the file but get java.lang.OutOfMemoryError: GC overhead limit exceeded. And this code is executed in tomcat server and not a standalone application

My java version is 8

Apache-Poi version is 3.14

Apache tomcat version is 9

And the heap space in tomcat is -Xmx1024m

Note : I have no option to increase the heap space

Following is the code that i have tried

I am reading the entire file once and creating a Workbook and passing this workbook to different methods to read the sheets needed in each method.

public Workbook getWorkBook (String path, String fileName) {
    log.info(dateFormat.format(new java.util.Date()) + " -- start getWorkBook");
    String reportTypeExcelPath = path + fileName;;
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            OPCPackage pkg;
            try {
                //excelFile = new BufferedInputStream(new FileInputStream(file));
                pkg = OPCPackage.open(file.getAbsolutePath());
                log.info("Total Memory " + Runtime.getRuntime().totalMemory());
                log.info("Free Memory " + Runtime.getRuntime().freeMemory());
                Instant start = java.time.Instant.now();
                workbook =  WorkbookFactory.create(pkg);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                log.info("between in checkMandatorySheets " + between );
                pkg.close();
                log.info(dateFormat.format(new java.util.Date()) + " -- end getWorkBook");
                return workbook;
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return null ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return null ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return null ;
    }
}

And this is the error

java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3039)
at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.attr(Cur.java:3060)
at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3250)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseTagNS(PiccoloLexer.java:1362)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1293)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4808)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3439)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1270)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1257)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument$Factory.parse(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:153)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:145)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:201)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:200)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:151)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:45)

I have also checked for the memory, and these are the results

Total Memory 549453824

Free Memory 116004584

which is approximately 110.630592346 MB and my file size is 7 MB

The GC overhead limit exceeded is triggered at this line of code

workbook = WorkbookFactory.create(pkg);

Here are the different ways tried to read the excel file

public static void method1() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
                FileInputStream excelFile = new FileInputStream(reportTypeExcelPath);
                Instant start = java.time.Instant.now();
                XSSFWorkbook workbook = new XSSFWorkbook(excelFile);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method1");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
                
                
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
} 

public static void method2() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            OPCPackage pkg;
            try {
                pkg = OPCPackage.open(file);
                Instant start = java.time.Instant.now();
                XSSFWorkbook workbook = new XSSFWorkbook(pkg);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method2");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
                pkg.close();
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

public static void method3() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            OPCPackage pkg;
            try {
                FileInputStream excelFile = new FileInputStream(reportTypeExcelPath);
                pkg = OPCPackage.open(excelFile);
                Instant start = java.time.Instant.now();
                XSSFWorkbook workbook = new XSSFWorkbook(pkg);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method3");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
                pkg.close();
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

public static void method4() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            OPCPackage pkg;
            try {
                pkg = OPCPackage.open(file);
                Instant start = java.time.Instant.now();
                Workbook wb = WorkbookFactory.create(pkg);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method4");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
                pkg.close();
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

public static void method5() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            OPCPackage pkg;
            try {
                FileInputStream excelFile = new FileInputStream(reportTypeExcelPath);
                pkg = OPCPackage.open(excelFile);
                Instant start = java.time.Instant.now();
                Workbook wb = WorkbookFactory.create(pkg);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method5");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
                pkg.close();
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

public static void method6() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            try {
                FileInputStream excelFile = new FileInputStream(reportTypeExcelPath);
                Instant start = java.time.Instant.now();
                Workbook wb = WorkbookFactory.create(excelFile);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method6");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

public static void method7() {
    String reportTypeExcelPath = "D:\\File Testing\\Testing File.xlsx";
    try {
        File file = new File(reportTypeExcelPath);
        if (file.exists()) {
            try {
                Instant start = java.time.Instant.now();
                Workbook wb = WorkbookFactory.create(file);
                Instant end = java.time.Instant.now();
                Duration between = java.time.Duration.between(start, end);
                System.out.println("Method7");
                System.out.println( between ); // PT1.001S
                System.out.format("%dD, %02d:%02d:%02d.%04d \n", between.toDays(),
                        between.toHours(), between.toMinutes(), between.getSeconds(), between.toMillis());
            } catch (InvalidFormatException e) {
                System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
                return ;
            }
            
        } else {
            System.out.println("No file found " + reportTypeExcelPath);
            return ;
        }
    } catch (IOException e) {
        System.out.println("Exception in checkMandatorySheets() :: \n" +  e);
        return ;
    }
}

And the GC error occurs at these points

XSSFWorkbook workbook = new XSSFWorkbook(excelFile);

XSSFWorkbook workbook = new XSSFWorkbook(pkg);

Workbook wb = WorkbookFactory.create(pkg);

Workbook wb = WorkbookFactory.create(file);

I have gone through multiple links in stack overflow but nothing has helped.

Any suggestions or alternatives to get the file loaded.

Avinash Reddy
  • 2,204
  • 3
  • 25
  • 44
  • 3
    That’s a known issue. It only applies to XLSX, not to XLS. You may try [the streaming API](https://poi.apache.org/apidocs/dev/org/apache/poi/xssf/streaming/package-summary.html) instead, if your task permits it. See also https://poi.apache.org/components/spreadsheet/how-to.html#sxssf – Holger Sep 23 '22 at 06:31
  • @holger I have read in so many links that SXSSF is for writing to an excel sheet and not for reading from an excel file. Here is the link As stated in other relevant stackoverflow question stackoverflow.com/questions/12513981/… , SXSSFWorkBook is write only, it doesn't support reading which explains why you get null. – Avinash Reddy Sep 23 '22 at 07:06
  • @Holger https://stackoverflow.com/questions/33786219/apache-poi-streaming-sxssf-for-reading – Avinash Reddy Sep 23 '22 at 07:20
  • 1
    Seems you already found the solution… – Holger Sep 23 '22 at 07:30
  • @Holger Which one is the solution . This one ? StreamingReader reader = StreamingReader.builder() .rowCacheSize(100) // number of rows to keep in memory (defaults to 10) .bufferSize(4096) // buffer size to use when reading InputStream to file (defaults to 1024) .sheetIndex(0) // index of sheet to use (defaults to 0) .sheetName("sheet1") // name of sheet to use (overrides sheetIndex) .read(is); – Avinash Reddy Sep 23 '22 at 07:36
  • @Holger It does not work for XLS file as well, i changed my current file extension to .xls and tried. Getting the same java.lang.OutOfMemoryError: GC overhead limit exceeded. Also SXSSF does not work for reading files as already mentioned. – Avinash Reddy Sep 23 '22 at 09:49
  • Please [edit] your question to include a [mre] that reflects you new findings and includes your implementation of the SAX Event API handlers referenced in the [duplicate](https://stackoverflow.com/a/33786506/230513) you cited. – trashgod Sep 23 '22 at 12:39
  • 1
    Well, nobody would expect that you can fix such a problem by renaming the file. – Holger Sep 23 '22 at 14:07
  • @holger will create a new xls file put the data into that file and will check – Avinash Reddy Sep 23 '22 at 15:39
  • xls and xlsx are completely different data formats - https://github.com/pjfanning/excel-streaming-reader works well for xlsx reading (a newer version of the StreamingReader you mention above) – PJ Fanning Sep 24 '22 at 15:28

0 Answers0