2

I am trying to read a 30-60 MB Excel file in Talend, but it fails with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(Unknown Source)
at java.lang.StringBuffer.<init>(Unknown Source)
at org.apache.xmlbeans.impl.store.Locale$ScrubBuffer.<init>(Locale.java:1804)
at org.apache.xmlbeans.impl.store.Locale.getScrubBuffer(Locale.java:1904)
at org.apache.xmlbeans.impl.store.Xobj.getValueAsString(Xobj.java:1205)
at org.apache.xmlbeans.impl.store.Xobj.fetch_text(Xobj.java:1796)
at org.apache.xmlbeans.impl.values.XmlObjectBase.get_wscanon_text(XmlObjectBase.java:1332)
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1269)
at org.apache.xmlbeans.impl.values.JavaLongHolder.longValue(JavaLongHolder.java:53)
at org.apache.xmlbeans.impl.values.XmlObjectBase.getLongValue(XmlObjectBase.java:1502)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTRowImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRow.getRowNum(XSSFRow.java:322)
at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:229)
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
[statistics] disconnected
at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:193)
at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:180)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:300)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:221)
at de.cimt.talendcomp.tfileexcelpoi.SpreadsheetFile.initializeWorkbook(SpreadsheetFile.java:281)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.tFileExcelWorkbookOpen_1Process(Validation_PartSourcingTab3.java:774)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.tFileList_1Process(Validation_PartSourcingTab3.java:696)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.runJobInTOS(Validation_PartSourcingTab3.java:8268)
at cdif_validation.validation_partsourcingtab3_0_1.Validation_PartSourcingTab3.runJob(Validation_PartSourcingTab3.java:8110)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tRunJob_3Process(CDIF_Validation.java:1175)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tForeach_1Process(CDIF_Validation.java:983)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tFileList_1Process(CDIF_Validation.java:820)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tJava_1Process(CDIF_Validation.java:634)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.tPrejob_1Process(CDIF_Validation.java:529)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.runJobInTOS(CDIF_Validation.java:1885)
at cdif_validation.cdif_validation_0_1.CDIF_Validation.main(CDIF_Validation.java:1722)

I have tried increasing Xms up to 1 GB and Xmx up to 6 GB. I also tried using disk to store temporary data. How can I resolve this problem?

Sangeeta Rawat
  • 1
    At first glance, having 1GB of heap space might appear to be totally sufficient for a 60MB Excel file. However, I suspect that the in memory size of an Excel file could be substantially larger than its size on disk. – Tim Biegeleisen Oct 23 '15 at 07:36
  • Consider reviewing your code, looks like you have some kind of memory leak. – Dmitry Malinovsky Oct 23 '15 at 07:42

2 Answers

1

I had the same problem with Apache POI. It's because POI has a large memory footprint. You can try a workaround, such as processing your file in smaller parts, or try to look at this and do some "optimization".

Also see point 14 of the Apache POI FAQ for some recommendations.
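To illustrate what "processing in smaller parts" can look like, here is a minimal sketch that streams one sheet of an .xlsx file instead of loading the whole workbook. It does not use POI at all: it relies on the fact that an .xlsx file is a zip archive of XML parts and reads the sheet with the JDK's StAX parser, so only the current XML event is held in memory. The class name `StreamingRowCounter` and the default entry name `xl/worksheets/sheet1.xml` are assumptions for illustration; the actual sheet part in your file may be named differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Talend or POI.
public class StreamingRowCounter {

    /** Counts <row> elements in one sheet part without building an object model. */
    public static long countRows(String xlsxPath, String sheetEntry)
            throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IOException("No such entry: " + sheetEntry);
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Spreadsheet parts need no DTDs or external entities, so switch them off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                long rows = 0;
                try {
                    // Pull one event at a time; earlier rows are never retained.
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(reader.getLocalName())) {
                            rows++;
                        }
                    }
                } finally {
                    reader.close();
                }
                return rows;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: StreamingRowCounter <file.xlsx> [sheet entry]");
            return;
        }
        // Assumed default name of the first sheet part.
        String sheet = args.length > 1 ? args[1] : "xl/worksheets/sheet1.xml";
        System.out.println(countRows(args[0], sheet));
    }
}
```

The same loop could hand each row to your own processing code instead of counting it. Cell values that are strings are usually stored as indexes into a separate shared-strings part, which this sketch does not resolve.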

edasssus
0

The default Talend components use far more memory than required. I believe this is because the 60 MB xlsx file is held entirely uncompressed in memory, even if you're reading just 1 row from 1 tab.

To overcome this limitation, you can either keep adding memory or look for a different approach. I usually use the tFileExcel components from Jan Lolling. You can get them from the Talend marketplace.
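If you want to check how much larger the file is once uncompressed, a rough sketch like the following can help. An .xlsx file is a zip archive, so the JDK's `ZipFile` can report the uncompressed size of each part without reading the data. The class name `XlsxSizeReport` is hypothetical, and the total is only a lower bound on what an in-memory object model would need, since parsed XML objects take more space than the raw XML text.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Talend or POI.
public class XlsxSizeReport {

    /** Sums the uncompressed sizes of all parts inside the archive. */
    public static long totalUncompressedBytes(String path) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // getSize() returns -1 when the size is unknown; skip those.
                long size = entries.nextElement().getSize();
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: XlsxSizeReport <file.xlsx>");
            return;
        }
        System.out.println("Uncompressed bytes: " + totalUncompressedBytes(args[0]));
    }
}
```

Comparing this number with the size of the file on disk gives a first idea of why a heap that looks generous can still run out.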

Balazs Gunics