1

I am attempting to open a fairly large excel document (about 150,000 rows of 6 columns) using Apache POI in Java to remove an entire worksheet. When I try

wb = new XSSFWorkbook("OUTPUT/BOSS.xlsx");

I get

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I am really not sure what else to do. Unfortunately the server I am trying to do this on is running 1.5 so I am limited on what I can use/do. Also apparently SXSSF only works for writing, not reading, files.

Does anyone have some sort of suggestion as to what I might be able to use/try?

EDIT: The excel file is about 2.5mb if that matters

xCasper
  • 123
  • 10
  • HOw is your program running? Console? Servlet? – Jorge Campos Jul 21 '15 at 02:07
  • Can you try after changing max memory e.g. -Xmx1g ? – TheCodingFrog Jul 21 '15 at 02:07
  • See it here if you don't know what @my-thoughts is saying http://stackoverflow.com/a/1566026/460557 and if it helps you don't forget to upvote the answerer! – Jorge Campos Jul 21 '15 at 02:10
  • See it here if you don't know what @my-thoughts is saying http://stackoverflow.com/a/1566026/460557 – Jorge Campos Jul 21 '15 at 02:11
  • The program is running through console. No servlet. The memory is set to a gig and it is still hitting the heap. Also this is one of the smaller files I will be having to mess with so I really need some way to edit an excel document (even if its not POI -- as long as its 1.5 jre compatible). I figure there has to be a way, I just cannot seem to find a way after hours of googleing. ;( – xCasper Jul 21 '15 at 02:16
  • 1
    I would then recommend you to set a proper configuration not only the heap size. I would recommend you to try this configuration: `-Xms256m -Xmx4096m -XX:+UseParallelGC` – Jorge Campos Jul 21 '15 at 02:22
  • Jorge, unfortunatly there is no way I can justify using 4gigs of memory since this is being deployed on a VM with limited resources. Also UseParallelGC doesnt seem to make a difference. My guess is that because I am opening the file all at once, there is nothing to gc. My max heap is 1024m. – xCasper Jul 21 '15 at 02:26
  • Well, I'm out of options, so it leaves me with this last one: http://stackoverflow.com/a/6544301/460557 the main idea is to split the file or convert it into another less heavy format – Jorge Campos Jul 21 '15 at 02:36
  • So based on that I was able to save the xlsx file as a .zip instead. Opening the zip and removing the sheets I need to remove is pretty easy. The question then becomes how do I now convert from xml back to .xlsx lol. I am going to do more research on this and post back if I find a way. If anyone has an answer please feel free to post. – xCasper Jul 21 '15 at 02:40
  • I am still working on this. Unfortunately I have yet to find a real solution to open a large excel without crashing my heap, or using more then 1gb. – xCasper Jul 21 '15 at 15:27
  • Do you get the exception on load or later? I just tested and was able to open a 170000x9 sheet with 1gb of heap. And if you have more than one sheets, are they all about the same size? – Axel Jul 30 '15 at 11:23
  • Were you able to make changes and then save the document with that? I managed to get it open but my bigger problem is I am restricted to 256M if I want to deploy this on the mainframe (which is absolutely the dumbest thing ever lol)... I told them its impossible with that restriction. – xCasper Jul 30 '15 at 17:15
  • @JonathanDrapeau I will check out that post thank you. – xCasper Jul 30 '15 at 17:16

1 Answers1

0

It seems the best way to parse the data from the excel document is to use POI SAX Parser. You can find more information on it here: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

The problem I have now is how to use this to then delete the worksheet that is being parsed. I will make a new question for this if I cannot figure it out.

xCasper
  • 123
  • 10