
I am new to Java. I have a 2 GB XML file which I need to parse and store its data in a database.

Someone on StackOverflow recommended Dom4j for large XML files. Parsing works fine, but the Document returned by Dom4j is very large, and iterating over it loads all of the DOM objects into memory (the heap).

This results in out-of-memory errors. Can somebody please tell me how to avoid them? Is there a mechanism in Java for on-demand heap allocation and deallocation?


2 Answers


You have two choices:

  1. reconfigure your JVM to allow a larger maximum heap (via -Xmx2g or similar). See here for more info. This option is obviously limited by your OS and the amount of free memory on your system.
  2. use a streaming API (such as SAX) that doesn't load all the XML into memory at once, but rather streams it through your process, allowing you to analyse it without holding the entire document in memory (see the sketch below)

The first option may help you immediately, and isn't specific to this question. The second option is the more scalable solution since it'll allow you to analyse documents of any size. Of course you need to worry about the memory consumption of the results of your analysis, but that's another matter entirely.
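
For illustration, here's a minimal SAX sketch. The element names (record, name) and the file name huge.xml are placeholders for your actual document; only the text of the current element is held on the heap at any time:

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class BigXmlHandler extends DefaultHandler {

        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            text.setLength(0); // reset the buffer for each new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                // write the value to the database here (e.g. through a
                // batched PreparedStatement) instead of keeping it around
                System.out.println("name = " + text);
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new File("huge.xml"), new BigXmlHandler());
        }
    }

The parser pushes events to you as it reads, so memory use stays roughly constant no matter how big the file is; whatever you accumulate (batches of database rows, say) is entirely under your control.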

Brian Agnew
  • Thanks Brian, increasing the heap size is of course known to me, and processing XML in chunks is a good suggestion. But I need some generic solution for avoiding too much data getting loaded into the heap. There was a related problem with a large table too, with around 15000 records; there too, some said to use cursors. These solutions seem contextual: is there any generic solution or guideline for avoiding out-of-memory anomalies? Also, Dom4j has a SAX parser. – user2139064 Jun 10 '13 at 11:40

If you need to parse big XML files (and enlarging the Java heap does not always help), you need a SAX parser, which lets you parse the XML as a stream instead of loading the whole DOM tree into memory.

You may also check SAXDOMIX:

SAXDOMIX contains classes that can forward SAX events or DOM sub-trees to your application during the parsing of an XML document. The framework defines simple interfaces that allow the application to get DOM sub-trees in the middle of a SAX parsing. After handling, all DOM sub-trees become eligible for garbage collection. This solves the DOM scalability problem.
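
I can't vouch for SAXDOMIX's exact API from memory, but Dom4j itself offers the same SAX-plus-DOM-sub-tree pattern through its ElementHandler callback, which may suit you since you already use Dom4j. A rough sketch, where the /records/record path and the file name are assumptions you would adapt to your document:

    import java.io.File;
    import org.dom4j.Element;
    import org.dom4j.ElementHandler;
    import org.dom4j.ElementPath;
    import org.dom4j.io.SAXReader;

    public class Dom4jStreaming {
        public static void main(String[] args) throws Exception {
            SAXReader reader = new SAXReader();

            // fire a callback each time a </record> closes; the path is an
            // assumption -- adjust it to your document's structure
            reader.addHandler("/records/record", new ElementHandler() {
                public void onStart(ElementPath path) {
                    // nothing to do when the element opens
                }

                public void onEnd(ElementPath path) {
                    Element record = path.getCurrent(); // a complete DOM sub-tree
                    // read values from the sub-tree and store them in the database
                    record.detach(); // prune it so it can be garbage-collected
                }
            });

            reader.read(new File("huge.xml"));
        }
    }

The call to record.detach() is what keeps the heap flat: each finished sub-tree is pruned from the partially built document before the parser moves on.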

Juned Ahsan
  • Thanks Juned, I am using Dom4j and I think they also have a SAX parser, as one of the code snippets shows: SAXReader reader = new SAXReader(); – user2139064 Jun 10 '13 at 11:44
  • With DOM, the problem is that the entire XML tree needs to be loaded into memory. No matter how big a heap size you set, if your tree does not fit in it you will end up with an out-of-memory error. SAX is better for parsing big XML, as you can read in chunks. I like SAXDOMIX as it mixes SAX and DOM to let you parse in chunks and with ease. Try that. – Juned Ahsan Jun 10 '13 at 11:47
  • DOM (as output) is being used intentionally, as many of the XML nodes are interdependent, and pure SAX makes the processing really slow. Doesn't the SAX parser in Dom4j do the same job? – user2139064 Jun 10 '13 at 11:55