I am fairly new to Groovy and I am trying to read a large XML file (more than 1 GB) using XmlSlurper, which is supposed to work well with large files because it doesn't build the whole DOM in memory.

Nevertheless, I keep getting "OutOfMemoryError: Java heap space", which makes me think I am doing something wrong. I tried increasing the -Xmx setting, but I would rather solve the underlying problem since I may have to deal with even bigger files later.

Here is the line of code I used:

def posts = new XmlSlurper().parse(new File("posts.xml"))

Any hint on what's wrong?

Thanks in advance,

Jérémie.

Jérémie Clos
  • This question is similar: http://stackoverflow.com/questions/4104264/is-it-possible-to-parse-sub-trees-with-groovy-xmlslurper – Lari Hotari Feb 11 '16 at 13:24

2 Answers

Groovy's XmlSlurper reads the document with a SAX parser, but it still builds the full node tree behind the GPathResult it returns, so the entire document ends up in memory.

To avoid out-of-memory errors, you either need to raise the heap limit (as you say, with the -Xmx setting) or write your own SAX handler that extracts just the data you require from the document.
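
As a rough illustration of the second option, here is a minimal sketch of a hand-written SAX handler in Groovy. It only uses the standard javax.xml.parsers and org.xml.sax classes from the JDK. The element name "row" and the attribute "Id" are assumptions made for the example, since the question does not show the structure of posts.xml, and RowHandler is a hypothetical class name.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Streaming handler: it reacts to each element as the parser reaches it and
// keeps only a counter and one attribute value, so memory use does not grow
// with the size of the document.
// ASSUMPTION: the element name "row" and attribute "Id" are illustrative only.
class RowHandler extends DefaultHandler {
    int rowCount = 0
    String lastId = null

    @Override
    void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName == 'row') {
            rowCount++
            lastId = attrs.getValue('Id')
        }
    }
}

def handler = new RowHandler()
def parser = SAXParserFactory.newInstance().newSAXParser()

// A small inline document keeps the sketch runnable on its own.
// For a real file, call parser.parse(new File('posts.xml'), handler) instead.
def sample = '<posts><row Id="1"/><row Id="2"/><row Id="3"/></posts>'
parser.parse(new InputSource(new StringReader(sample)), handler)

println "rows: ${handler.rowCount}, last Id: ${handler.lastId}"
```

Because the handler never stores the elements it has already seen, the same code should cope with a file of any size; whatever you need from each element has to be processed or written out inside the callbacks.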

tim_yates

I'm a bit late to this party, but I've been having the same issue.

I posted a proposal to the groovy-user mailing list to add something like Perl's XML::Twig module to XmlSlurper. With it, usage would look like this:

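// XPathXmlSlurper2 is the class from the mailing-list proposal, not part of Groovy itself
def xpathSlurper = new XPathXmlSlurper2()
// Handler invoked with the twig and the matched element
def c = { twig, it ->
    println it.text().trim()
    twig.purgeCurrent()
}
// xpath is not defined in this snippet; it must be set before this call
xpathSlurper.setTwigRootHandler(xpath, c)
def fdata = xpathSlurper.parse(new File("test.xml"))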

I've attached the sample code here: http://groovy.329449.n5.nabble.com/first-step-toward-Xml-Twig-for-Groovy-groovy-util-XPathXmlSlurper2-groovy-td4923577.html

I hope this helps!

  • For now I solved my problem by writing my own SAX parser, as tim_yates suggested, but since I am bound to deal with similar (and probably bigger) quantities of data in the future, I'd be glad to have something like that. Thanks for pointing it out! – Jérémie Clos Apr 12 '12 at 13:30