2

I've read some articles on the advantages of using a SAX parser over DOM for parsing XML files in Java. The one that appeals to me the most (as discussed here) is that

SAX is suitable for large XML files, and the SAX parser does not load the whole XML file into memory.

But now that I've written a SAX parser to derive the entities from an XML file, it throws the following exception for a large file of almost 1.4 GB:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.

What is the problem with memory if the file is not loaded into memory as a whole?

How can I resolve the issue?

Community
Subhan
  • 3
    That is not necessarily an actual memory limitation, but a protective measure against DoS attacks like [this one](https://en.wikipedia.org/wiki/Billion_laughs). If your input XML legitimately contains that many entities, you can increase that limit in your parser. Look at its documentation. – Tomalak Apr 02 '15 at 19:25
  • What do you suggest I do about this protective measure? – Subhan Apr 02 '15 at 19:31
  • I thought I said that. – Tomalak Apr 02 '15 at 19:38
  • Should I look at the JVM documentation? – Subhan Apr 02 '15 at 19:40

2 Answers

3

Change the entity expansion limit with a JVM parameter:

-DentityExpansionLimit=1000000
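
If changing the launch command is inconvenient, the same limit can probably be raised from code by setting a system property before any parser is created. The following is only a sketch under assumptions: it uses the property name `jdk.xml.entityExpansionLimit`, which is the name documented for JDK 8 and later (the `-DentityExpansionLimit` form above is the older spelling), and the class `EntityLimitDemo` with its helper methods is hypothetical, made up for illustration. It builds a tiny document with 65,000 entity references, which is more than the 64,000 mentioned in the exception message, and parses it with SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the asker's parser.
class EntityLimitDemo {

    // Builds a document that references the internal entity &a; `refs` times.
    static String buildXml(int refs) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>");
        sb.append("<!DOCTYPE r [<!ENTITY a \"x\">]><r>");
        for (int i = 0; i < refs; i++) {
            sb.append("&a;");
        }
        return sb.append("</r>").toString();
    }

    // Parses the document with SAX and returns the number of text characters reported.
    static long countCharacters(String xml) throws Exception {
        final long[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so sum the chunk lengths.
                count[0] += length;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the property is read when the parser is set up,
        // so it has to be set before the first parser is created.
        System.setProperty("jdk.xml.entityExpansionLimit", "1000000");
        System.out.println(countCharacters(buildXml(65_000)));
    }
}
```

Only raise the limit for input you trust, since the limit exists to stop entity-expansion attacks.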
Necreaux
  • Depends on how you are running your program. It's a command-line parameter. – Necreaux Apr 02 '15 at 19:32
  • This post http://stackoverflow.com/questions/29360901/getting-parent-child-hierarchy-in-sax-xml-parser contains my code for the parser. I hope it shows how I'm dealing with it. – Subhan Apr 02 '15 at 19:34
  • Yes, but how are you RUNNING it. Are you typing java blah blah from the command prompt? Are you executing it via an IDE? – Necreaux Apr 02 '15 at 19:42
  • Under run configurations on the arguments tab it's called "VM arguments". That's where you want to add that. – Necreaux Apr 02 '15 at 19:44
  • Thank you so much, that really worked. :) I'm really grateful to you. – Subhan Apr 02 '15 at 19:56
0

You can also think about using StAX.

SAX is event-driven and serial. It can handle large XML, but takes a lot of CPU resources.

DOM loads the complete document into memory.

StAX is a more recent API. It streams over the XML and can be seen as a cursor or iterator over the document. Its advantage is that you can skip the parts you don't need (attributes, tags, ...). It takes far fewer CPU resources if used properly.

https://docs.oracle.com/javase/tutorial/jaxp/stax/why.html

With SAX, the parser pushes the events to your code.

With StAX, your code pulls the events from the parser.
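
To make the pull model concrete, here is a minimal sketch using the cursor API from `javax.xml.stream`. It is not the asker's parser: the class `StaxElementNames` and its method are hypothetical names chosen for illustration, and collecting distinct element names is just an assumed stand-in for "getting the entities". The loop asks the reader for the next event and ignores everything that is not a start tag.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class showing the StAX cursor (pull) style.
class StaxElementNames {

    // Returns the distinct element names in document order of first appearance.
    static Set<String> elementNames(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        Set<String> names = new LinkedHashSet<>();
        try {
            // The application drives the loop and pulls one event at a time.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
                // All other event types (text, comments, end tags) are skipped.
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<db><table><row/><row/></table></db>";
        System.out.println(elementNames(xml));
    }
}
```

For a real file, the `StringReader` would be replaced by a stream over the file, so the document is still read incrementally rather than loaded as a whole.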

Stefaan Neyts
  • Does this mean all my effort to create a parser (using SAX), which worked well for smaller files, is wasted? – Subhan Apr 02 '15 at 19:36
  • No. You can stick to SAX if you have fixed your issue. I just wanted to inform you there is still another, modern way of parsing XML. Another advantage: with SAX you can only parse XML, with StAX you can also write XML. – Stefaan Neyts Apr 02 '15 at 19:41
  • And if you have written your SAX implementation with well-chosen methods, maybe you can reuse a lot of code and try the StAX way to measure the difference in performance. You will be surprised, believe me: when used correctly and skipping unnecessary elements, your parse time will decrease drastically! – Stefaan Neyts Apr 02 '15 at 19:43
  • In a comment on the other answer, I have added a link to my code. You can look at that. – Subhan Apr 02 '15 at 19:45
  • It's just a proposal! I can provide you a StAX snippet if you want. It is typically used in a certain pattern. I'll look it up and will edit my post with a small example. – Stefaan Neyts Apr 02 '15 at 19:46
  • That would be so nice of you. Please review my code and show me how StAX can do it :) and then maybe I'll migrate to StAX. – Subhan Apr 02 '15 at 19:49
  • Okay. Coming back to you later. – Stefaan Neyts Apr 02 '15 at 19:51
  • So you want a pretty print of your tags only? – Stefaan Neyts Apr 02 '15 at 19:52
  • My overall goal is to turn my XML file into SQL, but first I'm working on getting the entities so that I can create an ER diagram from it. – Subhan Apr 02 '15 at 19:56
  • Answer posted in your other question! – Stefaan Neyts Apr 02 '15 at 20:52