
I'm developing an app that has to parse a large XML file (65 MB) with the following structure and generate a PDF from it using Jasper Reports:

<A>
    <a attribute1="" attribute2="" attribute3=""/>
</A>
<B>
    <b attribute1="" attribute2="" attribute3=""/>
</B>

<C>
    <c attribute1="" attribute2="" attribute3=""/>
</C>
<D>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    <d attribute1="" attribute2="" attribute3=""/>
    ...
</D>

... with a very large number of <d> tags (at least 500,000).

My problem is that there are so many of these tags that parsing fails with java.lang.OutOfMemoryError: Java heap space.

I'm using this line to parse the file:

Document document = JRXmlUtils.parse(JRLoader.getLocationInputStream(xmlPath));

Does anyone know an alternative to the JRXmlUtils.parse method that avoids the OutOfMemoryError without increasing the heap size?

Thank you
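For comparison with JRXmlUtils.parse, which returns a Document holding the whole tree, a streaming parser reads one element at a time, so memory does not grow with the number of <d> tags. Below is a minimal sketch using the JDK's StAX pull API (javax.xml.stream.XMLStreamReader), a technique swapped in here rather than taken from the thread. The class RowReader and its nextRow method are hypothetical names. Note that it yields rows, not an org.w3c.dom.Document, so it only helps if the report can be fed from something other than a Document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical pull-style reader: each nextRow() call advances to the next <d> element. */
public class RowReader implements AutoCloseable {
    private final XMLStreamReader reader;

    public RowReader(InputStream in) throws XMLStreamException {
        this.reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
    }

    /** Returns attribute1..attribute3 of the next <d> element, or null at end of document. */
    public String[] nextRow() throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "d".equals(reader.getLocalName())) {
                // null namespace URI: match the attribute by local name only
                return new String[] {
                    reader.getAttributeValue(null, "attribute1"),
                    reader.getAttributeValue(null, "attribute2"),
                    reader.getAttributeValue(null, "attribute3")
                };
            }
        }
        return null;
    }

    @Override
    public void close() throws XMLStreamException {
        reader.close(); // frees parser resources; does not close the underlying InputStream
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><D><d attribute1=\"1\" attribute2=\"x\" attribute3=\"y\"/>"
                + "<d attribute1=\"2\" attribute2=\"x\" attribute3=\"y\"/></D></root>";
        try (RowReader rows = new RowReader(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            String[] row;
            while ((row = rows.nextRow()) != null) {
                System.out.println(row[0]); // only the current row is held in memory
            }
        }
    }
}
```

Unlike SAX, the calling code drives the loop here, so each row can be consumed (written out, aggregated) and dropped before asking for the next one.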

EDIT:

I've already seen this post about SAXParser, but I don't know how to adapt it to my case, since my XML structure is a little unusual (there is a lot of data before the problematic tags). Any clarification?
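Data that comes before the <d> tags does not need special treatment with SAX: the handler can remember which wrapper element it is currently inside and treat each element accordingly. Below is a minimal sketch with a hypothetical SectionAwareHandler. It assumes the sections sit under a single root element (any well-formed XML file has exactly one, while the excerpt above shows several top-level elements), and handleRow is a hypothetical hook standing in for whatever each row should feed into.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Small sections (A, B, C) are kept in fields; each <d> row is handed to handleRow() and dropped. */
public class SectionAwareHandler extends DefaultHandler {
    private String currentSection; // wrapper we are inside (A, B, C or D), or null
    private String headerValue;    // example of "data before the problematic tags"
    private long rowCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        switch (qName) {
            case "A": case "B": case "C": case "D":
                currentSection = qName;
                break;
            case "a":
                // Small header data can safely stay in memory.
                headerValue = attrs.getValue("attribute1");
                break;
            case "d":
                if ("D".equals(currentSection)) {
                    handleRow(attrs.getValue("attribute1"),
                              attrs.getValue("attribute2"),
                              attrs.getValue("attribute3"));
                }
                break;
            default:
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(currentSection)) {
            currentSection = null;
        }
    }

    /** Hypothetical hook: process one row and keep nothing; here it only counts rows. */
    protected void handleRow(String attribute1, String attribute2, String attribute3) {
        rowCount++;
    }

    public String getHeaderValue() { return headerValue; }
    public long getRowCount() { return rowCount; }

    /** SAXParser.parse returns void; results are read back from the handler afterwards. */
    public static SectionAwareHandler parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SectionAwareHandler handler = new SectionAwareHandler();
        parser.parse(in, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<A><a attribute1=\"header\" attribute2=\"\" attribute3=\"\"/></A>"
                + "<D><d attribute1=\"1\" attribute2=\"\" attribute3=\"\"/>"
                + "<d attribute1=\"2\" attribute2=\"\" attribute3=\"\"/></D>"
                + "</root>";
        SectionAwareHandler h = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(h.getHeaderValue() + " " + h.getRowCount()); // prints "header 2"
    }
}
```

The factory is not namespace-aware by default, so the handler matches on qName rather than localName.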

Sinda MOKADDEM
    Possible duplicate of [Parsing large XML documents in JAVA](http://stackoverflow.com/questions/15132390/parsing-large-xml-documents-in-java) – rmlan Apr 18 '16 at 16:39
  • If `JRXmlUtils` can parse it, so can `SAXParser`. There doesn't seem to be anything "special" about the example doc you posted. Try some stuff out with `SAXParser`, and come back and ask a new question if you run into specific trouble. – rmlan Apr 18 '16 at 17:25
  • @rmlan OK, but as far as I understand, the parse method of SAXParser returns void. In my case, however, I need the parsed org.w3c.dom.Document so that I can pass it to Jasper Reports to be "transformed" into a PDF... – Sinda MOKADDEM Apr 18 '16 at 17:47
  • Using a different parser to get a `Document` will not help you much. Or are you looking for a way to remove the `` tags from your document before handing it to Jasper? – nyname00 Apr 18 '16 at 19:41
  • @rmlan No, the Document is the parsed form of my XML file; Jasper then uses this Document to generate the corresponding PDF. So I think I have to find a way to fill my Document object while parsing the XML (for example, in the handler's endDocument method). Is that possible? – Sinda MOKADDEM Apr 19 '16 at 11:55
  • Use of a Document object suggests that you will have a representation of the entire XML document in memory at some point in time. Since this is what caused your OOM error, I don't see how you can ever avoid this. If Jasper only supports reading the entire document into memory before converting it to a PDF report, you may need to find a solution other than Jasper. – rmlan Apr 19 '16 at 12:53
  • @rmlan Unfortunately, using Jasper is a client requirement (it's a must). Are you saying that, without changing Jasper, there is no way to avoid the OOM error? – Sinda MOKADDEM Apr 19 '16 at 13:54
  • I am not familiar with Jasper at all, but after a couple seconds of Google research, I found [this](http://community.jaspersoft.com/wiki/virtualizers-jasperreports), which you may be interested in. Good luck. – rmlan Apr 19 '16 at 14:18
  • @rmlan I already used the Jasper virtualizer, but it applies at a different step of the file generation; I wanted a solution at the parsing level. Thank you anyway. – Sinda MOKADDEM Apr 19 '16 at 17:04
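On the question raised in the comments of filling a Document while parsing with SAX: with only the JDK, SAX events can be sent to an identity TransformerHandler that writes them into a DOMResult. Below is a minimal sketch (SaxToDom is a hypothetical name). It only shows that a Document can be built from SAX callbacks. As rmlan points out, the result is still the complete tree in memory, so on its own this will not avoid the OutOfMemoryError.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Builds an org.w3c.dom.Document from SAX events; the whole tree still ends up in memory. */
public class SaxToDom {

    public static Document parse(InputStream in) throws Exception {
        // Empty target document that the SAX events are copied into.
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Assumes the default TransformerFactory is a SAXTransformerFactory (the JDK's built-in one is).
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler domBuilder = tf.newTransformerHandler(); // identity: copies events unchanged
        domBuilder.setResult(new DOMResult(document));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(domBuilder);
        reader.parse(new InputSource(in));

        // Every <d> element now exists as a DOM node, exactly as with a DOM parser.
        return document;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><A><a attribute1=\"x\"/></A>"
                + "<D><d attribute1=\"1\"/><d attribute1=\"2\"/></D></root>";
        Document doc = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("d").getLength());
    }
}
```

The handler's endDocument call is where the tree is complete, which matches the idea in the comment, but memory use is the same as with JRXmlUtils.parse.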
