
My application creates a very big XML file (about 300K transactions). Each transaction has about 20 XML elements, so the resulting file is huge. We did not use JAXB, SAX or DOM to create the XML file because memory is a constraint. Now I need to replace certain tag values in the XML file after it has been created. I know what is to be replaced and the value to replace it with. How can I replace those values without loading the entire file into memory? For 300K transactions the file is about 600 MB, so we do not want to load the entire file into memory just to replace a few values.

We are using Java 5. Is there a way to do this?

3 Answers


You can try VTD-XML:

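  • Memory-efficient (1.3x~1.5x the size of an XML document) random-access XML parser. Note that this still means holding the whole document in memory, roughly 780 to 900 MB for a 600 MB file.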
  • Fastest XML parser: on a Core 2 2.5 GHz desktop, VTD-XML outperforms DOM parsers by 5x~12x, delivering 150~250 MB/sec sustained throughput per core.
  • Incremental-update-capable XML parser that can cut, paste, split and assemble XML documents with maximum efficiency.
  • Available in C, C++, C# and Java.

Example modifying XML.

vzamanillo

Everything I've ever read on this topic indicates that you can't do this without loading the file into memory or streaming it to another file. That's probably what you'll end up needing to do - stream your source into a new file, modifying as you go.

More info about that process - http://docs.oracle.com/javaee/5/tutorial/doc/bnbfl.html#bnbgq

I like the way Stephen C addresses your problem in an answer here - How to modify a huge XML file by StAX?
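A minimal sketch of that stream-to-a-new-file approach using the StAX event API (javax.xml.stream). It assumes the values to change are the complete text content of leaf elements identified by local name; the class name StreamingReplace and the method replaceText are hypothetical. The copy never builds an in-memory tree, so memory use does not grow with file size. StAX is bundled with Java 6 and later; on Java 5 it needs a separate StAX implementation jar on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StreamingReplace {

    /**
     * Copies XML from in to out event by event, replacing the text of every
     * element whose local name is elementName and whose text equals oldValue.
     * Assumption: the targeted elements are leaves (text only, no children).
     */
    public static void replaceText(InputStream in, OutputStream out,
                                   String elementName, String oldValue, String newValue)
            throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        // Merge adjacent text/CDATA chunks so each value arrives as one Characters event
        inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inFactory.createXMLEventReader(in);
        XMLEventWriter writer =
                XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();

        String current = null; // innermost open element; reset when any element closes
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    current = event.asStartElement().getName().getLocalPart();
                } else if (event.isEndElement()) {
                    current = null;
                } else if (event.isCharacters()
                        && elementName.equals(current)
                        && oldValue.equals(event.asCharacters().getData())) {
                    // Swap in the new text; the writer escapes it as needed
                    event = events.createCharacters(newValue);
                }
                writer.add(event); // everything else is copied unchanged
            }
            writer.flush();
        } finally {
            writer.close(); // does not close the underlying stream
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.err.println("usage: java StreamingReplace in.xml out.xml element oldValue newValue");
            return;
        }
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]));
        try {
            replaceText(in, out, args[2], args[3], args[4]);
        } finally {
            in.close();
            out.close();
        }
    }
}
```

Run it as `java StreamingReplace in.xml out.xml xyz "old value" "new value"` and then replace the original file with out.xml. Setting IS_COALESCING matters because a StAX parser may otherwise split one text value across several Characters events, and the exact comparison would miss it.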

ThisClark

You could try a streaming transformation using XSLT 3.0 (specifically, Saxon-EE).

I'm not sure what you mean by "tag values" (it's so much easier if people use the correct terminology...) but if you mean the values of text nodes, then you could write a streaming transformation something like this:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <xsl:template match="xyz/text()[.='old value']">
    <xsl:text>new value</xsl:text>
  </xsl:template>

</xsl:stylesheet>

with further rules for additional substitutions. You can also, of course, have rules that rename or delete selected elements, etc.

Michael Kay
  • Thanks Michael. By tag value I meant the value of a text node. That's a new approach to me and I will give it a try. But the new value is going to vary based on some runtime business logic, which I cannot put in XSL. How do I do that? – Bharat Kondapalli May 01 '15 at 07:47
  • It's always possible to call out from XSLT to Java code, though it's not necessary nearly as often as people imagine, because the logic can usually be written in XSLT just as easily. – Michael Kay May 03 '15 at 15:26
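If the replacement value depends on runtime business logic, another option besides calling out from XSLT is to do the streaming copy in Java with StAX (the approach in the earlier answer) and call the business logic directly. A minimal sketch, assuming the values are leaf-element text; RuleBasedReplace and the TextRule interface are hypothetical names, and the rule in main is only a placeholder for real business logic. It avoids lambdas and java.util.function to stay Java 5 compatible, although StAX itself still needs an extra jar on Java 5.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class RuleBasedReplace {

    /** Hypothetical hook for the business logic. Return null to keep the text unchanged. */
    public interface TextRule {
        String newValue(String elementName, String oldText);
    }

    /** Streams in to out, asking the rule for a replacement for each text event. */
    public static void rewrite(InputStream in, OutputStream out, TextRule rule)
            throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        // Deliver each text value as a single Characters event
        inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inFactory.createXMLEventReader(in);
        XMLEventWriter writer =
                XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();

        String current = null; // innermost open element; reset when any element closes
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                current = event.asStartElement().getName().getLocalPart();
            } else if (event.isEndElement()) {
                current = null;
            } else if (current != null && event.isCharacters()) {
                String replacement = rule.newValue(current, event.asCharacters().getData());
                if (replacement != null) {
                    event = events.createCharacters(replacement);
                }
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder rule: upper-case every <status> value, leave everything else alone
        TextRule rule = new TextRule() {
            public String newValue(String elementName, String oldText) {
                return "status".equals(elementName) ? oldText.toUpperCase() : null;
            }
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        rewrite(new ByteArrayInputStream(
                "<txn><status>pending</status></txn>".getBytes("UTF-8")), out, rule);
        System.out.println(out.toString("UTF-8"));
    }
}
```

The rule is also called for whitespace between the child elements of container elements, so it should return null for anything it does not recognise.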