
I'm currently working on a project that requires me to split an XML document. Here is a sample:

<Lakes>
  <Lake>
    <id>1</id>
    <Name>Caspian</Name>
    <Type>Natural</Type>
  </Lake>
  <Lake>
    <id>2</id>
    <Name>Moreo</Name>
    <Type>Glacial</Type>
  </Lake>
  <Lake>
    <id>3</id>
    <Name>Sina</Name>
    <Type>Artificial</Type>
  </Lake>
</Lakes>

Ideally, my Java code would split this XML into three smaller documents (for this example) and send each of them out using a messenger service. The code for the messenger service is not important; I already have that done.

So, for example, the code would run and split off the first part like this:

<Lakes>
  <Lake>
    <id>1</id>
    <Name>Caspian</Name>
    <Type>Natural</Type>
  </Lake>
</Lakes>

The Java code would then send this out in a message, move on to the next part, send that out, and so on until it reaches the end of the big XML. This can be done through XSLT or through Java; it doesn't matter. Any ideas?

To be clear, I pretty much know how to break up a file using XSLT, but I don't know how to break it up and send each part individually, one at a time. I also don't want to store anything locally, so ideally each part would be converted to a string and sent out.

Icebreaker
  • Does it have to be XSLT? Have you considered using a SAX parser and just read the document and send it out where you need to? – Miquel Jul 05 '12 at 20:42
  • possible duplicate of [Split 1GB Xml file using Java](http://stackoverflow.com/questions/5169978/split-1gb-xml-file-using-java) – bdoughan Jul 05 '12 at 20:50
  • It does not have to be XSLT, but it most certainly has to be a Java environment. Whatever can pause in between parsing steps and send the pieces out in chunks is the way to go. I don't think this can be done in XSLT anyway. – Icebreaker Jul 05 '12 at 21:12
  • If you want, I can post a solution showing how you can do the split, create multiple documents, and save each of them to a file. – Dimitre Novatchev Jul 05 '12 at 22:36
  • That would be great. I don't need to save them, but I guess the principle would be the same. Please do. – Icebreaker Jul 06 '12 at 00:17

2 Answers


If the way you have to chunk your files is fixed and known, the easiest solution is to use SAX or StAX to do it programmatically. I personally prefer StAX for this kind of task, as the code is generally cleaner and easier to understand, but SAX will do the job equally well.
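As a rough illustration of the StAX route, here is a minimal sketch. It assumes the element names from the question (`Lakes`, `Lake`) and no namespaces; the class name `LakeSplitter` and the `Consumer<String>` sink standing in for the messenger service are hypothetical. Each `<Lake>` is copied into an in-memory string wrapped in a fresh `<Lakes>` root and handed over before the parser moves on, so nothing is stored locally.

```java
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;

// Hypothetical helper: splits <Lakes> into one small document per <Lake>.
class LakeSplitter {

    // 'sink' is a stand-in for the messenger service (an assumption).
    static void split(Reader source, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(source);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            boolean lakeStart = event.isStartElement()
                    && "Lake".equals(event.asStartElement().getName().getLocalPart());
            if (!lakeStart) {
                continue; // skip the root element and whitespace between lakes
            }

            StringWriter buffer = new StringWriter();
            XMLEventWriter writer = outFactory.createXMLEventWriter(buffer);
            writer.add(events.createStartElement("", "", "Lakes")); // new wrapper root
            writer.add(event);                                      // the <Lake> start tag

            // Copy events until the matching </Lake> is reached.
            int depth = 1;
            while (depth > 0) {
                XMLEvent inner = reader.nextEvent();
                if (inner.isStartElement()) {
                    depth++;
                } else if (inner.isEndElement()) {
                    depth--;
                }
                writer.add(inner);
            }

            writer.add(events.createEndElement("", "", "Lakes"));
            writer.close();
            sink.accept(buffer.toString()); // send this chunk before reading further
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Lakes><Lake><id>1</id></Lake><Lake><id>2</id></Lake></Lakes>";
        split(new StringReader(xml), System.out::println);
    }
}
```

Because the sink is called inside the parsing loop, only one chunk is held in memory at a time, which is what makes this workable for large inputs.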

XSLT is a great tool, but its main drawback is that XSLT 1.0 can only produce one output document per transformation. And apart from a few exceptions, XSLT engines don't support streaming, so if the initial file is too big to fit in memory, you can't use them.

Update: In XSLT 2.0, <xsl:result-document> can be used to produce multiple output files, but if you want to get your chunks one by one and not store them in files, it's not ideal.

biziclop

I would stream the XML (instead of building a DOM tree in memory) and cut the chunks out on the go. Whenever you meet a <Lake> start tag, start copying the content into a buffer; when you reach the closing </Lake> tag, send the buffer and reset it.
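The buffering idea could look roughly like this with a SAX handler. This is only a sketch: the class name `LakeChunkHandler` and the `Consumer<String>` sink (standing in for the messenger service) are made up for illustration, and it assumes a non-namespace-aware parser (the `SAXParserFactory` default), so `qName` carries the element name. The markup is rebuilt by hand, so comments and processing instructions inside a `<Lake>` are dropped.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical handler: buffers one <Lake> at a time and hands it to a sink.
class LakeChunkHandler extends DefaultHandler {
    private final Consumer<String> sink;          // stand-in for the messenger service
    private final StringBuilder buffer = new StringBuilder();
    private int depth = 0;                        // 0 means we are outside any <Lake>

    LakeChunkHandler(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == 0 && !"Lake".equals(qName)) {
            return;                               // ignore the root element
        }
        if (depth == 0) {
            buffer.append("<Lakes>");             // open a fresh wrapper for this chunk
        }
        depth++;
        buffer.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            buffer.append(' ').append(atts.getQName(i))
                  .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        buffer.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            buffer.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            return;                               // closing tag of the root element
        }
        buffer.append("</").append(qName).append('>');
        if (--depth == 0) {                       // reached the matching </Lake>
            buffer.append("</Lakes>");
            sink.accept(buffer.toString());       // send the chunk...
            buffer.setLength(0);                  // ...and reset the buffer
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Convenience entry point for an XML string held in memory.
    static void split(String xml, Consumer<String> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new LakeChunkHandler(sink));
    }
}
```

Calling `LakeChunkHandler.split(xml, System.out::println)` would print one small document per lake; for a large file you would pass an `InputSource` over a stream instead of a string.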

EDIT: Have a look at this link to learn more about XML streaming in Java.

GETah