2

How can I split a long XML file into pieces, each with a different predefined name?

Example: this is my XML file, pasted together into one long file and generated for testing. Now I have to split it on envelope, writing each one to a new file.

<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>
<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>
<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>

I have already worked with splits before, just not like this, where there is no begin and end tag for the entire XML.

Eve

3 Answers

4

I suggest making it well-formed and then using one of the SAX or StAX solutions already suggested. The only difference is that I would avoid loading the whole thing into memory and instead inject the start and end elements by way of a SequenceInputStream.

For example:

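// Requires: import java.io.*; import java.nio.charset.StandardCharsets;
// Wraps the fragment file in a synthetic <root> element without loading it into memory.
InputStream in = new SequenceInputStream(
                        // start doc
                        new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                        new SequenceInputStream(
                           new FileInputStream("envelopes.txt"),
                           // end doc
                           new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8))));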
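
To show how the wrapped stream could then be split, here is a minimal sketch using the StAX event API. The class name EnvelopeSplitter, the split method, the names list (the predefined file names, in document order) and the envelope-N.xml fallback are all assumptions made for illustration; they are not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class EnvelopeSplitter {

    // Hypothetical helper: 'names' supplies the predefined file names, in document order.
    public static List<Path> split(InputStream fragments, Path outDir, List<String> names)
            throws Exception {
        // Inject the synthetic root element around the fragment stream.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                new SequenceInputStream(
                        fragments,
                        new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8))));

        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        List<Path> written = new ArrayList<>();
        Writer out = null;
        XMLEventWriter writer = null;
        int depth = 0; // 1 = synthetic root, 2 = its direct children such as <envelope>

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 2 && "envelope".equals(
                        event.asStartElement().getName().getLocalPart())) {
                    int i = written.size();
                    // Assumption: fall back to a generated name if the list runs out.
                    String name = i < names.size() ? names.get(i) : "envelope-" + (i + 1) + ".xml";
                    Path target = outDir.resolve(name);
                    out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
                    writer = outputFactory.createXMLEventWriter(out);
                    written.add(target);
                }
            }
            if (writer != null) {
                writer.add(event); // copy every event that belongs to the current envelope
            }
            if (event.isEndElement()) {
                if (depth == 2 && writer != null) {
                    writer.close(); // flushes, but does not close the underlying Writer
                    out.close();
                    writer = null;
                }
                depth--;
            }
        }
        reader.close();
        return written;
    }
}
```

Only one envelope is held by the writer at a time, so memory use should stay roughly constant regardless of how many envelopes the file contains.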
massfords
2

As Joachim said, this is not XML.

You can try to add a root element programmatically, save the file as a temp file somewhere, and then refer to the other similar question on how to split it.


Answering the comment:

This might help you load it. I doubt you need to worry about the size, since to split the file you'd have to load it into memory anyway and then write it out again.

Then something like:

final String xmlWithRootElement = "<root>" + IOUtils.toString(yourFile) + "</root>";

should do it (ideally without so many hardcoded strings).
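
Once the root element is added, the in-memory split could look roughly like the sketch below. It uses only the JDK's DOM parser and an identity Transformer; the class name InMemorySplit and the split method are hypothetical, and it returns each envelope as a string so that writing the files under their predefined names is left to the caller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InMemorySplit {

    // Hypothetical helper: returns the serialized text of every top-level <envelope>.
    public static List<String> split(String fragments) throws Exception {
        // Add the missing root element so the parser sees a well-formed document.
        String wrapped = "<root>" + fragments + "</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        // An identity transformer serializes a DOM node back to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> parts = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes and anything that is not an envelope.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "envelope".equals(child.getNodeName())) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(child), new StreamResult(out));
                parts.add(out.toString());
            }
        }
        return parts;
    }
}
```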

One last thing.

I would suggest finding a solution that works. Then if you're unhappy with the performance you can look for ways to optimize it or you can ask a performance related question.

Simeon
  • I would like to do that, but the XML file is a lot bigger: it has 1000 "envelope" elements, which each contain 50 lines, so adding it would be a bit too much. – Eve Jun 10 '11 at 12:21
  • 1000 envelope elements is not a lot; it is actually quite few, IMO. If you had 1,000,000 envelope elements, then you might notice it. How big is the file? – Simeon Jun 10 '11 at 12:45
0

How about just reading the file character by character and identifying the <envelope> and </envelope> sequences? Whenever you encounter <envelope>, you start capturing to a buffer until you reach </envelope>. This way the file can be as big as the filesystem allows. XML manipulation on large files is a headache :-)
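
A rough sketch of that idea follows. The class name EnvelopeScanner and the sink callback are hypothetical. It assumes the tags appear exactly as <envelope> and </envelope> (no attributes, no nesting, nothing tag-like inside comments or CDATA), so it is plain text matching and not real XML parsing.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

public class EnvelopeScanner {
    private static final String OPEN = "<envelope>";
    private static final String CLOSE = "</envelope>";

    // Hypothetical helper: passes each captured envelope to 'sink' and returns how many were found.
    public static int scan(Reader in, Consumer<String> sink) throws IOException {
        StringBuilder buf = new StringBuilder();
        boolean capturing = false;
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            if (!capturing) {
                if (endsWith(buf, OPEN)) {
                    // Start tag seen: restart the buffer with just the tag.
                    buf.setLength(0);
                    buf.append(OPEN);
                    capturing = true;
                } else if (buf.length() > OPEN.length()) {
                    // Outside an envelope, keep only a small sliding window.
                    buf.deleteCharAt(0);
                }
            } else if (endsWith(buf, CLOSE)) {
                // End tag seen: hand over the complete envelope and reset.
                sink.accept(buf.toString());
                buf.setLength(0);
                capturing = false;
                count++;
            }
        }
        return count;
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int offset = sb.length() - suffix.length();
        return offset >= 0 && sb.indexOf(suffix, offset) == offset;
    }
}
```

Wrapping the file in a BufferedReader before passing it in keeps the per-character reads cheap, and the sink is where each envelope would be written out under its predefined name.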

Karl-Bjørnar Øie