I have large XML files ("ONIX" standard) I'd like to split. Basic structure is:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE ONIXmessage SYSTEM "http://www.editeur.org/onix/2.1/short/onix-international.dtd">
<!-- DOCTYPE is not always present and might look different -->
<ONIXmessage> <!-- sometimes with an attribute -->
<header>
...
</header> <!-- up to this line every out-file should be identical to source -->
<product> ... </product>
<product> ... </product>
...
<product> ... </product>
</ONIXmessage>
What I want to do is split this file into n smaller files of approximately the same size. For this I'd count the number of <product> nodes, divide that count by n, and clone the nodes into n new XML files. I have searched a lot, and this task seems to be harder than I thought.
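A minimal sketch of just the counting-and-dividing step, written in Python purely for illustration (the same arithmetic carries over to PowerShell). The function name `chunk_ranges` is made up for this example, and it assumes the <product> nodes are roughly similar in size, so that equal counts give approximately equal file sizes:

```python
def chunk_ranges(total, n):
    """Return n (start, stop) index pairs that cover range(total) as evenly as possible.

    total: number of <product> nodes counted in the source file
    n:     number of output files wanted
    """
    base, extra = divmod(total, n)  # every chunk gets `base`; the first `extra` chunks get one more
    ranges = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: 10 products split across 3 files
print(chunk_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each (start, stop) pair would then say which slice of the <product> nodes goes into which output file. How to copy those nodes while keeping the declaration, doctype and <header> intact is the part I am unsure about.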
- What I could not solve so far is to clone a new XML document with identical XML declaration, doctype, root element and <header> node, but without <product>s. I could do this using regex, but I'd rather use XML tools.
- What would be the smartest way to transfer a number of <product> nodes to a new XML document? Object notation, like $xml.ONIXmessage.product | % { copy... }, XPath() queries (can you select n nodes with XPath()?) and CloneNode(), or XMLReader/XMLWriter?
- The content of the nodes should be identical regarding formatting and encoding. How can this be ensured?
I'd be very grateful for some nudges in the right direction!