
I want to merge two XML files with the same structure into one. For example:

Test1.xml

<?xml version="1.0" encoding="UTF-8"?>

<ns:Root
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace"
    xsi:schemaLocation="urn:Test.Namespace Test1.xsd"
    >
    <ns:element1 id="001">
       <ns:element2 id="001.1" order="1">
           <ns:element3 id="001.1.1" />
       </ns:element2>
       <ns:element2 id="001.2" order="2">
           <ns:element3 id="001.1.2" />
       </ns:element2>
    </ns:element1>
</ns:Root>

and Test2.xml

<?xml version="1.0" encoding="UTF-8"?>

<ns:Root
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace"
    xsi:schemaLocation="urn:Test.Namespace Test1.xsd"
    >
    <ns:element1 id="999">
        <ns:element2 id="999.1" order="1">
            <ns:element3 id="999.1.1" />
        </ns:element2>
    </ns:element1>
</ns:Root>

To create

TestOutput.xml

<?xml version="1.0" encoding="UTF-8"?>

<ns:Root
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace"
    xsi:schemaLocation="urn:Test.Namespace Test1.xsd"
    >
    <ns:element1 id="001">
       <ns:element2 id="001.1" order="1">
           <ns:element3 id="001.1.1" />
       </ns:element2>
       <ns:element2 id="001.2" order="2">
           <ns:element3 id="001.1.2" />
       </ns:element2>
    </ns:element1>
    <ns:element1 id="999">
        <ns:element2 id="999.1" order="1">
            <ns:element3 id="999.1.1" />
        </ns:element2>
    </ns:element1>
</ns:Root>

i.e. one XML file that includes all the elements from each.

I found a useful question on StackOverflow and came up with this:

Merge.xml

<?xml version="1.0"?>

<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <xi:include href="Test1.xml" parse="xml" xpointer="element(//ns:Root/ns:element1)" />  

    <xi:include href="Test2.xml" parse="xml" xpointer="element(//ns:Root/ns:element1)" />

</ns:Root>

Which I run like this (I need to use xmllint for reasons too involved to go into):

xmllint -xinclude Merge.xml

But this does not work; it complains about various things, which seem to relate to XPointer:

parser error : warning: ChildSeq not starting by /1
Merge.xml:7: element include: XInclude error : XPointer evaluation failed: #element(//ns:Root/ns:element1)
Merge.xml:7: element include: XInclude error : could not load Test1.xml, and no fallback was found
parser error : warning: ChildSeq not starting by /1
Merge.xml:9: element include: XInclude error : XPointer evaluation failed: #element(//ns:Root/ns:element1)
Merge.xml:9: element include: XInclude error : could not load Test2.xml, and no fallback was found
<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns="urn:TestNamespace">

    <xi:include href="Test1.xml" parse="xml" xpointer="element(//ns:Root/ns:element1)"/>

    <xi:include href="Test2.xml" parse="xml" xpointer="element(//ns:Root/ns:element1)"/>

</ns:Root>

If I omit the xpointer attributes in Merge.xml, I get some sensible output, but of course it includes more than just the elements I want.

Can someone offer some advice as to what I am doing wrong with XPointer, please?

Thanks in anticipation.

Nerdio
  • If I remove the namespaces, the above works, so this just looks to be an issue with XPointer and how I am dealing with the namespaces – Nerdio May 15 '13 at 13:35
  • The `element()` scheme does not support qualified names (see https://www.w3.org/TR/xptr-element/). A name specified with `element()` must be an NCName and refers to a _single_ element identified by an xs:ID of that name. That's obviously not what you want. – Adrian W Nov 12 '16 at 18:49

3 Answers


I have dabbled with this a bit more and found plenty of examples on the web suggesting that what I am doing is correct. This is now a working version...

<?xml version="1.0"?>

<Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="http://testurl.com/now">

    <xi:include href="Test1.xml" xpointer="xmlns(ns=http://testurl.com/now)xpointer(/ns:Root/ns:element1)" parse="xml" />
    <xi:include href="Test2.xml" xpointer="xpointer(//Root/element1)" parse="xml" />

</Root>

This example uses a version of Test1.xml which has namespaces, and Test2.xml which does not.

The output now looks like this:

<?xml version="1.0"?>
<Root xmlns:xi="http://www.w3.org/2003/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns="http://testurl.com/now">

    <ns:element1 xmlns:ns="http://testurl.com/now" id="001">
        <ns:element2 id="001.1" order="1">
            <ns:element3 id="001.1.1"/>
        </ns:element2>
        <ns:element2 id="001.2" order="2">
            <ns:element3 id="001.1.2"/>
        </ns:element2>
    </ns:element1><ns:element1 xmlns:ns="http://testurl.com/now" id="003">
        <ns:element2 id="007.0" order="1">
            <ns:element3 id="007.1.1"/>
        </ns:element2>
    </ns:element1><ns:element1 xmlns:ns="http://testurl.com/now" id="002">
        <ns:element2 id="002.1" order="3">
            <ns:element3 id="002.1.1"/>
        </ns:element2>
        <ns:element2 id="002.2" order="4">
            <ns:element3 id="002.1.2"/>
        </ns:element2>
    </ns:element1>
    <element1 id="999">
        <element2 id="999.1" order="1">
            <element3 id="999.1.1"/>
        </element2>
    </element1>

</Root>

This is acceptable, although it would be nice if the line breaks between the closing tag of one element1 and the opening tag of the next were still there.
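Applied to the original Test1.xml and Test2.xml from the question, where both files put their elements in the urn:TestNamespace namespace, the same xmlns() technique would presumably look like the sketch below (not verified against a particular libxml2 version). Per the XPointer Framework, the prefixes declared on the including document are not available inside the pointer; only an xmlns() part binds a prefix for the xpointer() part that follows it.

```xml
<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <!-- xmlns() binds the prefix ns for the xpointer() part that follows it;
         the xmlns:ns declaration on ns:Root above is not seen by the pointer -->
    <xi:include href="Test1.xml"
        xpointer="xmlns(ns=urn:TestNamespace)xpointer(/ns:Root/ns:element1)" />
    <xi:include href="Test2.xml"
        xpointer="xmlns(ns=urn:TestNamespace)xpointer(/ns:Root/ns:element1)" />

</ns:Root>
```

It would be run with the same `xmllint -xinclude Merge.xml` command as before.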

Nerdio
  • The line breaks are not part of the elements, so they can't be caught by referencing the elements. Try adding `--pretty 1` to the xmllint command line: `xmllint --pretty 1 -xinclude Merge.xml`. That still will not reproduce the original spacing, but it looks a bit nicer. – Adrian W Nov 12 '16 at 19:03

This works with and without namespaces:

<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <xi:include href="Test1.xml" xpointer="xpointer(*/*)" />  
    <xi:include href="Test2.xml" xpointer="xpointer(*/*)" />

</ns:Root>

Also, parse="xml" is the default, so you don't need to specify it.
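Note that `*/*` picks every child element of the root, whatever its name or namespace. If the files might contain other children you do not want, a variant that still avoids namespace prefixes is to filter on the local name. This is a sketch that assumes the processor (libxml2 here) evaluates XPath 1.0 functions such as local-name() inside the xpointer() scheme.

```xml
<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <!-- select only children of the root whose local name is element1,
         regardless of which namespace (if any) they are in -->
    <xi:include href="Test1.xml" xpointer="xpointer(/*/*[local-name()='element1'])" />
    <xi:include href="Test2.xml" xpointer="xpointer(/*/*[local-name()='element1'])" />

</ns:Root>
```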

Adrian W

For those using Xerces in Java: it only supports xpointer="element(...)" pointers. This is defined at https://www.w3.org/TR/2003/REC-xptr-element-20030325/

It has an example:

For example, the following pointer part identifies the element with an ID (as defined in XPointer Framework) of "intro"

but I failed to work out how the XPointer Framework determines that ID, as described at https://www.w3.org/TR/2003/REC-xptr-framework-20030325/#shorthand

Scanning through https://www.ibiblio.org/xml/books/bible3/chapters/ch18.html
and reading https://xerces.apache.org/xerces2-j/faq-xinclude.html,
I realized that it is possible to achieve what you asked for:

<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <xi:include href="Test1.xml" xpointer="element(/1/1)" />
    <xi:include href="Test2.xml" xpointer="element(/1/1)" />

</ns:Root>

The advantage of this approach is that the element() scheme is, I suspect, supported in more places than the full xpointer() scheme.

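Note: this addressing scheme can be nested. Each step is a 1-based position among child elements, and the first step, /1, is the document element itself (ns:Root). So /1/1 is the root's first child element (ns:element1), and /1/1/2 is the second child element of that element1, which would select element2 id="001.2" from Test1.xml.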
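Because an element() pointer always identifies exactly one element, a file with several element1 children needs one xi:include per child. A sketch, assuming a file here called Test3.xml (a hypothetical name) whose root has three element1 children, like the Test1.xml used in the first answer:

```xml
<?xml version="1.0"?>
<ns:Root xmlns:xi="http://www.w3.org/2003/XInclude"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns="urn:TestNamespace">

    <!-- /1 is the document element of Test3.xml; /1/n is its n-th child element -->
    <xi:include href="Test3.xml" xpointer="element(/1/1)" />
    <xi:include href="Test3.xml" xpointer="element(/1/2)" />
    <xi:include href="Test3.xml" xpointer="element(/1/3)" />

</ns:Root>
```

The drawback compared with xpointer(*/*) is that you have to know how many children each file has: a pointer to a child that does not exist makes that include fail, unless an xi:fallback is provided.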

TWiStErRob