Below is a PHP script that needs to search through an XML file and find the id for <AnotherChild>. At the moment it returns 0 results and I can't figure out why. If anyone can see the cause, I'd really appreciate it if you could let me know.

XML:

<TransXChange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.transxchange.org.uk/" xsi:schemaLocation="http://www.transxchange.org.uk/ http://www.transxchange.org.uk/schema/2.1/TransXChange_general.xsd" CreationDateTime="2013-07-12T18:12:21.8122032+01:00" ModificationDateTime="2013-07-12T18:12:21.8122032+01:00" Modification="new" RevisionNumber="3" FileName="swe_44-611A-1-y10.xml" SchemaVersion="2.1">
    <Node1>...</Node1>
    <Node2>...</Node2>
    <Node3>...</Node3>
    <Node4>...</Node4>
    <Node5>...</Node5>
    <Node6>...</Node6>
    <Node7>
        <Child>
            <id>ABCDEFG123</id>
        </Child>
        <AnotherChild>
            <id>ABCDEFG124</id>
        </AnotherChild>
    </Node7>
    <Node8>...</Node8>
</TransXChange>

PHP:

<?php

  $xmldoc = new DOMDocument();
  $xmldoc->load("directory1/directory2/file.xml");

  $xpathvar = new DOMXPath($xmldoc);
  $xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

  $queryResult = $xpathvar->query('//AnotherChild/id');
  foreach($queryResult as $result) {
    echo $result->textContent;
  }
?>

Thanks

jskidd3
  • possible duplicate of [xpath with namespace](http://stackoverflow.com/questions/9827685/xpath-with-namespace) – Wrikken Aug 08 '13 at 21:29
  • [this one may be better though](http://stackoverflow.com/questions/6475394/php-xpath-query-on-xml-with-default-namespace-binding) – Wrikken Aug 08 '13 at 21:30
  • @Wrikken I've just looked at both of those answers and can't see how I'd adjust my code to fix my issue? – jskidd3 Aug 08 '13 at 21:32
  • Well, look a little longer... notice your document has a default namespace, notice those answers mention something like `->registerNamespace`... – Wrikken Aug 08 '13 at 21:33
  • @Wrikken Yes but those answers don't explain very well why `->registerNamespace` is needed. I have no idea about 'namespace', maybe could write an answer to explain? :) – jskidd3 Aug 08 '13 at 21:35
  • [this is what namespaces are](http://en.wikipedia.org/wiki/XML_namespace), and save for doing your work for you, I don't think I can explain it that much better than the answers already given... How's about you share us your best effort using `->registerNamespace`? – Wrikken Aug 08 '13 at 21:40
  • @Wrikken I've edited the question with the `->registerNamespace` in it, although there's no change, it still doesn't work. – jskidd3 Aug 08 '13 at 21:47
  • A namespace is just a qualifier, so you can differentiate between two schemas that both define the same element name. So if you wrote an XSD for database schemas that defined an element named schema, you wouldn't want it to clash with xs:schema. A default namespace means you don't have to keep typing the qualifier. – Tony Hopkinson Aug 08 '13 at 21:49
  • So, you've defined the namespace. How would you alter the `XPath` to make use of that namespace? Look at the answers to those 2 questions, how do _they_ alter the query? – Wrikken Aug 08 '13 at 21:51
  • @Wrikken Thank you for your time. Although both of the links you posted weren't directly the fix and were slightly harder to follow I managed to modify the query successfully. I appreciate you trying to make me figure it out for myself, but would it not have been easier to simply submit the answer suited to my question/needs instead? :) – jskidd3 Aug 08 '13 at 21:59
  • I do fully appreciate you helping me though :) – jskidd3 Aug 08 '13 at 22:00
  • Simpler, yes. But simpler isn't always better :P Answers figured out stick around longer than answers handed out, and figuring out answers trains you in ... figuring out answers, which is a very useful skill to have ;) – Wrikken Aug 08 '13 at 22:03

2 Answers

The two questions linked in comments do actually answer this question, but they don't quite make it clear enough why they answer it IMO, so I'll add this following my answer in chat.


Consider the following XML document:

<root>
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>

This has no xmlns attributes at all, which means you can query //grandchild and get the result you expect. No element is in a namespace, so everything can be addressed without registering a namespace in XPath.

Now consider this:

<root xmlns="http://www.bar.com/">
  <child>
    <grandchild>foo</grandchild>
  </child>
</root>

This declares a default namespace of http://www.bar.com/ on the root element, and every unprefixed descendant element inherits it. As a result, you must use that namespace to address any of these nodes.
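The difference between the two documents can be checked directly. The following is a minimal sketch, not part of the original answer: `countMatches()` is a hypothetical helper written for this illustration, and the documents are the two examples above collapsed onto one line each.

```php
<?php

// Hypothetical helper for illustration: load an XML string and count the
// nodes that match an XPath query, without registering any namespace.
function countMatches(string $xml, string $query): int
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);

    return $xpath->query($query)->length;
}

// First example: no xmlns attribute anywhere.
$plain = '<root><child><grandchild>foo</grandchild></child></root>';

// Second example: a default namespace declared on the root element.
$namespaced = '<root xmlns="http://www.bar.com/"><child><grandchild>foo</grandchild></child></root>';

echo countMatches($plain, '//grandchild'), "\n";      // 1
echo countMatches($namespaced, '//grandchild'), "\n"; // 0
```

The second count is 0 because an unprefixed name in an XPath 1.0 query only matches elements that are in no namespace, which is the same symptom as in the question.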

As you have already figured out, the way to do this is to use DOMXPath::registerNamespace(). The crucial point that you missed is that (in PHP's XPath implementation) every namespace must be registered with a prefix, and you must use that prefix to address nodes that belong to it. It is not possible to register a namespace in XPath with an empty prefix.

So, given the second example above, let's look at how we would execute the original //grandchild query:

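<?php

    // $xml is assumed to hold the second example document above as a string
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Bind the prefix "bar" to the namespace URI used by the document
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('bar', 'http://www.bar.com/');

    // The element name must carry the registered prefix
    $nodes = $xpath->query('//bar:grandchild');
    foreach ($nodes as $node) {
        // do stuff with $node
    }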

Note how we registered the namespace using its URI, and we specified a prefix. Even though the original XML did not contain this prefix, we use the prefix in the query.

To understand why, let's look at another piece of XML:

<baz:root xmlns:baz="http://www.bar.com/">
  <baz:child>
    <baz:grandchild>foo</baz:grandchild>
  </baz:child>
</baz:root>

This document is semantically identical to the second - the code sample would work equally well with either. The prefix is separate from the namespace. Note that even though this uses a baz: prefix in the document, the XPath uses the bar: prefix. This is because the thing that identifies the namespace is the URI, not the prefix.
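As a rough check of that claim, the sketch below runs the same //bar:grandchild query against both the default-namespace document and the baz:-prefixed one. `grandchildText()` is a hypothetical helper introduced only for this illustration, and both documents are collapsed onto one line.

```php
<?php

// Hypothetical helper for illustration: bind the prefix "bar" to
// http://www.bar.com/ on the XPath side only, then collect the text of
// every matching grandchild element.
function grandchildText(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('bar', 'http://www.bar.com/');

    $values = [];
    foreach ($xpath->query('//bar:grandchild') as $node) {
        $values[] = $node->textContent;
    }

    return $values;
}

// Same namespace URI, declared once as a default namespace and once with
// an explicit baz: prefix.
$defaultNs = '<root xmlns="http://www.bar.com/"><child><grandchild>foo</grandchild></child></root>';
$prefixed  = '<baz:root xmlns:baz="http://www.bar.com/"><baz:child><baz:grandchild>foo</baz:grandchild></baz:child></baz:root>';

var_dump(grandchildText($defaultNs) === grandchildText($prefixed)); // bool(true)
```

Neither document mentions bar: anywhere. The query matches in both cases because the prefix registered on the XPath object resolves to the same URI that the document's elements belong to.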

So when a document uses a namespace, we must work with the namespace, not against it, by registering the namespace in XPath and using the prefix we registered it against to refer to any nodes that belong to that namespace.

For completeness, when we apply these principles to your original document, the query that you would use with the code in the question is:

//transXchange:AnotherChild/transXchange:id
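Put together, a self-contained version might look like the sketch below. It assumes a trimmed stand-in for the document in the question, keeping only the default namespace declaration and Node7 and loading it from a string rather than from the file path used there; the variable names follow the question's code.

```php
<?php

// Trimmed stand-in for the document in the question (assumption: only the
// default namespace declaration and Node7 matter for this query).
$xml = <<<XML
<TransXChange xmlns="http://www.transxchange.org.uk/">
    <Node7>
        <Child>
            <id>ABCDEFG123</id>
        </Child>
        <AnotherChild>
            <id>ABCDEFG124</id>
        </AnotherChild>
    </Node7>
</TransXChange>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);

// Bind a prefix of our choosing to the document's default namespace URI.
$xpathvar = new DOMXPath($xmldoc);
$xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

// Every element step in the path needs the prefix, not just the first one.
$ids = [];
foreach ($xpathvar->query('//transXchange:AnotherChild/transXchange:id') as $result) {
    $ids[] = $result->textContent;
}

echo implode("\n", $ids), "\n"; // ABCDEFG124
```

The prefix name itself is arbitrary: any other prefix would work as long as it is registered against the same URI and used consistently in the query.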
DaveRandom
  • Thanks for leaving such a great, detailed answer! – jskidd3 Aug 08 '13 at 22:34
  • @JoelKidd No problem, there are a few regulars in the PHP room on chat who are quite familiar with XML and XPath if you have any more queries, hakre [blogs](http://hakre.wordpress.com/) about some of the intricacies periodically if you want to check it out :-) – DaveRandom Aug 08 '13 at 22:41
  • That's awesome! I'll have a read, bookmarked the blog. Thanks again. – jskidd3 Aug 08 '13 at 22:49

To fix this problem I first registered the namespace:

$xpathvar->registerNamespace('transXchange', 'http://www.transxchange.org.uk/');

And then modified the query like so:

$queryResult = $xpathvar->query('//transXchange:AnotherChild/transXchange:id');

This returned the ID successfully.

jskidd3