
I am trying to access a list of nodes that have no namespace prefix, nested inside nodes that do. My XML file has a main node in the namespace ehd with two subnodes, header and body, in the same namespace. However, the subnodes within the body node carry no namespace prefix. I am struggling to access these nodes with SimpleXML.

Excerpt from the xml file:

<?xml version="1.0" encoding="ISO-8859-15"?>
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
    <ehd:header>
    </ehd:header>
    <ehd:body>
        <gnr_liste>
            <gnr V="01100"></gnr>
            <gnr V="01101"></gnr>
            <gnr V="01102"></gnr>
        </gnr_liste>
    </ehd:body>
</ehd:ehd>

My code is as follows:

$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children('ehd', true)->body;
simplexml_dump($ehd);
$gnr_liste = $ehd->children('gnr_liste')->children('gnr');
simplexml_dump($gnr_liste);

The output is:

SimpleXML object (1 item)
[
    Element {
        Namespace: 'urn:ehd/001'
        Namespace Alias: 'ehd'
        Name: 'ehd'
        String Content: ''
        Content in Namespace ehd
            Namespace URI: 'urn:ehd/001'
            Children: 2 - 1 'body', 1 'header'
            Attributes: 0
        Content in Default Namespace
            Children: 0
            Attributes: 1 - 'ehd_version'
    }
]
SimpleXML object (1 item)
[
    Element {
        Namespace: 'urn:ehd/001'
        Namespace Alias: 'ehd'
        Name: 'body'
        String Content: ''
        Content in Default Namespace
            Namespace URI: 'urn:ehd/go/001'
            Children: 1 - 1 'gnr_liste'
            Attributes: 0
    }
]

How do I access all gnr items from the gnr_liste node?

Note: I am using simplexml_dump for debugging

miken32
chross

2 Answers


Personally, I find DomDocument much more intuitive to work with – once you get over the barrier of XPath syntax. No matter what tool you use, namespaces are going to make everything more difficult though!

$xml = <<< XML
<?xml version="1.0" encoding="ISO-8859-15"?>
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
    <ehd:header>
    </ehd:header>
    <ehd:body>
        <gnr_liste>
            <gnr V="01100"></gnr>
            <gnr V="01101"></gnr>
            <gnr V="01102"></gnr>
        </gnr_liste>
    </ehd:body>
</ehd:ehd>
XML;

$dom = new DomDocument;
$dom->loadXML($xml);
$xp = new DomXPath($dom);
// need to get tricky due to namespaces https://stackoverflow.com/a/16719351/1255289
$nodes = $xp->query("//*[local-name()='gnr']/@V");
foreach ($nodes as $node) {
    printf("%s\n", $node->value);
}

Output:

01100
01101
01102
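
As an alternative to matching on local-name(), the XPath evaluator can be given its own prefixes for the two namespace URIs. The following is a minimal sketch, not part of the original answer: the prefixes `main` and `go` are arbitrary names chosen here, and the XML is a trimmed copy of the excerpt without the XML declaration.

```php
<?php
// Trimmed copy of the excerpt from the question (XML declaration omitted)
$xml = <<<XML
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
    <ehd:header>
    </ehd:header>
    <ehd:body>
        <gnr_liste>
            <gnr V="01100"></gnr>
            <gnr V="01101"></gnr>
            <gnr V="01102"></gnr>
        </gnr_liste>
    </ehd:body>
</ehd:ehd>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

// The prefixes are local to this XPath evaluator; only the URIs must match the document.
// 'main' and 'go' are made-up names for this sketch.
$xp->registerNamespace('main', 'urn:ehd/001');
$xp->registerNamespace('go', 'urn:ehd/go/001');

// Collect the V attribute of every gnr element in the default namespace
$values = [];
foreach ($xp->query('/main:ehd/main:body/go:gnr_liste/go:gnr/@V') as $attr) {
    $values[] = $attr->value;
}

echo implode("\n", $values), "\n";
```

Because the expression names each step explicitly, it only matches gnr elements at that exact position and in that namespace, which is stricter than the local-name() query.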
miken32
  • Works well. Thank you for this. I've got one more problem now. My file is about 70 MB in size and would need some sort of streaming based parsing. Do you have any recommendation to speed up processing? – chross Apr 12 '19 at 07:05
  • For parsing large XMLs take a look at https://stackoverflow.com/a/54107472/2265374. @miken32 there is no need to get tricky, just register your own prefixes for your XPath expressions: https://3v4l.org/1jYim – ThW Apr 12 '19 at 09:25
  • @ThW interesting, I hadn’t thought to try making up my own prefix to take the place of the default. Tried it with an empty string and no luck of course. – miken32 Apr 12 '19 at 13:28
  • @miken32 It's important to remember that *all* prefixes are "made up", in the sense that they're local to a particular document, or a particular XPath evaluator - the actual namespace is the URI given in the `xmlns` attribute. Once you forget about trying to "borrow" the prefixes used by whoever wrote the file, the default namespace feels less special. – IMSoP Apr 12 '19 at 13:43
  • Prefixes exist to make the document/expression smaller and more readable. Imagine the tags in Clark-Notation (`{namespace-uri}local-name`). https://de.wikipedia.org/wiki/Namensraum_(XML)#Namensraum-Notation_nach_James_Clark http://sabre.io/xml/clark-notation/ – ThW Apr 12 '19 at 18:09
  • @ThW Indeed, but importantly they do so *within a single context*. Relying on the prefixes of a document you don't control is like relying on the local variable names inside a different function: you might happen to use the same names for the same values, but you need to assign those names yourself. – IMSoP Apr 13 '19 at 14:51
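
For the large-file question raised in the comments, one possible direction is PHP's XMLReader, which walks the document node by node instead of loading it whole. This is a minimal sketch and not necessarily the approach of the linked answer; `streamGnrValues` is a hypothetical helper name, and the demo writes a small temporary file only so the snippet runs on its own.

```php
<?php
// Hypothetical helper: yields the V attribute of each <gnr> element
// in the default namespace, one at a time.
function streamGnrValues(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Failed to open $path");
    }
    try {
        while ($reader->read()) {
            // Match on local name plus namespace URI, never on the document's prefix
            if ($reader->nodeType === XMLReader::ELEMENT
                && $reader->localName === 'gnr'
                && $reader->namespaceURI === 'urn:ehd/go/001'
            ) {
                yield $reader->getAttribute('V');
            }
        }
    } finally {
        $reader->close();
    }
}

// Demo on a small temporary file (stand-in for the real 70 MB file)
$path = tempnam(sys_get_temp_dir(), 'ehd');
file_put_contents(
    $path,
    '<ehd:ehd xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001"><ehd:body><gnr_liste>'
    . '<gnr V="01100"></gnr><gnr V="01101"></gnr>'
    . '</gnr_liste></ehd:body></ehd:ehd>'
);
$values = iterator_to_array(streamGnrValues($path), false);
unlink($path);

print_r($values);
```

The generator keeps only the current node in memory, so the caller can process values as they arrive rather than collecting them all first.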

The argument to ->children() is always a namespace identifier (the URI) or, when true is passed as the second argument, a local prefix; it is never the tag name. If these elements were in "no namespace", you would access them with ->children('').

However, the elements with no prefix in this document are not in "no namespace": they are in the default namespace, in this case urn:ehd/go/001 (as defined by xmlns="urn:ehd/go/001").

If you use the full namespace identifiers rather than the prefixes (which is also less likely to break if the feed changes), you should be able to access these easily:

$xml = simplexml_load_file($file) or die("Failed to load");   
$ehd = $xml->children('urn:ehd/001')->body;
$gnr_liste = $ehd->children('urn:ehd/go/001')->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
    simplexml_dump($gnr);
}

You might want to give your own names to the namespaces so you don't have to use the full URIs, but aren't dependent on the prefixes the XML is generated with; a common approach is to define constants:

const XMLNS_EHD_MAIN = 'urn:ehd/001';
const XMLNS_EHD_GNR = 'urn:ehd/go/001';

$xml = simplexml_load_file($file) or die("Failed to load");   
$ehd = $xml->children(XMLNS_EHD_MAIN)->body;
$gnr_liste = $ehd->children(XMLNS_EHD_GNR)->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
    simplexml_dump($gnr);
}
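
To get from the dumped elements to the actual V values, the same traversal can be run without simplexml_dump. This is a self-contained sketch under two assumptions: the XML is a trimmed inline copy of the excerpt (no XML declaration), and the unprefixed V attribute is in no namespace, so it is read through ->attributes() with no arguments.

```php
<?php
const XMLNS_EHD_MAIN = 'urn:ehd/001';
const XMLNS_EHD_GNR = 'urn:ehd/go/001';

// Trimmed inline copy of the excerpt from the question
$source = <<<XML
<ehd:ehd ehd_version="1.40" xmlns:ehd="urn:ehd/001" xmlns="urn:ehd/go/001">
    <ehd:header>
    </ehd:header>
    <ehd:body>
        <gnr_liste>
            <gnr V="01100"></gnr>
            <gnr V="01101"></gnr>
            <gnr V="01102"></gnr>
        </gnr_liste>
    </ehd:body>
</ehd:ehd>
XML;

$xml = simplexml_load_string($source) or die("Failed to load");
$ehd = $xml->children(XMLNS_EHD_MAIN)->body;
$gnr_liste = $ehd->children(XMLNS_EHD_GNR)->gnr_liste;

$values = [];
foreach ($gnr_liste->gnr as $gnr) {
    // Unprefixed attributes are in no namespace, so select them explicitly
    $values[] = (string) $gnr->attributes()->V;
}

print_r($values);
```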
IMSoP