
A strange thing happened after a supplier changed the XML header a bit. I used to be able to read data using XPath, but now I can't even get a result with

$xml->xpath('/');

They changed it from this...

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE NewsML SYSTEM "http://www.newsml.org/dl.php?fn=NewsML/1.2/specification/NewsML_1.2.dtd" [
<!ENTITY % nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.4/specification/dtd/nitf-3-4.dtd">
%nitf;
]>
<NewsML>
...

to this:

<?xml version="1.0" encoding="iso-8859-1"?>
<NewsML
  xmlns="http://iptc.org/std/NewsML/2003-10-10/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://iptc.org/std/NewsML/2003-10-10/ http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.xsd http://iptc.org/std/NITF/2006-10-18/   http://contentdienst.pressetext.com/misc/nitf-3-4.xsd"
>
...
Dave Vogt

1 Answer

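Most likely this is because they've introduced a default namespace (xmlns="http://iptc.org/std/NewsML/2003-10-10/") into their document. In XPath 1.0, which SimpleXML uses, an unprefixed element name always means "this name in no namespace", so a query like //NewsML no longer matches the namespaced elements. SimpleXML's support for default namespaces is not very good, to put it mildly.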
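A minimal sketch of the symptom, assuming a hypothetical, trimmed-down document with the new default namespace (the NewsItem content is made up): the root element keeps its name, but it now belongs to a namespace, and an unprefixed XPath query finds nothing.

```php
<?php
// Hypothetical, trimmed-down stand-in for the supplier's new document.
$doc = <<<'XML'
<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/">
  <NewsItem>Example item</NewsItem>
</NewsML>
XML;

$xml = simplexml_load_string($doc);

// The root element is still called NewsML...
echo $xml->getName(), "\n";

// ...but it now lives in a namespace. getNamespaces() lists the namespaces
// used by this element, keyed by prefix ("" for the default namespace).
print_r($xml->getNamespaces());

// An unprefixed name test only matches elements in no namespace,
// so this returns an empty array even though <NewsItem> is there.
$items = $xml->xpath('//NewsItem');
echo count($items), "\n";
```

If getNamespaces() reports an entry under the empty-string key, the elements are in a default namespace and the queries need a prefix.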

Try explicitly registering a namespace prefix:

$xml->registerXPathNamespace("n", "http://iptc.org/std/NewsML/2003-10-10/");
$xml->xpath('/n:NewsML');

You would have to adapt your XPath expressions to use the "n:" prefix on every element name, e.g. /n:NewsML/n:NewsItem instead of /NewsML/NewsItem. Here is some additional info: http://people.ischool.berkeley.edu/~felix/xml/php-and-xmlns.html.
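Rather than hard-coding the namespace URI, one option is to read it from the document with getDocNamespaces(), which stores the default namespace under the empty-string key. The sketch below assumes a made-up document that only borrows NewsML element names; the prefix "n" is an arbitrary choice. Note that the prefix goes on every element step but not on the attribute: unprefixed attributes are not in any namespace, even when a default namespace is declared.

```php
<?php
// Hypothetical document; element names follow NewsML, the content is made up.
$doc = <<<'XML'
<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/">
  <NewsItem>
    <NewsComponent>
      <Role FormalName="Main"/>
    </NewsComponent>
  </NewsItem>
</NewsML>
XML;

$xml = simplexml_load_string($doc);

// getDocNamespaces() returns the namespaces declared on the root element,
// keyed by prefix; the default namespace uses the empty string as its key.
$namespaces = $xml->getDocNamespaces();
$xml->registerXPathNamespace('n', $namespaces['']);

// Every element step needs the prefix. The attribute has no prefix in the
// document, so it is in no namespace and is selected without one.
$roles = $xml->xpath('/n:NewsML/n:NewsItem/n:NewsComponent/n:Role/@FormalName');
echo (string) $roles[0], "\n";
```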

EDIT: As per the PHP manual:

The registerXPathNamespace() function creates a prefix/ns context for the next XPath query.

This means it should be called before every XPath query (at the very least before querying any SimpleXMLElement it has not yet been called on), so a function that wraps XPath queries is the natural thing to do:

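// Runs an XPath query on a SimpleXMLElement after (re-)registering namespace prefixes.
// $xmlns is a list of "prefix=uri" strings, e.g. ["n=http://iptc.org/std/NewsML/2003-10-10/"].
function simplexml_xpath_ns($element, $xpath, $xmlns)
{
    foreach ($xmlns as $prefix_uri)
    {
        // Split only on the first "=", so any "=" inside the URI is kept.
        list($prefix, $uri) = explode("=", $prefix_uri, 2);
        $element->registerXPathNamespace($prefix, $uri);
    }
    // Returns an array of matching elements, or false on error.
    return $element->xpath($xpath);
}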

Usage:

$xmlns = ["n=http://iptc.org/std/NewsML/2003-10-10/"];
$result = simplexml_xpath_ns($xml, '/n:NewsML', $xmlns);
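The wrapper matters most when you query elements returned by an earlier xpath() call, since each result is a separate SimpleXMLElement object. A sketch, assuming a simplified hypothetical document where each NewsItem has a HeadLine child (this is not the real NewsML structure); the wrapper is repeated so the snippet runs on its own.

```php
<?php
// Copy of the wrapper from the answer so this snippet runs on its own.
function simplexml_xpath_ns($element, $xpath, $xmlns)
{
    foreach ($xmlns as $prefix_uri) {
        list($prefix, $uri) = explode("=", $prefix_uri, 2);
        $element->registerXPathNamespace($prefix, $uri);
    }
    return $element->xpath($xpath);
}

// Hypothetical, simplified document with two items.
$doc = <<<'XML'
<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/">
  <NewsItem><HeadLine>First</HeadLine></NewsItem>
  <NewsItem><HeadLine>Second</HeadLine></NewsItem>
</NewsML>
XML;

$xml = simplexml_load_string($doc);
$xmlns = ["n=http://iptc.org/std/NewsML/2003-10-10/"];

$headlines = [];
foreach (simplexml_xpath_ns($xml, '//n:NewsItem', $xmlns) as $item) {
    // Each $item returned by xpath() is a separate SimpleXMLElement object.
    // Going through the wrapper registers the prefix on it before the
    // relative query, instead of relying on the registration made on $xml.
    $headline = simplexml_xpath_ns($item, 'n:HeadLine', $xmlns);
    $headlines[] = (string) $headline[0];
}

print_r($headlines);
```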
Tomalak
  • Thanks a bunch! Also, it seems you have to register the namespaces before every single XPath request... if you could add this to your answer, it'd be perfect ;) – Dave Vogt Dec 09 '08 at 09:18
  • Thanks, that fixed the XPath queries on my Google Earth (KML) file, which previously only worked if I removed the xmlns attribute from the XML file. –  Dec 26 '09 at 14:51
  • Why is registering namespaces necessary? Why aren't namespaces just treated as attributes on the root element? – Jake Wilson Jan 26 '15 at 22:37
  • @Jakobud Because namespaces *aren't* just attributes, they change the entire element. Think of them as… *colors*. Thought experiment: `` is default (black). `` sets a new default. Now the entire element (and its descendants!) are red. Now, `xpath('//foo')` will only select black (that's how it works). You must make it recognize red with `registerXPathNamespace("red", "rgb(255,0,0)")`. Now you can do `xpath('//red:foo')`. (Note: `` *leaves the element at the default* but introduces the prefix `red` for later use, i.e. ``.) – Tomalak Jan 27 '15 at 06:46
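The color thought experiment can be checked with four tiny documents: no namespace at all, a default namespace declared on foo, a "red" prefix declared but not used on foo, and a prefixed red:foo. This sketch swaps the comment's "rgb(255,0,0)" for the stand-in URI http://example.com/red, because libxml2 may warn about a namespace URI that is not absolute; countFoo() is a hypothetical helper for illustration.

```php
<?php
// Stand-in namespace URI for the comment's "red" (an assumption, see above).
const RED = 'http://example.com/red';

// Hypothetical helper: returns [matches for //foo, matches for //red:foo].
function countFoo(string $xmlText): array
{
    $doc = simplexml_load_string($xmlText);
    $doc->registerXPathNamespace('red', RED);
    return [count($doc->xpath('//foo')), count($doc->xpath('//red:foo'))];
}

// No namespace declared: foo is "black".
$black = countFoo('<foo/>');
// Default namespace: foo itself becomes "red".
$redDefault = countFoo('<foo xmlns="http://example.com/red"/>');
// Prefix only declared: foo stays "black"; the prefix is merely available.
$declaredOnly = countFoo('<foo xmlns:red="http://example.com/red"/>');
// Prefix used on the element: red:foo is "red".
$redPrefixed = countFoo('<red:foo xmlns:red="http://example.com/red"/>');

print_r([$black, $redDefault, $declaredOnly, $redPrefixed]);
```

Only the plain //foo query matches the "black" cases, and only //red:foo matches the "red" ones, which is why registering a prefix is unavoidable once a default namespace is declared.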