You want all leaf nodes whose code attribute is "a" and whose parent's tag attribute is 200:
//*[not(*) and @code="a" and ../@tag=200]
For that, the element names (and therefore also their namespaces) do not matter:
$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
    ->xpath('//*[not(*) and @code="a" and ../@tag=200]');
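To try the short expression without the data from the question, here is a minimal sketch. The record below is a hypothetical, trimmed-down stand-in for $notices->biblio->metadata[0]->metadata; only the default namespace URI is taken from the original document.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1="1" ind2=" ">
    <subfield code="a">Sample title</subfield>
    <subfield code="e">Sample subtitle</subfield>
  </datafield>
  <datafield tag="210" ind1=" " ind2=" ">
    <subfield code="a">Paris</subfield>
  </datafield>
</record>
XML;

// Leaf elements with @code="a" whose parent has @tag=200. Element names,
// and therefore namespaces, are not tested at all.
$nodes = simplexml_load_string($xml)
    ->xpath('//*[not(*) and @code="a" and ../@tag=200]');

// Each match is a SimpleXMLElement; casting to string yields its text.
$titles = array_map('strval', $nodes);
print_r($titles);
```

Only the subfield under tag 200 matches; the one under tag 210 is excluded by the ../@tag=200 predicate.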
Handling the Name(space)s
Your original XPath expression
//datafield[@tag="200"]/subfield[@code="a"]/text()
has the issue of referencing the wrong elements: it uses names without a namespace (in XPath 1.0, an unprefixed name never matches the document's default namespace), while the elements have a namespace URI:
<record ... xmlns="http://www.loc.gov/MARC21/slim" ... >
<!-- xmlns=" << Namespace-URI >> " -->
In your expression, XPath requires you to name elements by their QName [1], and for elements that are in a namespace, the QName needs a prefix bound to that namespace (which you need to register here first, as the XML does not specify one).
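A minimal sketch of that registration step, assuming the same hypothetical sample record as a stand-in for the question's data. It uses SimpleXMLElement::registerXPathNamespace(); the prefix name "m" is an arbitrary choice for illustration, not something the document defines.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>
XML;
$sxe = simplexml_load_string($xml);

// The original expression uses unprefixed names, which select elements in
// no namespace, so nothing in this document matches.
$before = $sxe->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');

// Bind an arbitrary prefix ("m") to the namespace URI for the query,
// then use it to build QNames in the expression.
$sxe->registerXPathNamespace('m', 'http://www.loc.gov/MARC21/slim');
$after = $sxe->xpath('//m:datafield[@tag="200"]/m:subfield[@code="a"]');

var_dump(count($before));
echo (string) $after[0], "\n";
```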
Note: For SimpleXMLElement::xpath($expression), the existing namespace prefixes are automatically registered (which is very convenient). However, if the element has a namespace URI but no prefix, this is naturally not the case, as there is no prefix in the document that could be used (the prefix is "empty").
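A short sketch of both halves of that note, using two hypothetical sample documents: one that declares a prefix (the name "marc" is made up for illustration) and one with only a default namespace. SimpleXMLElement::getDocNamespaces() shows that the default namespace is listed under an empty-string prefix, which cannot be written in an XPath 1.0 QName.

```php
<?php
// Hypothetical variant of the record that does declare a prefix ("marc").
$prefixed = <<<'XML'
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="200">
    <marc:subfield code="a">Sample title</marc:subfield>
  </marc:datafield>
</marc:record>
XML;

// "marc" is declared in the document, so xpath() can use it without
// any registration.
$hits = simplexml_load_string($prefixed)
    ->xpath('//marc:datafield[@tag=200]/marc:subfield[@code="a"]');
echo (string) $hits[0], "\n";

// With only a default namespace, the declared prefix is the empty string.
$default = simplexml_load_string('<record xmlns="http://www.loc.gov/MARC21/slim"/>');
$namespaces = $default->getDocNamespaces();
var_dump($namespaces);
```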
There are a couple of options to deal with this, of which I'll pick out two:
- Prefixing the <record> element so that there is a prefix to use in the XPath expression.
- Writing the XPath expression without QNames, using name and namespace information instead.
Directly provide a Prefix in SimpleXML
In this situation, one quick hack is to give the element a prefix, which then becomes available in the XPath expression. Element names can then be used in the expression by their QName.
This is convenient as it keeps the XPath expression compact.
Example:
$sxe = simplexml_load_string($notices->biblio->metadata[0]->metadata);
dom_import_simplexml($sxe)->prefix = '_';
$nodes = $sxe->xpath('//_:datafield[@tag=200]/_:subfield[not(*) and @code="a"]');
Setting prefix = '_' directly allows using it in the XPath expression. This is only necessary (and only works) if that element has an xmlns="<< Namespace-URI >>".
Another benefit of this hack is that you don't need to know or care which URI it is to "register" the namespace prefix for the XPath expression, so it often suffices for early iterations.
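A self-contained version of the prefix hack, assuming the same hypothetical sample record in place of the question's data. Setting DOMNode::prefix on the imported root element makes DOM declare "_" for the element's existing namespace URI; SimpleXML's xpath() then sees it like any other prefix in scope of the root.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>
XML;
$sxe = simplexml_load_string($xml);

// Give the root element the prefix "_". The URI comes from the element
// itself, so it never has to be spelled out here.
dom_import_simplexml($sxe)->prefix = '_';

// The children are still in the same namespace, so "_:" matches them too.
$nodes = $sxe->xpath('//_:datafield[@tag=200]/_:subfield[not(*) and @code="a"]');
echo (string) $nodes[0], "\n";
```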
Use the namespaced names in the XPath Expression
The alternative XPath expression, without prefixes but with (namespaced) names, is a bit longer but more portable, as it does not require you to register or deal with prefixes.
If you know the namespace URI upfront (which is normally the case, as you have the document), prefixes in the XPath expression can become cumbersome, as they need to be registered for the expression. So while they work, they require setup and are still something of a hack.
But the XPath expression can be written so that it does not use prefixes and still matches element names in their namespace.
The namespace-uri() and local-name() functions are for that: they allow writing the XPath expression without relying on QNames, for which the prefix would need to be known. Example:
$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
    ->xpath('
        //* [namespace-uri() = "http://www.loc.gov/MARC21/slim"
            and local-name() = "datafield"
            and @tag=200
        ]
        /* [namespace-uri() = "http://www.loc.gov/MARC21/slim"
            and local-name() = "subfield"
            and not(*)
            and @code="a"
        ]
    ');
Yes, this requires more typing, but it gives you greater flexibility. The short variant I gave at the start of this answer ignores element names completely (//*[not(*) and @code="a" and ../@tag=200]). You can mix and match here, e.g. use the local-name() test on only one of the elements, and so on. To make the differences more visible, I put this second, very long variant last, as I'd consider it the opposite extreme, so that it reveals all the options you have. The short one might have been a good starting point, but it is perhaps hard to read, as you actually want to see (local) names in the expression.
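Two mix-and-match variants as a sketch, again assuming a hypothetical sample record. The first tests only local names and ignores the namespace URI; the second checks the namespace once on the parent and only the local name on the child. Which trade-off fits depends on how strict you need to be.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>
XML;
$sxe = simplexml_load_string($xml);

// Variant 1: local names only, any namespace.
$byLocalName = $sxe->xpath(
    '//*[local-name() = "datafield" and @tag=200]'
    . '/*[local-name() = "subfield" and @code="a"]'
);

// Variant 2: namespace checked on the parent, local name only on the child.
$mixed = $sxe->xpath(
    '//*[namespace-uri() = "http://www.loc.gov/MARC21/slim"'
    . ' and local-name() = "datafield" and @tag=200]'
    . '/*[local-name() = "subfield" and not(*) and @code="a"]'
);

echo (string) $byLocalName[0], "\n", (string) $mixed[0], "\n";
```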
SimpleXMLElement and the XPath text() Node-Test
Finally, an additional note unrelated to element names and their namespaces, but relevant to your original expression and good to know when using a valid XPath 1.0 expression in the context of SimpleXML.
With SimpleXMLElement::xpath($expression), every node matched by a text() node-test results in its parent element in the returned PHP array(). This is because text nodes are abstracted away at the level of SimpleXMLElement. Casting the element to string gives you the value you're looking for. Therefore I left the node-test out of my examples, and you can leave it out, too. Other XML APIs behave differently here, so I think it is good to know about it.
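A small sketch that makes this visible, assuming a hypothetical sample record and an arbitrary prefix "m" registered for the query. With or without the trailing text() step, the array holds the subfield element itself; only the string cast extracts the text.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>
XML;
$sxe = simplexml_load_string($xml);
$sxe->registerXPathNamespace('m', 'http://www.loc.gov/MARC21/slim');

// The matched text node is returned as its parent element.
$withText = $sxe->xpath('//m:datafield[@tag=200]/m:subfield[@code="a"]/text()');
// Without text(), the same element is returned.
$withoutText = $sxe->xpath('//m:datafield[@tag=200]/m:subfield[@code="a"]');

echo $withText[0]->getName(), ' / ', $withoutText[0]->getName(), "\n";
echo (string) $withText[0], "\n";
```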
It should go without saying that your XML is a good fit for SimpleXML: each leaf node represents its contents and does not require dedicated text-node handling, for which you would have needed to lean on DOMXPath [2] for XPath expressions (also see the dom_import_simplexml() use above).
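For comparison, a sketch of the DOMXPath route on the same hypothetical sample record. dom_import_simplexml() gives access to the underlying DOMDocument, DOMXPath::registerNamespace() binds an arbitrary prefix ("m", chosen for illustration), and the query returns the text nodes themselves as DOMText objects rather than their parent elements.

```php
<?php
// Hypothetical sample record (not the real data from the question).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200">
    <subfield code="a">Sample title</subfield>
  </datafield>
</record>
XML;
$sxe = simplexml_load_string($xml);

// Reach the DOMDocument that backs the SimpleXML tree.
$doc = dom_import_simplexml($sxe)->ownerDocument;

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('m', 'http://www.loc.gov/MARC21/slim');

// Here text() yields DOMText nodes, not their parent elements.
$texts = $xpath->query('//m:datafield[@tag="200"]/m:subfield[@code="a"]/text()');
foreach ($texts as $text) {
    echo get_class($text), ': ', $text->nodeValue, "\n";
}
```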