
I'm trying to return the values of elements of an XML that I receive from the database

The XML in the database looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>00524nam a2200145Ia 4500</leader>
  <controlfield tag="001">25</controlfield>
  <controlfield tag="008">200930s9999  xx      000 0 und d</controlfield>
  <datafield tag="090" ind1=" " ind2=" ">
    <subfield code="a">25</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">20220914              frey50        </subfield>
  </datafield>
  <datafield tag="101" ind1=" " ind2=" ">
    <subfield code="a">fre</subfield>
  </datafield>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="210" ind1=" " ind2=" ">
    <subfield code="c">DES MINES ,DE L'EAU ET DE L'ENVIRONNEMENT</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2=" ">
    <subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
  </datafield>
  <datafield tag="676" ind1=" " ind2=" ">
    <subfield code="a">331.34</subfield>
  </datafield>
</record>

I want to get the datafield with tag "200" and its subfield with code "a":

$xml_string = simplexml_load_string($notices->biblio->metadata[0]->metadata);

$nodes = $xml_string->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');

I tested the XPath expression on freeformatter.com and it works perfectly, but when I run it in PHP I get an empty array. I also tried removing text(), but that didn't help. I have tried every variation I could think of and nothing worked.

Kossay Rhafiri

4 Answers

1

You are probably better off confronting the namespaces in your XML head on:

$xml_string->registerXPathNamespace("xxx", "http://www.loc.gov/MARC21/slim");
$node = $xml_string->xpath('//xxx:datafield[@tag="200"]/xxx:subfield[@code="a"]/text()')[0];
echo $node;

Output:

Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH
Jack Fleeting
0

Your XPath was correct; the problem is the namespace declared in your XML.

I found this snippet somewhere deep in some php.net answers.

$notices->biblio->metadata[0]->metadata = str_replace('xmlns=', 'ns=', $notices->biblio->metadata[0]->metadata);

After that you can call your xpath to get the desired node:

$simplexml = simplexml_load_string($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();

You might also want to consider the OOP approach using SimpleXMLElement:

$simplexml = new SimpleXMLElement($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();

But to be honest, I don't know why. Maybe someone in the comments can tell me whether there is any benefit to using simplexml_load_string() instead of SimpleXMLElement.

mhaendler
    While you might consider `str_replace()` to be "working" in your case, it is neither appropriate nor necessary when the XML is valid and can already be loaded by the parser. On the contrary, your suggestion can easily alter the string so much that it destroys the data, its structure or its contents. You may not be aware of why, much like the other part of the code that you say one might consider without knowing why. Please consult the existing Q&A material for your own questions first, ideally before publishing an answer. Alternatively, use comments. – hakre Sep 16 '22 at 16:07
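To illustrate the risk described in the comment above, here is a minimal sketch. The record is hypothetical: it assumes a subfield whose text happens to contain the characters `xmlns=`. The blind replacement then changes the stored text as well as the namespace declaration.

```php
<?php
// Hypothetical record: the text content itself mentions "xmlns=".
$xml = '<record xmlns="http://www.loc.gov/MARC21/slim">'
     . '<datafield tag="200"><subfield code="a">Declare it with xmlns= on the root</subfield></datafield>'
     . '</record>';

// The replacement cannot tell the declaration apart from ordinary text.
$patched = str_replace('xmlns=', 'ns=', $xml);

$record = simplexml_load_string($patched);

// The namespace declaration has become a plain attribute called "ns" ...
$leftover = (string) $record['ns'];

// ... and the title text was silently rewritten as well.
$title = (string) $record->datafield->subfield;

echo $title, "\n";
```

The document still parses, so the damage would go unnoticed unless the values are compared against the original data.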
0

You can iterate over the elements and pick out the appropriate level. I don't use XPath much, so I'm not sure what the issue is there.

$xml = new SimpleXMLElement('<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>00524nam a2200145Ia 4500</leader>
  <controlfield tag="001">25</controlfield>
  <controlfield tag="008">200930s9999  xx      000 0 und d</controlfield>
  <datafield tag="090" ind1=" " ind2=" ">
    <subfield code="a">25</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">20220914              frey50        </subfield>
  </datafield>
  <datafield tag="101" ind1=" " ind2=" ">
    <subfield code="a">fre</subfield>
  </datafield>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l\'environnement pour la promotion de l\'emploi environnemental comme appui a l\'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="210" ind1=" " ind2=" ">
    <subfield code="c">DES MINES ,DE L\'EAU ET DE L\'ENVIRONNEMENT</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2=" ">
    <subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
  </datafield>
  <datafield tag="676" ind1=" " ind2=" ">
    <subfield code="a">331.34</subfield>
  </datafield>
</record>');
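// Property access matches unprefixed child elements, so the default
// namespace needs no special handling here.
foreach ($xml->datafield as $data) {
    // Attribute values are SimpleXMLElement objects; cast before comparing.
    if ((string) $data['tag'] === '200') {
        foreach ($data->subfield as $sub) {
            if ((string) $sub['code'] === 'a') {
                echo $sub;
            }
        }
    }
}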
hakre
user3783243
0

You want all leaf nodes whose code attribute is "a" and whose parent's tag attribute is 200:

//*[not(*) and @code="a" and ../@tag=200]

For this expression the element names (and therefore their namespace) do not matter:

$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
    ->xpath('//*[not(*) and @code="a" and ../@tag=200]')
    ;
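For anyone who wants to try this without the database, here is a self-contained sketch. The XML is a trimmed-down, made-up stand-in for the record in the question, and the variable names are only for illustration.

```php
<?php
// Trimmed-down, hypothetical version of the record from the question.
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="101" ind1=" " ind2=" ">
    <subfield code="a">fre</subfield>
  </datafield>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Main title</subfield>
    <subfield code="e">Subtitle</subfield>
  </datafield>
</record>
XML;

$record = simplexml_load_string($xml);

// No element names in the expression, so no namespace prefix is needed.
$nodes = $record->xpath('//*[not(*) and @code="a" and ../@tag=200]');

// Each result is a SimpleXMLElement; casting to string yields its text.
$titles = array_map('strval', $nodes);

print_r($titles);
```

Only the subfield with code "a" under tag 200 is selected; the code "a" subfield under tag 101 is filtered out by the parent test.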

Handling the Name(space)s

Your original XPath expression

//datafield[@tag="200"]/subfield[@code="a"]/text()

has the issue that it references the wrong elements: in XPath 1.0 an unprefixed name matches only elements in no namespace, while the elements in your document have a namespace URI:

<record ... xmlns="http://www.loc.gov/MARC21/slim" ... >
<!--        xmlns="  <<     Namespace-URI    >>  "   -->

In your expression, XPath requires you to name each element by its QName. Elements that have a namespace URI need a prefix in that QName (which you have to register first here, as the XML does not specify a prefix).


Note: For SimpleXMLElement::xpath($expression), the namespace prefixes that exist in the document are registered automatically (which is very convenient). However, if the element has a namespace URI but no prefix, this is naturally not the case, as the document offers no prefix that could be used (the prefix is "empty").
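The difference can be sketched with two tiny documents. Both are made up for illustration; the prefixes `marc` and `m` are arbitrary choices, only the namespace URI is taken from the question.

```php
<?php
$uri = 'http://www.loc.gov/MARC21/slim';

// Variant 1 (hypothetical): the document declares the prefix "marc" itself,
// so the XPath expression can use it without any registration.
$prefixed = simplexml_load_string(
    '<marc:record xmlns:marc="' . $uri . '"><marc:leader>x</marc:leader></marc:record>'
);
$viaDocumentPrefix = $prefixed->xpath('//marc:leader');

// Variant 2: default namespace without a prefix, as in the question.
$default = simplexml_load_string(
    '<record xmlns="' . $uri . '"><leader>x</leader></record>'
);

// The unprefixed name does not match the namespaced element.
$unprefixed = $default->xpath('//leader');

// Registering a prefix of our own choosing for the URI makes it addressable.
$default->registerXPathNamespace('m', $uri);
$viaRegistered = $default->xpath('//m:leader');

var_dump(count($viaDocumentPrefix), empty($unprefixed), count($viaRegistered));
```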


There are a couple of options for dealing with this, of which I'll pick out two:

  1. Giving the <record> element a prefix so that it can be used in the XPath expression.
  2. Writing the XPath expression without QNames, testing the local name and the namespace URI instead.

Directly provide a Prefix in SimpleXML

In this situation one quick hack is to give the element a prefix, which is then available in the XPath expression. Element names can then be used in the expression by their QName.

This is convenient because it keeps the XPath expression compact. Example:

$sxe = simplexml_load_string($notices->biblio->metadata[0]->metadata);
dom_import_simplexml($sxe)->prefix = '_';
$nodes = $sxe->xpath('//_:datafield[@tag=200]/_:subfield[not(*) and @code="a"]');

Setting prefix = '_' directly makes that prefix usable in the XPath expression. This is only necessary (and only works) if that element has an xmlns="<< Namespace-URI >>" declaration.

Another benefit of this hack is that you don't need to know or care which URI it is in order to "register" the namespace prefix for the XPath expression, so it often suffices for early iterations.

Use the namespaced names in the XPath Expression

The alternative XPath expression without prefixes but with (namespaced) names is a bit longer, but more portable, as it does not require you to register or deal with prefixes.

If you know the namespace URI upfront (which is normally the case, as you have the document), prefixes in the XPath expression can become cumbersome because they need to be registered for the expression. So while they work, they require setup and are still something of a hack.

The XPath expression can, however, be written so that it uses no prefixes and still matches element names in their namespace. The namespace-uri() and local-name() functions exist for that purpose; they let you write the expression without QNames, for which the prefix would need to be known. Example:

$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
    ->xpath('
    //* [namespace-uri() = "http://www.loc.gov/MARC21/slim" 
         and local-name() = "datafield" 
         and @tag=200
        ]
        
        /* [namespace-uri() = "http://www.loc.gov/MARC21/slim"
            and local-name() = "subfield" 
            and not(*) 
            and @code="a"
           ]
    ');

Yes, this requires more typing, but it gives you greater flexibility. The short variant I gave at the top of the answer ignores element names completely (//*[not(*) and @code="a" and ../@tag=200]). You can mix and match here, e.g. using only the local-name() test on one of the elements, and so on. To make the differences more visible, I put this second, very long variant last, as it is what I'd consider the opposite extreme and so reveals all the options you have. The short one might be a good starting point, but it is perhaps hard to read, as you usually want to see the (local) names in the expression.
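As a sketch of such a middle ground, the following keeps the local names readable but drops the namespace-uri() tests. It assumes that no other namespace in your documents uses the same local names; the XML is again a made-up, shortened record.

```php
<?php
// Trimmed-down, hypothetical version of the record from the question.
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Main title</subfield>
    <subfield code="e">Subtitle</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
</record>
XML;

// Match by local name only; the namespace URI is deliberately not checked.
$nodes = simplexml_load_string($xml)->xpath(
    '//*[local-name() = "datafield" and @tag = 200]'
    . '/*[local-name() = "subfield" and @code = "a"]'
);

foreach ($nodes as $node) {
    echo $node, "\n";
}
```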


SimpleXMLElement and the XPath text() Node-Test

Finally, an additional note that is unrelated to element names and their namespaces. It concerns something in your original expression that is good to know when using a valid XPath 1.0 expression in the context of SimpleXML.

With SimpleXMLElement::xpath($expression), every text node matched by a text() node-test is returned as its parent element in the resulting PHP array. This is because text nodes are abstracted away at the level of SimpleXMLElement. Casting the element to string gives you the value you're looking for. That is why I left the node-test out of my examples, and you can leave it out, too. Other XML APIs behave differently here, so I think it is good to know about.
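A small sketch of that behaviour, using a made-up document without namespaces so that only the text() handling is in play:

```php
<?php
// Hypothetical, namespace-free document to isolate the text() behaviour.
$record = simplexml_load_string(
    '<record><subfield code="a">33 p.</subfield></record>'
);

$withText    = $record->xpath('//subfield[@code="a"]/text()');
$withoutText = $record->xpath('//subfield[@code="a"]');

// Both arrays hold SimpleXMLElement objects, not plain strings.
var_dump($withText[0] instanceof SimpleXMLElement);

// The text() match is handed back as its parent element.
var_dump($withText[0]->getName());

// Casting to string yields the text either way.
var_dump((string) $withText[0] === (string) $withoutText[0]);
```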

It should go without saying that your XML is well suited to SimpleXML: each leaf node represents its contents and does not require dedicated text-node handling, for which you would otherwise have needed to lean on DOMXPath for the XPath expressions (also see the use of dom_import_simplexml() above).


hakre