Simplest way to parse SRU XML?

Question

What is the simplest way, in PHP, to parse the XML returned by an SRU request?

For instance, look at the following URL in a browser:

http://explor.bcu.ac.uk/IntraLibrary-SRU?operation=searchRetrieve&query=gaelic&version=1.1

This query to a public repository returns a well-formed XML document (it validates) conforming to the SRU standard, in this case returning two records. I've played with various permutations of simplexml_load_string() and methods of SimpleXMLElement(), running print_r and var_dump, and never get anything usable. For example:

$url = "http://explor.bcu.ac.uk/IntraLibrary-SRU?operation=searchRetrieve&query=gaelic&version=1.1";
// Fetch the SRU response body as a string
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$file_contents = curl_exec($ch);
curl_close($ch);
// Parse the response and dump the resulting object
$xml = new SimpleXMLElement($file_contents);
print_r($xml);

This just outputs:

SimpleXMLElement Object ( )

If I replace the print_r with:

echo $xml->asXML();

I at least get the XML data as one long string.

What I'd like to see in print_r is an object/array with all XML nodes displayed, so that I can then see what all the nodes and children are in object notation.

An added wrinkle to this is that the nodes in the returned XML have names such as:

<SRW:recordSchema>dc</SRW:recordSchema>

so I can't use code such as:

if ($xml->SRW:recordSchema->children())

as that'll throw a syntax error over the colon.

I'm no expert on XML. I understand the basic structure, and I've parsed simple XML docs (such as those in the PHP manual's "Basic SimpleXML examples"), but terms like XPath and namespace go over my head. I've had a look at:

Parse XML (SRU) with php

How can I parse a XML document retrieved from SRU?

http://us3.php.net/SimpleXMLElement

and have Googled for "php parse sru xml". Before I get lost in XML, I'd be grateful if someone could just point me in the right direction.

xml is xml.. doesn't matter what it represents, it has to follow the same basic rules. and there's plenty of examples of how to handle an XML namespace on this site: http://stackoverflow.com/questions/595946/parse-xml-with-namespace-using-simplexml — Marc B, Jun 16 '14 at 16:43
Since it contains many namespaces, and you might not be able to ignore them, I believe XPath would be a good solution. — helderdarocha, Jun 16 '14 at 16:43
Now that's where you've lost me. What are all the namespaces in this XML? Is the namespace that which precedes the colon? — Fred Riley, Jun 16 '14 at 16:57

score 0 · Answer 1 · answered Jun 16 '14 at 19:14

The namespaces are defined in the XML by xmlns attributes, for example xmlns:SRW="http://www.loc.gov/zing/srw/"

The part before the = sign (SRW in the example above) is a prefix: a handle or shorthand for the namespace. The namespace itself is the URL after the = sign (http://www.loc.gov/zing/srw/ in the example above). The prefix keeps the document readable and saves writing out the full URL on every element.

So a namespace declaration follows the pattern xmlns:SHORTHAND="URL"
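
To see which shorthands a document declares, something like the following might help. It is only a sketch: the $sample string is a made-up, cut-down stand-in for an SRU response (not the real feed), and the dc declaration in it is an assumption about what such a response could contain. getDocNamespaces() is a standard SimpleXMLElement method.

```php
<?php
// Hypothetical sample data: a tiny stand-in for an SRU response, not the real feed.
$sample = <<<XML
<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <SRW:version>1.1</SRW:version>
  <SRW:numberOfRecords>2</SRW:numberOfRecords>
</SRW:searchRetrieveResponse>
XML;

$xml = new SimpleXMLElement($sample);

// getDocNamespaces(true) returns an array of prefix => URI for every
// namespace declared in the document, including on nested elements.
$namespaces = $xml->getDocNamespaces(true);

foreach ($namespaces as $prefix => $uri) {
    echo $prefix, ' => ', $uri, PHP_EOL;
}
```

Each key in $namespaces is a SHORTHAND and each value is the URL it stands for.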

When used to qualify elements, the shorthand appears before the element name, separated by a colon, e.g. SRW:recordSchema
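
In SimpleXML you can't write the colon in a property name, but you can ask for the children that belong to a given namespace and then use the plain element names. A minimal sketch, assuming a response shaped roughly like the hypothetical $sample below (the real feed will have more elements and possibly different nesting); SRW_NS is just a constant name I've picked for illustration:

```php
<?php
// Hypothetical sample data: a cut-down stand-in for an SRU response.
$sample = <<<XML
<SRW:searchRetrieveResponse xmlns:SRW="http://www.loc.gov/zing/srw/">
  <SRW:numberOfRecords>2</SRW:numberOfRecords>
  <SRW:records>
    <SRW:record>
      <SRW:recordSchema>dc</SRW:recordSchema>
    </SRW:record>
    <SRW:record>
      <SRW:recordSchema>dc</SRW:recordSchema>
    </SRW:record>
  </SRW:records>
</SRW:searchRetrieveResponse>
XML;

// The namespace URL taken from the xmlns:SRW declaration.
const SRW_NS = 'http://www.loc.gov/zing/srw/';

$xml = new SimpleXMLElement($sample);

// children() with a namespace URL returns only the child elements in that
// namespace; they are then addressed by local name, without the prefix.
$srw = $xml->children(SRW_NS);

echo 'Records reported: ', (int) $srw->numberOfRecords, PHP_EOL;

$schemas = [];
foreach ($srw->records->record as $record) {
    // Select the namespace again on each record to be explicit.
    $schemas[] = (string) $record->children(SRW_NS)->recordSchema;
}

print_r($schemas);
```

Passing the URL rather than the shorthand is the safer habit, since the server is free to change the prefix while the URL stays the same. If you do want to use the shorthand, children('SRW', true) tells SimpleXML the first argument is a prefix.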
