Before you shoot me down, give me a minute. I've searched SO for the answer. Here's the problem:

I have an external XML/RDF file with roughly this structure that I need to parse:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rss="http://purl.org/rss/1.0/"
 xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:bibo="http://purl.org/ontology/bibo/">

 <rss:channel rdf:about="http://domain.com/feed/">
  <rss:link rdf:resource="http://domain.com/feed/items.rss" />
  <rss:title>Search Results</rss:title>
  <os:startIndex>0</os:startIndex>
  <os:itemsPerPage>10</os:itemsPerPage>
  <os:totalResults>13</os:totalResults>
  <rss:items rdf:resource="urn:unique-identifier" />
 </rss:channel>

 <rss:item rdf:about="http://domain.com/items/123456">
  <rss:link>http://domain.com/items/123456</rss:link>
  <rss:title>Book Title</rss:title>
  <rss:description>Random Book Description</rss:description>
  <dc:creator>First Name Last Name, 1901</dc:creator>
  <dcterms:language rdf:datatype="http://purl.org/dc/terms/ISO639-2">eng</dcterms:language>
  <dc:format>Book</dc:format>
  <dc:publisher>London : Publisher</dc:publisher>
  <dc:date>2009</dc:date>
  <bibo:isbn>1234567890</bibo:isbn>
  <bibo:eanucc13>1234567890</bibo:eanucc13>
  <dcterms:identifier>1234567890</dcterms:identifier>
 </rss:item>
</rdf:RDF>

Right, so that's the XML file. Here's what I know:

  1. I can loop the feed to get the numbers
  2. Using a file_get_contents($var) I get this error

    Warning: simplexml_load_file(): I/O warning : failed to load external entity

  3. I can't use foreach($rss->item as $item) because the item element itself has a colon in its name.

  4. I've tried replacing the colons with underscores and the error from #2 arises.
  5. I've tried a DOM method mentioned somewhere in Stack Overflow.
  6. I've tried SimpleXML method mentioned on Stack Overflow.

All I want to do is loop the rss:items and extract the items underneath.

Any help would be genuinely really appreciated as I'm tearing my hair out and I'm out of coffee!

Thanks so much,

Martin

P.S. To the person who marked this as a duplicate, I understand your reasoning but I couldn't understand the answers in other threads, so I had to ask a new one. Thanks for your patience, I'm new to the community.

The thread Simple XML - Dealing With Colons In Nodes did not deal with the fact that the top tag wasn't parsable by foreach

foreach ($feed->item as $item)

In this feed $feed->item doesn't exist as it's $feed->rss::item which is invalid syntax. Thanks.


1 Answer

The colon separates a namespace prefix from the local node name. The prefix is an alias that references the matching xmlns definition, here xmlns:rss. So a name like rss:channel can be read as {http://purl.org/rss/1.0/}channel.
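To make the alias idea concrete, here is a minimal sketch that looks an element up by namespace URI and local name, then prints the three parts of its name. The inline $xml string is a trimmed copy of the feed from the question, and the variable names are only illustrative.

```php
<?php
// Trimmed copy of the feed from the question, inlined so the sketch runs on its own.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rss="http://purl.org/rss/1.0/">
 <rss:item rdf:about="http://domain.com/items/123456">
  <rss:title>Book Title</rss:title>
 </rss:item>
</rdf:RDF>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Find the element by namespace URI + local name; the prefix plays no part in the lookup.
$item = $dom->getElementsByTagNameNS('http://purl.org/rss/1.0/', 'item')->item(0);

$prefix = $item->prefix;              // "rss", the alias used in this document
$localName = $item->localName;        // "item"
$namespaceUri = $item->namespaceURI;  // the URI the alias resolves to

echo $prefix, ' | ', $localName, ' | ', $namespaceUri, PHP_EOL;
```

The prefix is whatever the document author chose; only the namespace URI and the local name identify the element.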

To read XML with namespaces using a DOMXPath object, you need to register your own prefixes so that the prefixes in the XPath expression can be resolved.

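// $xml is assumed to hold the feed as a string, e.g. from file_get_contents()
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Map the prefix used in the XPath expressions to the namespace URI
$xpath->registerNamespace('rss', 'http://purl.org/rss/1.0/');

$result = [];
foreach ($xpath->evaluate('//rss:item') as $item) {
  // Keeps only the last item; use $result[] = [...] to collect every item
  $result = [
    'title' => $xpath->evaluate('string(rss:title)', $item),
    'link' => $xpath->evaluate('string(rss:link)', $item)
  ];
}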

var_dump($result);

Output: https://eval.in/173016

array(2) {
  ["title"]=>
  string(10) "Book Title"
  ["link"]=>
  string(30) "http://domain.com/items/123456"
}
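Since the question asks about foreach ($feed->item as $item), here is a rough SimpleXML equivalent. It is a sketch rather than a drop-in solution: it assumes the feed is already available as a string, and the inline $xml below is a shortened copy of the sample from the question. SimpleXML only exposes namespaced children when asked for them through children() with the namespace URI.

```php
<?php
// Shortened copy of the sample feed from the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rss="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <rss:item rdf:about="http://domain.com/items/123456">
  <rss:link>http://domain.com/items/123456</rss:link>
  <rss:title>Book Title</rss:title>
  <dc:creator>First Name Last Name, 1901</dc:creator>
 </rss:item>
</rdf:RDF>
XML;

$feed = simplexml_load_string($xml);

// Namespace URIs taken from the xmlns declarations in the feed.
$rssNs = 'http://purl.org/rss/1.0/';
$dcNs = 'http://purl.org/dc/elements/1.1/';

$items = [];
// children($rssNs) selects the child elements in the rss namespace;
// ->item then iterates over every rss:item among them.
foreach ($feed->children($rssNs)->item as $item) {
    $rss = $item->children($rssNs);
    $dc = $item->children($dcNs);
    $items[] = [
        'title' => (string) $rss->title,
        'link' => (string) $rss->link,
        'creator' => (string) $dc->creator,
    ];
}

print_r($items);
```

Each namespace needs its own children() call, which is why the rss and dc elements are read through separate variables.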
ThW
  • Thanks so much for taking the time to reply. I will give this a go and get back to you. Much appreciated! –  Jul 30 '14 at 08:35
  • This worked a treat once I used file_get_contents($feed_url) to grab my XML/RSS file. Thank you! –  Aug 01 '14 at 16:49