Trying to Parse Images and Text from an RSS Feed

Question

This is a continuation of the thread here: Trying to Parse Only the Images from an RSS Feed

This time I want to parse both images and certain items from an RSS feed. A sample of the RSS feed looks like this:

 <channel>
 <atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />

 <item>
 <title>Article One</title>
 <guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
 <link>http://mywebsite.com/geturl/e8c5106</link>
 <comments>http://mywebsite.com/details/e8c5106#comments</comments>     
 <pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate> 
 <category>Category 1</category>    
 <description>
      <![CDATA[<div>
      <img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0"  />  
      <ul><li>Poster: someone's name;</li>
      <li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
      <li>Rating: 5</li>
      <li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
      </description>
 </item> 
 <item>..

I have the following code below where I try to parse image and text:

$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1');

$descriptions = $xml->xpath('//item/description');
$mytitle= $xml->xpath('//item/title');

foreach ( $descriptions as $description_node ) {
   // The description may not be valid XML, so use a more forgiving HTML parser mode
   $description_dom = new DOMDocument();
   $description_dom->loadHTML( (string)$description_node );

   // Switch back to SimpleXML for readability
   $description_sxml = simplexml_import_dom( $description_dom );

   // Find all images, and extract their 'src' param
   $imgs = $description_sxml->xpath('//img');
   foreach($imgs as $image) {
        echo "<img id=poster class=poster src={$image['src']}> {$mytitle}";
        }
    }

The above code extracts the images beautifully. However, it does not output the title ($mytitle, which should be "Article One") that I try to print on the last line of my code. This is supposed to work for every item in the RSS feed.

Can anyone help me figure this one out, please?

Many thanks,

Hernando

The XPath is correct. Perhaps you need to call `->nodeValue` on `$mytitle` to get the node contents. — helderdarocha, Apr 01 '14 at 21:31
Actually, since you have many `item` elements, you will need to use `->item(0)` to get the first one. — helderdarocha, Apr 01 '14 at 21:33
Thanks Helderdarocha... unfortunately my knowledge is not advanced and I can't understand your explanation. Issue is that I have to extract something that is inside the field and something that is outside it in the field. This will repeat several times within the RSS feed, which is what I do want. — Hernandito, Apr 02 '14 at 15:46

dirkk · Accepted Answer · 2014-04-03T14:48:58.817

On success, xpath() returns an array (see http://www.php.net/manual/en/simplexmlelement.xpath.php), even if the result is just one element. If you expect exactly one element, you can simply use $mytitle[0].
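As a minimal sketch of that point: the inline feed below is a hypothetical, trimmed-down stand-in for the sample in the question (the variable names $rss, $titles, $firstTitle and $titleCount are made up for illustration). It shows that the xpath() result has to be indexed before it can be used as a string; interpolating the array itself into a string would only produce the word "Array".

```php
<?php
// Hypothetical inline feed, reduced from the sample in the question.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <item><title>Article One</title></item>
    <item><title>Article Two</title></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);

// xpath() returns an array of SimpleXMLElement objects, one per match.
$titles = $xml->xpath('//item/title');

// Index into the array, then cast the element to a string.
$firstTitle = (string) $titles[0];
$titleCount = count($titles);

echo $firstTitle, "\n";
echo $titleCount, "\n";
```

Note that $titles[0] is only the first title in the whole feed, so on its own it does not pair each image with its own title.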

You will have to iterate over each <item/> element, as otherwise you can't know which description and which title belong together. So the following should work:

$xml = simplexml_load_file('test.xml');

$items = $xml->xpath('//item');

foreach ( $items as $item ) {
    $descriptions = $item->description;
    $mytitle = $item->title;
    foreach ( $descriptions as $description_node ) {
        // The description may not be valid XML, so use a more forgiving HTML parser mode
        $description_dom = new DOMDocument();
        $description_dom->loadHTML( (string)$description_node );

        // Switch back to SimpleXML for readability
        $description_sxml = simplexml_import_dom( $description_dom );

        // Find all images, and extract their 'src' param
        $imgs = $description_sxml->xpath('//img');
        foreach ( $imgs as $image ) {
            echo "<img id=\"poster\" class=\"poster\" src=\"{$image['src']}\"> {$mytitle}";
        }
    }
}

By the way, I also added quotes ("") around the attribute values of your <img/> element. I guess you want that, as this looks very much like XML/HTML.
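Going one step beyond quoting, and not part of the original answer: if the feed content is not fully trusted, it may be worth escaping the values before echoing them. The sketch below assumes hypothetical stand-in values for {$image['src']} and {$mytitle} and uses PHP's htmlspecialchars().

```php
<?php
// Hypothetical values standing in for {$image['src']} and {$mytitle}.
$src   = 'http://mywebsite.com/myimages/1521197-main.jpg';
$title = 'Tom & Jerry "Special"';

// htmlspecialchars() converts &, <, > and quotes into entities,
// so the values cannot break out of the attribute or the markup.
$html = sprintf(
    '<img id="poster" class="poster" src="%s"> %s',
    htmlspecialchars($src, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
);

echo $html, "\n";
```

Inside the loop, the same two calls could wrap (string)$image['src'] and (string)$mytitle.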

Thank you Dirkk... I think we are getting closer... The RSS feed has multiple items that I want to scrape. Each item has an embedded `title` and the image for which my code above works. So in my foreach, I want to scrape and echo the corresponding image and the corresponding `title` for each of the items in the feed. Your code returned the same title for all the items in the feed. — Hernandito, Apr 03 '14 at 14:26
@Hernandito I updated my answer. You will have to slightly adjust the logic of your program, because otherwise the descriptions and title will always be unrelated to each other. You should iterate over each `item` and then look for the required elements. — dirkk, Apr 03 '14 at 14:50
Dirkk.... it works like a charm!!! 2 days of trial and error trying to fix this. Thank you so very much!!! — Hernandito, Apr 03 '14 at 15:15
