
I am trying to get only the contents of the <div class="start-teaser"> from this RSS feed with the script below. I tried XPath, like this:

$xpath = new DOMXPath($html); $desc = $xpath->query("//*[@class='start-teaser']");

But it does not pick up the element, and I don't understand why. I also tried something like this:

$desc = $html->getElementsByTagName('p')->item(0)->getAttribute('class');

But this returns only the class name, and I need the contents (text) of that div, not the class name.

public function NewsRss() {
$rss = new DOMDocument();
$rss->load('http://www.autoexpress.co.uk/feeds/all');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
  $htmlStr = $node->getElementsByTagName('description')->item(0)->nodeValue;
  $html = new DOMDocument();        
  $html->loadHTML($htmlStr);
  $xpath = new DOMXPath($html);
  $desc = $xpath->query("//*[@class='start-teaser']");
  $imgTag = $html->getElementsByTagName('img');
  $img = ($imgTag->length==0)?'noimg.png':$imgTag->item(0)->getAttribute('src');
  $item = array (
    'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
    //'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
    'desc' => $desc,
    'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
    'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
    'image' => $img,
  );
  array_push($feed, $item);
}
$limit = 3;
for($x=0;$x<$limit;$x++) {
  $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
  $link = $feed[$x]['link'];
  $description = $feed[$x]['desc'];
  $date = date('l F d, Y', strtotime($feed[$x]['date']));
  echo '<div class="news-row-index">';
  echo '<div class="img"><a href="'.$link.'" target="_blank" title="'.$title.'"><img src="'.$feed[$x]['image'].'" height="79" width="89"></a></div>';
  echo '<div class="details-index"><p><h5><a href="'.$link.'" target="_blank" title="'.$title.'">'.$title.'</a></h5><br />';
  echo '<small><em>Posted on '.$date.'</em></small></p>';
  echo '<p>'.$feed[$x]['desc'].'</p></div>';
  echo '</div>';
}
echo '<a style="margin-left:10px;" class="view-all-but" target="_blank" href="http://www.autoexpress.co.uk/feeds/all">View all</a>';
}
Hashem Qolami
user3140607

1 Answer


The class value is `short-teaser`, not `start-teaser`, so use `//*[@class='short-teaser']` instead.

For matching HTML classes, also take this question into account: How can I match on an attribute that contains a certain string?
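
As a minimal sketch of how the two points above could fit together: `DOMXPath::query()` returns a `DOMNodeList`, not a string, so the text still has to be read from a node (here via `textContent`). The helper name `extractTeaser` and the sample markup in `$htmlStr` are hypothetical; the real feed's description HTML may look different.

```php
<?php
// Hypothetical helper: returns the text of the first element carrying the
// given class, or an empty string if there is no such element.
function extractTeaser(string $htmlStr, string $class = 'short-teaser'): string
{
    if (trim($htmlStr) === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $html = new DOMDocument();
    // Collect libxml warnings about fragment HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $html->loadHTML($htmlStr);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($html);
    // Match the class as a whole word, even when the attribute lists several classes.
    $nodes = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]"
    );

    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // query() yields a DOMNodeList; take the first node and read its text.
    return trim($nodes->item(0)->textContent);
}

// Assumed sample markup, for illustration only.
$htmlStr = '<div class="short-teaser extra"><p>First teaser text.</p></div><img src="a.jpg">';
echo extractTeaser($htmlStr), "\n";
```

In the question's loop, the `'desc'` entry would then hold the returned string rather than the node list itself.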

Jens Erat
  • Ah, I did not notice the class. I changed it, but it is still not working right: http://pastebin.com/Ye1ssbcc. It now echoes only 1 feed item instead of the 3 it did before. – user3140607 Jan 26 '14 at 12:54
  • I also tried the ones from that question: `$desc = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' short-teaser')]");` None of them work. – user3140607 Jan 26 '14 at 13:11
  • Apart from a bunch of PHP warnings, I cannot reproduce your issue; I'm getting `$limit` news items. By the way, it is better to loop over all items in `$feed` but `break` after `$limit` items. Did you think about what will happen if the site only has two items? – Jens Erat Jan 26 '14 at 13:41
  • I have hidden them with display none. This is temporary until I find out exactly how to take the contents of a div in my case. – user3140607 Jan 28 '14 at 09:44
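
The suggestion in the comments to loop over all items and `break` after `$limit` could look roughly like the sketch below. The `$feed` array here is a hypothetical stand-in with only two entries, and `takeItems` is an illustrative name, not something from the original script.

```php
<?php
// Hypothetical helper: collects at most $limit items and never reads
// past the end of $feed, unlike a fixed for-loop over indexes 0..$limit-1.
function takeItems(array $feed, int $limit): array
{
    $shown = [];
    foreach ($feed as $item) {
        if (count($shown) >= $limit) {
            break; // stop once enough items have been collected
        }
        $shown[] = $item;
    }
    return $shown;
}

// Assumed sample data: fewer items than the limit.
$feed = [
    ['title' => 'First', 'desc' => 'One'],
    ['title' => 'Second', 'desc' => 'Two'],
];
$limit = 3;

foreach (takeItems($feed, $limit) as $item) {
    echo '<p>', htmlspecialchars($item['title']), ': ',
        htmlspecialchars($item['desc']), '</p>', "\n";
}
```

With only two items in the feed this prints two paragraphs and raises no undefined-offset notices. `array_slice($feed, 0, $limit)` would give the same result in one call.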