
I am trying to pull the latest 4 news items from this site here: http://www.wolverinegreen.com/sports/m-wrestl/spec-rel/utva-m-wrestl-spec-rel.html

They have no RSS feed, so I have been reading up on PHP's preg_match function, but the syntax is a little confusing and I am not sure exactly how to use it. Any suggestions would be truly appreciated, and if there is a more efficient method that I have not thought of, I am open to ideas.

StujayH

1 Answer

// Get the page's HTML
$html = file_get_contents("http://www.wolverinegreen.com/sports/m-wrestl/spec-rel/utva-m-wrestl-spec-rel.html");

// Create a DOMDocument object and load the html into it
$dom = new DOMDocument();
$dom->loadHTML($html);

// Create an XPath object using the DOMDocument
$xpath = new DOMXPath($dom);

// Query for the link (<a>) elements using XPath
$items = $xpath->query("//td[1]/div/div[1]/a");

// If we find something using that query
if($items->length)
{
    // Output each item
    foreach($items as $item)
        echo $item->nodeValue . " - " . $item->getAttribute("href") . "<br />";
}
Jake N
  • In addition to that, read this: http://stackoverflow.com/questions/11064980/php-curl-vs-file-get-contents and consider using cURL to get the site content. – Arrok Feb 04 '14 at 12:43
  • Thanks Jake N. When I run the code, PHP spews a lot of warnings (htmlParseEntityRef: no name in Entity or expecting ';'). I read that this might be because the HTML on the page I am trying to pull contains an error. Is there a way to get around this? – StujayH Feb 04 '14 at 13:01
  • Before loading the HTML try `libxml_use_internal_errors(TRUE);` – Jake N Feb 04 '14 at 13:07
  • Thanks for all your help so far. The errors disappear, but unfortunately no content gets displayed. – StujayH Feb 04 '14 at 13:15
  • What do you get if you do `var_dump($items->length);` before the if and foreach at the end? – Jake N Feb 04 '14 at 13:26
  • That works: it displays the news items and dates. In addition, I get an `int(398)` before the first one. – StujayH Feb 04 '14 at 13:49
  • Cool, so it works. The `int(398)` is the count. Perhaps you had a caching issue. Remove the `var_dump` and try again. – Jake N Feb 04 '14 at 14:09
  • Thanks so much, it is working how it should now. – StujayH Feb 04 '14 at 14:14
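
Pulling the thread together, here is a minimal sketch of how the answer's approach might look with the `libxml_use_internal_errors` fix from the comments applied and the output capped at four items, as the question asked. The helper name `extractLinks`, its parameters, and the sample markup are hypothetical, not part of the original answer. It also assumes the page lists items newest first, so the first four matches are the latest four; that is not confirmed anywhere in the thread.

```php
<?php
// Hypothetical helper, not part of the original answer: returns up to $limit
// links matched by $query, each as ['title' => ..., 'href' => ...].
function extractLinks(string $html, string $query, int $limit = 4): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    // Buffer libxml parse warnings (e.g. htmlParseEntityRef) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // the XPath expression was invalid
    }

    $links = [];
    foreach ($nodes as $node) {
        if (count($links) >= $limit) {
            break; // assumption: document order is newest first
        }
        if ($node instanceof DOMElement) {
            $links[] = [
                'title' => trim($node->nodeValue),
                'href'  => $node->getAttribute('href'),
            ];
        }
    }
    return $links;
}

// Made-up sample markup shaped to match the answer's XPath; the bare "&"
// is the kind of error that triggers the htmlParseEntityRef warning.
$rows = '';
for ($i = 1; $i <= 6; $i++) {
    $rows .= "<tr><td><div><div><a href=\"/news-$i.html\">Results & recap $i</a></div></div></td></tr>";
}
$sample = "<html><body><table>$rows</table></body></html>";

$links = extractLinks($sample, "//td[1]/div/div[1]/a");
foreach ($links as $link) {
    echo htmlspecialchars($link['title']) . " - " . htmlspecialchars($link['href']) . "<br />\n";
}
```

For the live page, the string returned by `file_get_contents()` (or by a cURL request, as suggested in the comments) could be passed as `$html`, after checking that the fetch did not return `false`.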