
I am trying to make a web scraper for a Danish wine site.

But I am having trouble getting any results out of it. I think my problem is in the XPath part, as I can see from my debugging that it is omitting some strings, but I am not sure.

$title = $ScrapedPageXpath->query('*<h3>');

It could also be that my query is wrong.

I am not a skilled programmer, and this is the first thing I have ever tried to make, so please bear that in mind in your replies.

Below is my code:

<?php

function curlGet($url)
{
    $chandle = curl_init();
    curl_setopt($chandle, CURLOPT_URL, $url);
    // Return the page as a string instead of printing it directly.
    curl_setopt($chandle, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($chandle, CURLOPT_CONNECTTIMEOUT, 5);
    // CURLOPT_FOLLOWLOCATION expects a boolean (follow redirects or not).
    curl_setopt($chandle, CURLOPT_FOLLOWLOCATION, TRUE);
    $curlResults = curl_exec($chandle);
    curl_close($chandle);

    return $curlResults;
}

$Winelist = array();

function returnXPathObject($item)
{
    $xmlPageDom = new DomDocument();
    // The @ hides the HTML parser's warnings from loadHTML().
    @$xmlPageDom->loadHTML($item);
    $xmlPageXPath = new DOMXPath($xmlPageDom);

    return $xmlPageXPath;
}

$ScrapedPage = curlGet('http://www.vinhit.dk/shop/');

$ScrapedPageXpath = returnXPathObject($ScrapedPage);

$title = $ScrapedPageXpath->query('*<h3>');
if ($title->length > 0) {
    $Winelist['title'] = $title->item(0)->nodeValue;

}
print_r($Winelist);
  • Please ALWAYS include the actual and expected output and, above all, the exact error message when asking a question on Stack Overflow. – ToBe Mar 17 '15 at 11:59
  • Even though you formulate a problem description here, it's not technically a programming question. You can see this from the answer it produced: it looks helpful to you because it makes your code work now. However, the actual point you were missing is that your PHP code already told you where it went wrong (and even why). You just have not seen it. Read here how to enable error messages: [**How to get useful error messages in PHP?**](http://stackoverflow.com/q/845021/367456) – hakre Mar 19 '15 at 09:12
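To illustrate the point about error messages, here is a minimal sketch, assuming a development setup where it is fine to print warnings to the screen. The one-line HTML string is made up for the example. With an invalid expression such as `*<h3>`, `DOMXPath::query()` emits a warning and returns `false` rather than an empty node list, so checking for `false` (and having warnings displayed) points straight at the query.

```php
<?php
// Development only: report and display every warning and notice.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Made-up markup, just enough to run a query against.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><h3>Example</h3></body></html>');
$xpath = new DOMXPath($dom);

// An invalid expression makes query() emit a warning and return false,
// not an empty DOMNodeList.
$result = $xpath->query('*<h3>');

if ($result === false) {
    echo "The XPath expression is invalid.\n";
} else {
    echo "Matched {$result->length} node(s).\n";
}
```

With display_errors on, the warning from `query()` is printed before the message from the `if` branch.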

1 Answer


Your query is not a valid XPath expression. To get all <h3> nodes, the query should have been:

//h3
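A minimal sketch of `//h3` in the same kind of setup as the question, collecting every heading instead of only the first one. The HTML string is a hypothetical stand-in for the shop page (the real vinhit.dk markup is not shown here and may differ). `libxml_use_internal_errors()` is swapped in for the `@` operator: it keeps the HTML parser's warnings from being printed, and they could be read back with `libxml_get_errors()` if needed.

```php
<?php
// Hypothetical stand-in for the scraped page; real markup may differ.
$html = '<html><body>'
      . '<div class="product"><h3>Wine A</h3></div>'
      . '<div class="product"><h3>Wine B</h3></div>'
      . '</body></html>';

$dom = new DOMDocument();
// Buffer libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
// Discard the buffered warnings (inspect libxml_get_errors() first if needed).
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// '//h3' selects every <h3> element anywhere in the document.
$titles = $xpath->query('//h3');

$Winelist = array();
foreach ($titles as $title) {
    // trim() removes surrounding whitespace from the heading text.
    $Winelist[] = trim($title->nodeValue);
}

print_r($Winelist);
```

Run as-is, `print_r()` lists `Wine A` at index 0 and `Wine B` at index 1.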

For further reference about XPath:

  • @kasperLorentsen See the link for what `//` means in XPath, and heed the comment from @ToBe for your next posts here. Good luck! – har07 Mar 17 '15 at 12:08
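A small sketch of what `//` changes, using made-up markup with one heading directly under `<body>` and one nested inside a `<div>`. A path starting with a single `/` follows the exact element hierarchy from the root, while `//` matches at any depth.

```php
<?php
// Made-up markup: one <h3> directly in <body>, one nested in a <div>.
$html = '<html><body><h3>Top</h3><div><h3>Nested</h3></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// '/html/body/h3' follows an exact path: only <h3> elements that are direct children of <body>.
$direct = $xpath->query('/html/body/h3');

// '//h3' searches all descendants of the document, so the nested heading matches too.
$anywhere = $xpath->query('//h3');

echo $direct->length, "\n";   // 1
echo $anywhere->length, "\n"; // 2
```

On a real shop page, headings are usually nested inside several wrapper elements, which is why `//h3` is the simpler choice there.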