
I'm using cURL and Simple HTML DOM Parser to get content from a website, receiving the response as an object. I'm using it to get the links of all the images on product pages of this website, https://www.geekbuying.com/ It works with most pages, this one for example: https://www.geekbuying.com/item/eufy-MACH-V1-Cordless-Vacuum-Cleaner-520574.html

With other pages, which have the same structure, it doesn't get anything, and I just can't figure out why. This one for example: https://www.geekbuying.com/item/eufy-Clean-G40-Hybrid--Robot-Vacuum-Cleaner-520591.html

include "simple_html_dom.php";
$link = "https://www.geekbuying.com/item/eufy-Clean-G40-Hybrid--Robot-Vacuum-Cleaner-520591.html"; // doesn't work

$link = "https://www.geekbuying.com/item/eufy-MACH-V1-Cordless-Vacuum-Cleaner-520574.html"; // works (note: this overwrites the assignment above)


function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1); // deprecated; has no effect in modern PHP
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    foreach($dom->find('img') as $element){
        $immagine = $element->src;
        echo "$immagine <br />";
    }
}

get_content($link);

The script should print the URLs of all the images on an external page, but with some pages it outputs nothing at all.
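One way to narrow this down (a sketch, not a fix) is to separate the cURL fetch from the parsing and check what the request actually returned before handing it to Simple HTML DOM. The URL below is the failing product page from the question:

```php
<?php
// Sketch: check the cURL result before parsing, so a fetch
// failure is not silently swallowed by the HTML parser.
$url = "https://www.geekbuying.com/item/eufy-Clean-G40-Hybrid--Robot-Vacuum-Cleaner-520591.html";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$htmlContent = curl_exec($ch);

if ($htmlContent === false) {
    // cURL-level failure (DNS, TLS, timeout, ...)
    echo "cURL error: " . curl_error($ch) . "\n";
} elseif ($htmlContent === "") {
    // The request completed but the body is empty; the status code may explain why.
    echo "Empty body, HTTP status: " . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
} else {
    echo "Got " . strlen($htmlContent) . " bytes\n";
}
curl_close($ch);
```

This tells you whether the problem is on the cURL side (empty body, error, non-200 status) or in the HTML parsing.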

  • Maybe they don't want you scraping their site without permission – RiggsFolly May 30 '23 at 16:02
  • Hi, I don't think this is the reason. The pages have the exact same structure, only the contents that draw from the database change. – GaraGulp May 30 '23 at 16:05
  • 2
    So what debugging have you done? "just don't get anything" isn't a useful description for us. Do you mean you get no HTML returned at all from the page? Or just that you can't find the specific items you want? Have you checked the raw response to the cURL request? It wasn't really clear from the description what you've narrowed the issue down to. – ADyson May 30 '23 at 16:24
  • To restate things, you can generally have a cURL problem or an XML problem, but rarely (or probably never) both. – Chris Haas May 30 '23 at 17:27
  • The pages are huge; it could be an out-of-memory issue, so enable error reporting. I hate myself for saying this, but you could use a regex for such a simple "parse". – Kazz May 30 '23 at 18:55
  • Maybe I expressed myself badly. The structure of the product page is always the same https://www.geekbuying.com/item/eufy-MACH-V1-Cordless-Vacuum-Cleaner-520574.html the contents change according to the product being viewed. With most pages the script fetches all the urls of the images. With some pages, on the other hand, it gets absolutely nothing. – GaraGulp May 30 '23 at 18:57
  • "The script" is the general PHP as a whole. What we are asking you is if cURL is returning empty data, or if the problem is in simple XML. The posted code pipes data pretty much directly from cURL to HTML processing without checking the contents. – Chris Haas May 30 '23 at 19:02
  • If I write echo "$htmlContent"; for most pages it prints all the html, for those where the error is present it prints nothing. Thanks for your help. – GaraGulp May 30 '23 at 19:17
  • Okay, so we can rule out the XML/HTML processing completely and focus solely on the cURL. You should add some code along the lines of `if(!$htmlContent){/*Whatever you need to do, such as throw an Exception*/}` – Chris Haas May 30 '23 at 20:06
  • You should be inspecting the HTTP headers returned; it is possible that you are being given more information about the problem. See this: https://stackoverflow.com/a/9183272/231316 – Chris Haas May 30 '23 at 20:08
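The two suggestions above (fail loudly on an empty body, and inspect the response headers) can be sketched as follows. This is a hypothetical variant of the question's function, not the asker's code; `get_content_checked` is an illustrative name:

```php
<?php
// Sketch based on the comments: throw on an empty response
// and expose the response headers for inspection.
function get_content_checked($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in the output

    $response = curl_exec($ch);
    if ($response === false || $response === "") {
        $error = curl_error($ch);
        curl_close($ch);
        throw new Exception("Empty response from $url: $error");
    }

    // Split headers from body using the header size reported by cURL.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $headers    = substr($response, 0, $headerSize);
    $body       = substr($response, $headerSize);
    curl_close($ch);

    echo $headers; // a 403, a redirect, or an anti-bot challenge would show up here

    return $body;
}
```

If the headers show a non-200 status only for certain pages, that points to server-side blocking rather than a bug in the parsing code.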

0 Answers