PHP Cannot get url link content using DOM function

Question

I am trying to fetch page content from eBay and search it for titles. I am using the PHP Simple HTML DOM Parser and its function file_get_html. However, when I try to print the result, my script freezes. First I build the URLs using data from a CSV file, then I open the first result from the search, and when I try to get the URL content, the function fails. The data from the CSV file contains some MPNs like these:

Here is my code:

$itemsUrl = readCSV(realpath(dirname(__FILE__)) . DS . 'JeepToysEbayIsr.csv');

foreach ($itemsUrl as $itemNumber => $itemUrl) {

    print_r($itemNumber . "\n");
    //$url = "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313.TR0.TRC0.H0.X3342827.TRS0&_sacat=0&_nkw=".$itemUrl['MPN'];
    //print_r($item);

    //$data = get_web_page($url,"\n");

    include_once("simple_html_dom.php");

    $context = stream_context_create(array(
        'http' => array(
            'header' => array('User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'),
        ),
    ));

    $url[] = ("https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313.TR0.TRC0.H0.X3342827.TRS0&_sacat=0&_nkw=" . $itemUrl['MPN']);

    foreach ($url as $value) {
        preg_match('/https?\:\/\/[^\" ]+/i', $value, $match);
    }

    if (isset($match[0])) {
        $data = file_get_html($match[0], "\n");
        print_r($data);
    }

}

Instead of `file_get_html`, try fetching the HTML content of the page with `curl`. It gives you more control over the page request, for example by setting parameters like the user agent. – Vinay Patil, Nov 06 '19 at 13:37
See https://stackoverflow.com/a/14953910/1483629 – Vinay Patil, Nov 06 '19 at 13:56
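
As an illustration of the curl suggestion in the comments above, here is a minimal sketch of a fetch helper. The function name `fetch_html`, the timeout values, and the user-agent string are assumptions chosen for this example; they do not come from the linked answer or from `simple_html_dom`.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or return null on failure.
function fetch_html(string $url, int $timeoutSeconds = 15): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null; // the handle could not be created for this URL
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // give up connecting after 5 seconds
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap the whole request so the script cannot hang
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; rv:2.2) Gecko/20110201',
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses as "no content".
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

The returned string could then be handed to a parser, for example `str_get_html($html)` from `simple_html_dom` or `DOMDocument::loadHTML($html)`. Printing the plain string with `echo` is also cheaper than calling `print_r` on a parsed DOM object.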

Florian Richard · Accepted Answer · 2019-11-06T13:54:56.887

You could try fetching the URL with curl or `loadHTMLFile`, and then use XPath to extract your content, like this:

$doc = new DOMDocument();
$doc->loadHTMLFile('https://www.myurl.com', LIBXML_NOERROR | LIBXML_NOWARNING);
$xpath = new DOMXPath($doc);
$var = $xpath->query('//div[contains(@class,"theClass")]');

and then:

print_r($var->item(0));

edited Nov 06 '19 at 13:54

answered Nov 06 '19 at 13:49

Florian Richard

However `loadHTMLFile` is not in the `simple_html_dom` file – Ico Vladimirov Nov 06 '19 at 13:52
I also tried with `get_web_page` function but the parsing was difficult then. – Ico Vladimirov Nov 06 '19 at 13:54
Yes, that's true, but with `loadHTMLFile` you can also save your HTML in a variable, `$myHtml = $doc->saveHTML();`, and maybe pass it to `file_get_html`? If that doesn't work, try curl, or try it without `simple_html_dom` as in my example. – Florian Richard Nov 06 '19 at 14:03
`if (isset($match[0])) { $data = get_web_page($match[0], "\n"); //print_r($data); $page = $data['content']; print_r($page); }` This is how I edited it and the html page is ok – Ico Vladimirov Nov 06 '19 at 14:05
Now the challenge is how to get some titles, images, etc from each link content :D – Ico Vladimirov Nov 06 '19 at 14:11
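
For the follow-up about pulling titles and images out of the fetched HTML, a minimal sketch using `DOMDocument` and `DOMXPath` might look like the following. The function name `extract_items`, the class names `item` and `item-title`, and the sample markup are all assumptions for illustration; the real eBay markup is different and has to be inspected first.

```php
<?php
// Hypothetical helper: collect title/image pairs from an HTML string.
// The class names used in the XPath queries are placeholders, not eBay's real markup.
function extract_items(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//li[contains(@class, "item")]') as $node) {
        // Relative queries (starting with ".") search only inside the current <li>.
        $title = $xpath->query('.//h3[contains(@class, "item-title")]', $node)->item(0);
        $image = $xpath->query('.//img', $node)->item(0);
        $items[] = [
            'title' => $title ? trim($title->textContent) : null,
            'image' => $image ? $image->getAttribute('src') : null,
        ];
    }
    return $items;
}

// Made-up sample markup standing in for a fetched page.
$sample = '<ul>
  <li class="item"><h3 class="item-title"> Jeep Toy A </h3><img src="a.jpg"></li>
  <li class="item"><h3 class="item-title">Jeep Toy B</h3></li>
</ul>';

print_r(extract_items($sample));
```

Returning plain arrays keeps the result small and safe to print, unlike dumping a whole parsed document object.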
