I'm trying to scrape a page in PHP using cURL and Simple HTML DOM Parser, but I'm stuck on returning the results as JSON. The target is a free web scraper test site.
function getPage($href) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $str = curl_exec($curl);
    $html = str_get_html($str);
    curl_close($curl);
    return $html;
}
$link = 'https://www.webscraper.io/test-sites/e-commerce/allinone/computers';
$data = getPage($link);
foreach ($data->find('div[class=col-sm-4 col-lg-4 col-md-4]') as $key => $finder) {
    $img = $finder->find('img[class=img-responsive]');
    $imgCrt = $img->src;
    $price = $finder->find('h4[class=pull-right price]');
    $priceCrt = $price->innertext;
    $desc = $finder->find('p[class=description]');
    $descCrt = $desc->innertext;
    $json['status'] = 'ok';
    $json['return'][] = [
        'img' => $imgCrt,
        'price' => $priceCrt,
        'desc' => $descCrt
    ];
}
echo json_encode($json);
Result:
{"status":"ok","return":[{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null}]}
And errors on lines 43, 45, and 47:
43 - $imgCrt = $img->src;
45 - $priceCrt = $price->innertext;
47 - $descCrt = $desc->innertext;
Without those lines my result page becomes blank, with no errors and no JSON results. Thanks in advance!
SOLUTION!!
While dumping the values I discovered that find() returns an array of matches, so the first element has to be picked with [0]:
var_dump($finder->find('img')[0]->src);
echo "<br />";
var_dump($finder->find('h4.price')[0]->innertext);
echo "<br />";
var_dump($finder->find('p.description')[0]->innertext);
Now it works like a charm with:
$img[$key] = $finder->find('img')[0]->src;
$price[$key] = $finder->find('h4.price')[0]->innertext;
$desc[$key] = $finder->find('p.description')[0]->innertext;
$json['return'][] = [
    'img' => $img[$key],
    'price' => $price[$key],
    'desc' => $desc[$key]
];
Result (screenshot): https://i.stack.imgur.com/EHGAL.png
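For anyone hitting the same nulls: the fix works because find() called without an index returns an array of matches, and reading ->src or ->innertext on an array gives nothing useful. Passing the index as the second argument, find($selector, 0), returns a single element or null instead, which makes a missing element easy to guard against. Below is a minimal sketch of that idea. The helper name firstValue, the include path, and the inline sample markup are my own assumptions for illustration; they are not taken from the real page.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script; adjust the path to your setup.
require_once __DIR__ . '/simple_html_dom.php';

/**
 * Hypothetical helper: read one property from the first match of $selector,
 * or return null when nothing matches.
 */
function firstValue($node, string $selector, string $property): ?string
{
    // With an index argument, find() returns a single element (or null)
    // instead of an array of matches.
    $match = $node->find($selector, 0);
    return $match === null ? null : trim($match->$property);
}

// Inline sample markup modelled on the selectors used above (not fetched from the site).
$html = str_get_html(
    '<div class="col-sm-4 col-lg-4 col-md-4">'
    . '<img class="img-responsive" src="/images/item.png">'
    . '<h4 class="pull-right price">$295.99</h4>'
    . '<p class="description">Sample description</p>'
    . '</div>'
);

// Set the status once, outside the loop, so it exists even with zero matches.
$json = ['status' => 'ok', 'return' => []];

foreach ($html->find('div.col-sm-4') as $finder) {
    $json['return'][] = [
        'img'   => firstValue($finder, 'img', 'src'),
        'price' => firstValue($finder, 'h4.price', 'innertext'),
        'desc'  => firstValue($finder, 'p.description', 'innertext'),
    ];
}

echo json_encode($json);
```

With this shape, a product block that lacks an image or price yields null for that field instead of raising an error, and the JSON structure stays the same for every item.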
Thanks!