I'm trying to scrape a page in PHP using cURL and Simple HTML DOM Parser, but I'm stuck when returning the result as JSON. The target is a free web scraper test site.

function getPage($href) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $str = curl_exec($curl);
    $html = str_get_html($str);
    curl_close($curl);
    return $html;
}

$link = 'https://www.webscraper.io/test-sites/e-commerce/allinone/computers';
$data = getPage($link);

foreach ($data->find('div[class=col-sm-4 col-lg-4 col-md-4]') as $key => $finder) {

    $img = $finder->find('img[class=img-responsive]');
    $imgCrt = $img->src;
    $price = $finder->find('h4[class=pull-right price]');
    $priceCrt = $price->innertext;
    $desc = $finder->find('p[class=description]');
    $descCrt = $desc->innertext;

    $json['status'] = 'ok';

    $json['return'][] = [
      'img' => $imgCrt,
      'price' => $priceCrt,
      'desc' => $descCrt
    ];
}

echo json_encode($json);

Result:

{"status":"ok","return":[{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null}]}

And errors...

https://i.imgur.com/7scD2Yg.png

Line 43, 45, 47:

43 - $imgCrt = $img->src;
45 - $priceCrt = $price->innertext;
47 - $descCrt = $desc->innertext;

Without those lines my result page becomes blank, with no errors and no JSON results. Thanks in advance!

SOLUTION!!

While dumping the values, I discovered that this works:

var_dump($finder->find('img')[0]->src);
echo "<br />";
var_dump($finder->find('h4.price')[0]->innertext);
echo "<br />";
var_dump($finder->find('p.description')[0]->innertext);

Now it works like a charm with:

$img[$key] = $finder->find('img')[0]->src;
$price[$key] = $finder->find('h4.price')[0]->innertext;
$desc[$key] = $finder->find('p.description')[0]->innertext;

$json['return'][] = [
  'img' => $img[$key],
  'price' => $price[$key],
  'desc' => $desc[$key]
];

Result: img: https://i.stack.imgur.com/EHGAL.png

Thanks!

Pablo Mariante
  • Possible duplicate of [Reference - What does this error mean in PHP?](https://stackoverflow.com/questions/12769982/reference-what-does-this-error-mean-in-php) – Nico Haase Jan 28 '19 at 17:16
  • It seems `$finder->find()` isn't returning an object; it could be empty, an array, a string, and so on. – GrumpyCrouton Jan 28 '19 at 17:17
  • Inspect each value in your PHP like this: `echo '<pre>'.print_r($img->src, TRUE).'</pre>';` and so on. The output should show what kind of value it is (array, stdClass object, etc.). – Shaun Bebbers Jan 28 '19 at 17:18
  • @ShaunBebbers Or just `...print_r($img, true)...`, which may give a better idea of what it actually contains as well – GrumpyCrouton Jan 28 '19 at 17:19
  • If `$img` turns out to be an `array`, use `$img['src']`, for instance, depending on what the debug output shows – Shaun Bebbers Jan 28 '19 at 17:19

3 Answers

0

Is $img an object or an array?

If it is an array, try $imgCrt = $img['src'];

NaijaProgrammer
Chris
0

If you are using PHP 7, once you have confirmed what type of scalar or vector your variable is, you could do something like this:

$imgCrt = $img['src'] ?? $img->src;

The ?? operator is shorthand for an isset() check, so this is equivalent to:

$imgCrt = isset($img['src']) ? $img['src'] : $img->src;

This assumes that your key is src in your $img variable.
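To make the fallback behaviour of ?? concrete, here is a small self-contained sketch. The helper name pickSrc and the sample values are hypothetical and not taken from the question; the sketch only shows that ?? falls back when the key or property is missing or null, and it checks the type first so an object is never indexed like an array.

```php
<?php
// Hypothetical helper (not part of the question's code): read "src" from
// either an array or an object, returning null when neither provides it.
function pickSrc($img)
{
    if (is_array($img)) {
        // ?? behaves like isset(): it falls back when the key is missing or null.
        return $img['src'] ?? null;
    }
    if (is_object($img)) {
        return $img->src ?? null;
    }
    return null;
}

$asArray  = ['src' => 'a.png'];
$asObject = (object) ['src' => 'b.png'];

var_dump(pickSrc($asArray));   // string(5) "a.png"
var_dump(pickSrc($asObject));  // string(5) "b.png"
var_dump(pickSrc([]));         // NULL
```

Checking the type before indexing avoids the fatal error PHP raises when a plain object is accessed with array syntax.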

Please see my comments on the question for how to inspect the values and their types.

Also remember to set an HTTP response code -> http://php.net/manual/en/function.http-response-code.php
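As a rough sketch of what setting the response code could look like, the snippet below wraps the JSON output in a helper. The name sendJson, the empty $rows stand-in, and the choice of status 502 for an empty result are all assumptions made for illustration, not something the question specifies.

```php
<?php
// Hypothetical helper: emit a JSON body together with an explicit status code.
function sendJson(array $payload, int $status = 200): void
{
    // Headers can only be changed before any output has been sent.
    if (!headers_sent()) {
        http_response_code($status);
        header('Content-Type: application/json');
    }
    echo json_encode($payload);
}

// Stand-in for the scraped rows; an empty list is reported as an error
// instead of leaving the page blank. The 502 status is an assumed choice.
$rows = [];
if ($rows === []) {
    sendJson(['status' => 'error', 'message' => 'no items found'], 502);
} else {
    sendJson(['status' => 'ok', 'return' => $rows]);
}
```

Returning an explicit error payload makes a failed scrape distinguishable from a successful one on the client side.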

Shaun Bebbers
0

You aren't finding any elements in your ->find call, which is why you're getting those errors.
Simple HTML DOM's find method takes CSS selectors, and the attribute value you're searching for contains spaces, so it must be quoted.
Also, find returns an array unless you pass an index as the second argument.

foreach ($data->find('div[class="col-sm-4 col-lg-4 col-md-4"]') as $key => $finder) {

    $img = $finder->find('img[class=img-responsive]', 0);
    $imgCrt = $img->src;
    $price = $finder->find('h4[class="pull-right price"]', 0);
    $priceCrt = $price->innertext;
    $desc = $finder->find('p[class=description]', 0);
    $descCrt = $desc->innertext;

    $json['status'] = 'ok';

    $json['return'][] = [
      'img' => $imgCrt,
      'price' => $priceCrt,
      'desc' => $descCrt
    ];
}
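
Since find with an index returns null when nothing matches, a defensive variant can check each element before reading its properties. The following is a minimal sketch, assuming simple_html_dom.php sits next to the script; the inline markup and the div.thumbnail selector are made-up sample data, not the real page.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory; adjust the path.
require_once 'simple_html_dom.php';

// Inline sample markup standing in for the real page (hypothetical data).
$html = str_get_html(
    '<div class="thumbnail">'
    . '<img class="img-responsive" src="/images/item.png">'
    . '<h4 class="pull-right price">$10.00</h4>'
    . '<p class="description">Sample item</p>'
    . '</div>'
);

$json = ['status' => 'ok', 'return' => []];

foreach ($html->find('div.thumbnail') as $item) {
    // With an index, find() returns a single element or null, never an array.
    $img   = $item->find('img', 0);
    $price = $item->find('h4.price', 0);
    $desc  = $item->find('p.description', 0);

    // Guard each lookup so a missing element yields null instead of an error.
    $json['return'][] = [
        'img'   => $img   ? $img->src         : null,
        'price' => $price ? $price->innertext : null,
        'desc'  => $desc  ? $desc->innertext  : null,
    ];
}

echo json_encode($json);
```

Initialising $json before the loop also guarantees valid output when the outer selector matches nothing.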
Musa