Scraping an HTML page using PHP and XPath: alternatives to absolute paths

Question

I'm scraping this web page:

https://www.sanita.puglia.it/monitorpo/aslfg/monitorps-web/monitorps/monitorPSperASL.do?codNazionale=160115

using PHP and XPath to get the value 10 in the green box under the table named "PO G. TATARELLA-CERIGNOLA".

(NOTE: you may see a different value if you open the page yourself. That doesn't matter; the value changes dynamically.)

I'm using this PHP code to print the value:

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'https://www.sanita.puglia.it/monitorpo/aslfg/monitorps-web/monitorps/monitorPSperASL.do?codNazionale=160115';

    $xpath_for_parsing = '/html/body/div[4]/table/tbody/tr[2]/td[4]/div';


    // Set cURL parameters; pay attention to the proxy configuration.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

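    // Collect libxml parse warnings internally instead of silencing them with @.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($data);
    libxml_clear_errors();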

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

and it all works fine.

I'm quite new to XPath. I'd like to avoid using an absolute path like

/html/body/div[4]/table/tbody/tr[2]/td[4]/div

and instead use something like

'//*[div="cRiga3 boxtriageS"]'

(NOTE: I know this does not work; it is only meant to illustrate what I'm after.)

Any suggestions or examples for this case?
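
To make the goal concrete, here is a minimal sketch of the kind of relative expression I have in mind. It matches the div by one of its class tokens rather than by its position in the tree. The markup in `$html` is a hypothetical stand-in for the real page, and every class name other than `cRiga3 boxtriageS` is invented for illustration, so the real page may need a different expression.

```php
<?php
// Hypothetical markup: a simplified stand-in for the real page.
// Only the class "cRiga3 boxtriageS" comes from the question; the rest is assumed.
$html = '<html><body>
<div><table>
  <tr><th colspan="4">PO G. TATARELLA-CERIGNOLA</th></tr>
  <tr>
    <td><div class="cRiga3 boxtriageA">1</div></td>
    <td><div class="cRiga3 boxtriageB">3</div></td>
    <td><div class="cRiga3 boxtriageC">7</div></td>
    <td><div class="cRiga3 boxtriageS">10</div></td>
  </tr>
</table></div>
</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Match any div whose class attribute contains the whole token "boxtriageS".
// Padding with spaces avoids partial matches such as "boxtriageSmall".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " boxtriageS ")]';

$nodes = $xpath->query($query);
$theValue = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : 'N.D.';

print $theValue;
```

An exact comparison such as `//div[@class="cRiga3 boxtriageS"]` would also work on this sample, but it breaks as soon as the page adds, removes, or reorders a class, which is why the sketch tests a single token.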

Thank you in advance

EDIT: This question is different from "Extract string in HTML page using scraping in PHP ad xpath": in that question my original code didn't work. I have since fixed it, and it now works. My question here is how to improve it by using a more compact XPath expression instead of an absolute path.

try `PHPQuery` its much easier to use the `Xpath` documentation sux though — ArtisticPhoenix, Dec 03 '17 at 17:01
My problem is that I have more than one element with the same class name and I don't know how to select a single one — Cesare, Dec 03 '17 at 17:41
If your code works but you'd like to optimize it, perhaps CodeReview would be a better suited audience. — mickmackusa, Dec 07 '17 at 00:01
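
On the point raised in the comments about several elements sharing one class name, the sketch below shows two possible ways to pick a single one: scoping the search to the table whose header text matches, or taking the n-th match in document order. The markup is hypothetical (the table name "PO EXAMPLE-ONE" and the use of `th` for the table title are assumptions), so treat this as an illustration rather than the expression the real page needs.

```php
<?php
// Hypothetical markup: two tables whose value boxes share the same class names.
$html = '<html><body>
<table>
  <tr><th>PO EXAMPLE-ONE</th></tr>
  <tr><td><div class="cRiga3 boxtriageS">5</div></td></tr>
</table>
<table>
  <tr><th>PO G. TATARELLA-CERIGNOLA</th></tr>
  <tr><td><div class="cRiga3 boxtriageS">10</div></td></tr>
</table>
</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Reusable predicate: the class attribute contains the token "boxtriageS".
$classTest = 'contains(concat(" ", normalize-space(@class), " "), " boxtriageS ")';

// Option 1: anchor on the table whose header text matches, then search inside it.
$byTitle = $xpath->query(
    '//table[.//th[normalize-space()="PO G. TATARELLA-CERIGNOLA"]]//div[' . $classTest . ']'
);

// Option 2: take the second match in document order.
// The parentheses make [2] apply to the whole result set, not to each parent.
$byPosition = $xpath->query('(//div[' . $classTest . '])[2]');

$titleValue = $byTitle->length > 0 ? trim($byTitle->item(0)->nodeValue) : 'N.D.';
$positionValue = $byPosition->length > 0 ? trim($byPosition->item(0)->nodeValue) : 'N.D.';

print $titleValue . "\n";
print $positionValue . "\n";
```

Anchoring on visible text such as the table title tends to survive layout changes better than a position index, which silently returns the wrong box if a table is added or removed.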
