
I'm scraping this web page:

https://www.sanita.puglia.it/monitorpo/aslfg/monitorps-web/monitorps/monitorPSperASL.do?codNazionale=160115


I'm using PHP and XPath to get the value 10 shown in the green box under the table named "PO G. TATARELLA-CERIGNOLA".

(NOTE: you may see a different value if you browse the page yourself; that doesn't matter, since the value changes dynamically.)

I'm using this PHP code sample to print the value:

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'https://www.sanita.puglia.it/monitorpo/aslfg/monitorps-web/monitorps/monitorPSperASL.do?codNazionale=160115';

    $xpath_for_parsing = '/html/body/div[4]/table/tbody/tr[2]/td[4]/div';


    // Set cURL parameters; pay attention to the proxy configuration
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
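    $data = curl_exec($ch);
    if ($data === false) {
        // Stop early if the request failed, reporting the cURL error
        $error = curl_error($ch);
        curl_close($ch);
        exit('cURL error: ' . $error);
    }
    curl_close($ch);

    $dom = new DOMDocument();
    // Collect HTML parsing warnings internally instead of silencing them with @
    libxml_use_internal_errors(true);
    $dom->loadHTML($data);
    libxml_clear_errors();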

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

This works fine.

I'm quite new to XPath. I'd like to avoid using an absolute path like

/html/body/div[4]/table/tbody/tr[2]/td[4]/div

and instead use something like

'//*[div="cRiga3 boxtriageS"]'

(NOTE: I know this doesn't work; it's only meant to illustrate what I'm after.)
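
To make the idea concrete, here is a minimal sketch of the kind of relative query I have in mind. It runs against a small hypothetical HTML fragment, not the real page: the markup, the second table, the "other" class and the helper name firstValue are all assumptions made for illustration. Only the table name and the class "cRiga3 boxtriageS" come from my case. The first query matches the div by its exact class attribute; the second first restricts the search to the table whose text contains the hospital name, then matches the class token.

```php
<?php
// Hypothetical fragment standing in for the real page (assumed markup).
$html = <<<'HTML'
<html><body>
<div>
  <table>
    <tr><th colspan="2">OTHER HOSPITAL</th></tr>
    <tr>
      <td><div class="cRiga3 other">1</div></td>
      <td><div class="cRiga3 boxtriageS">3</div></td>
    </tr>
  </table>
  <table>
    <tr><th colspan="2">PO G. TATARELLA-CERIGNOLA</th></tr>
    <tr>
      <td><div class="cRiga3 other">7</div></td>
      <td><div class="cRiga3 boxtriageS">10</div></td>
    </tr>
  </table>
</div>
</body></html>
HTML;

// Hypothetical helper: return the trimmed text of the first matching node,
// or a default when the query fails or matches nothing.
function firstValue(DOMXPath $xpath, $query, $default = 'N.D.')
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return $default;
    }
    return trim($nodes->item(0)->nodeValue);
}

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Relative query on the exact class attribute: matches in every table,
// so the first hit in document order is returned.
$byClass = firstValue($xpath, '//div[@class="cRiga3 boxtriageS"]');

// Scoped query: only inside the table whose text contains the hospital name,
// matching "boxtriageS" as a whole class token.
$scoped = firstValue(
    $xpath,
    '//table[contains(., "PO G. TATARELLA-CERIGNOLA")]'
    . '//div[contains(concat(" ", normalize-space(@class), " "), " boxtriageS ")]'
);

// A query that matches nothing falls back to the default.
$missing = firstValue($xpath, '//div[@class="missing"]');

print $byClass . "\n" . $scoped . "\n" . $missing . "\n";
```

On the real page the class names and nesting may differ, so the expressions would need adjusting after inspecting the actual source.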

Any suggestion or example for this case?

Thank you in advance

EDIT: This question is quite different from "Extract string in HTML page using scraping in PHP ad xpath": in that question my original code didn't work; I've since fixed it and it works. My question now is how to improve it by using a more compact XPath expression instead of an absolute path.

Cesare
