9

I'm trying to parse JSON in PHP using JSONPath.

My JSON comes from this URL:

https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs

(It's too long to paste here, but you can view it in a browser.)

The JSON is valid (I've verified it using https://jsonlint.com/).

I've tested the JSONPath expression using http://www.jsonquerytool.com/ and everything seems to work fine, but when I put it all into the PHP sample below:

<?php  
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    require_once('json.php');      // JSON parser
    require_once('jsonpath-0.8.0.php');  // JSONPath evaluator

    $url = 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $parser = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);
    $o = $parser->decode($data);

    $xpath_for_parsing = '$..aziende[?(@.descrizione=="A.S.U.I. - Trieste")]..prontoSoccorsi[?(@.descrizione=="Pronto Soccorso e Terapia Urgenza Trieste")]..dipartimenti[?(@.descrizione=="Pronto Soccorso Maggiore")]..codiciColore[?(@.descrizione=="Bianco")]..situazionePazienti..numeroPazientiInAttesa';

    $match1 = jsonPath($o, $xpath_for_parsing);
    //print_r($match1);
    $match1_encoded = $parser->encode($match1);
    print_r($match1_encoded);

    $match1_decoded = json_decode($match1_encoded);

    //print_r($match1_decoded);

    if ($match1_decoded[0] != '') {
     return  $match1_decoded[0];
    }
    else {
     return  "N.D.";
   } 
?>

... no values are printed, only a "false" value.

Something goes wrong with my JSONPath expression when I use it in my PHP code. The errors I get are the following:

Warning: Missing argument 3 for JsonPath::evalx(), called in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php on line 84 and defined in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php on line 101

Notice: Use of undefined constant descrizione - assumed 'descrizione' in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php(104) : eval()'d code on line 1

I probably have to escape or quote my JSONPath expression to use it in PHP, but I don't know how. Any suggestion is appreciated.

NOTE: I need to use JSONPath filter expressions like ?(@.descrizione=="A.S.U.I. - Trieste"); I can't use "positional" JSONPath.

I've also tried jsonpath-0.8.3.php from https://github.com/ITS-UofIowa/jsonpath/blob/master/jsonpath.php, but nothing changes.

Suggestions?

Thank you in advance.

Cesare
  • 1
    Now that reads like an oversight in the jsonpath-0.8.0 library. I only had a short look; the code is somewhat terse and hodgepodge, and the eval is perhaps questionable (albeit OK IMO for this use case). Notably, you will have to debug the library, and its eval section in particular. You could try e.g. `@.'descrizione'=="ASUI..."`, but I doubt this would fix more than the notice. (The library's tokenization might even choke on that, though.) – mario Aug 20 '17 at 04:14
  • 1
    If the easy JSONPath approach leads nowhere, you will have to write a recursive function/foreach combo to extract the right attributes/tree. Perhaps try `RecursiveArrayIterator` or similar. – mario Aug 20 '17 at 04:16

4 Answers

3

You can use json_decode to convert it to a native PHP array, then hhb_xml_encode (from https://stackoverflow.com/a/43697765/1067003) to convert the array to XML, then DOMDocument::loadHTML to load that XML into a DOMDocument, and finally DOMXPath::query to search through it with XPath expressions.

Example:

<?php
declare(strict_types = 1);
header ( "content-type: text/plain;charset=utf-8" );
require_once ('hhb_.inc.php');
$json_raw = (new hhb_curl ( '', true ))->exec ( 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs' )->getStdOut ();
$parsed = json_decode ( $json_raw, true );
// var_dump ( $parsed );
$xml = hhb_xml_encode ( $parsed );
// var_dump($xml);
$dom = new DOMDocument ();
@$dom->loadHTML ( $xml ); // loadHTML() is an instance method; calling it statically is an error on PHP 8
$dom->formatOutput = true;
$xp = new DOMXPath ( $dom );
$elements_for_parsing = $xp->query ( '//aziende/descrizione[text()=' . xpath_quote ( 'A.S.U.I. - Trieste' ) . ']|//prontosoccorsi/descrizione[text()=' . xpath_quote ( 'Pronto Soccorso e Terapia Urgenza Trieste' ) . ']|//dipartimenti/descrizione[text()=' . xpath_quote ( 'Pronto Soccorso Maggiore' ) . ']|//codicicolore/descrizione[text()=' . xpath_quote ( 'Bianco' ) . ']|//situazionepazienti|//numeroPazientiInAttesa' );
// var_dump ( $elements_for_parsing,$dom->saveXML() );
foreach ( $elements_for_parsing as $ele ) {
    var_dump ( $ele->textContent );
}

// based on https://stackoverflow.com/a/1352556/1067003
function xpath_quote(string $value): string {
    if (false === strpos ( $value, '"' )) {
        return '"' . $value . '"';
    }
    if (false === strpos ( $value, '\'' )) {
        return '\'' . $value . '\'';
    }
    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    // concat("'foo'", '"', "bar")
    $sb = 'concat(';
    $substrings = explode ( '"', $value );
    for($i = 0; $i < count ( $substrings ); ++ $i) {
        $needComma = ($i > 0);
        if ($substrings [$i] !== '') {
            if ($i > 0) {
                $sb .= ', ';
            }
            $sb .= '"' . $substrings [$i] . '"';
            $needComma = true;
        }
        if ($i < (count ( $substrings ) - 1)) {
            if ($needComma) {
                $sb .= ', ';
            }
            $sb .= "'\"'";
        }
    }
    $sb .= ')';
    return $sb;
}
function hhb_xml_encode(array $arr, string $name_for_numeric_keys = 'val'): string {
    if (empty ( $arr )) {
        // avoid having a special case for <root/> and <root></root> i guess
        return '';
    }
    $is_iterable_compat = function ($v): bool {
        // php 7.0 compat for php7.1+'s is_iterable
        return is_array ( $v ) || ($v instanceof \Traversable);
    };
    $isAssoc = function (array $arr): bool {
        // thanks to Mark Amery for this
        if (array () === $arr)
            return false;
        return array_keys ( $arr ) !== range ( 0, count ( $arr ) - 1 );
    };
    $endsWith = function (string $haystack, string $needle): bool {
        // thanks to MrHus
        $length = strlen ( $needle );
        if ($length == 0) {
            return true;
        }
        return (substr ( $haystack, - $length ) === $needle);
    };
    $formatXML = function (string $xml) use ($endsWith): string {
        // there seems to be a bug with formatOutput on DOMDocuments that have used importNode with $deep=true
        // on PHP 7.0.15...
        $domd = new DOMDocument ( '1.0', 'UTF-8' );
        $domd->preserveWhiteSpace = false;
        $domd->formatOutput = true;
        $domd->loadXML ( '<root>' . $xml . '</root>' );
        $ret = trim ( $domd->saveXML ( $domd->getElementsByTagName ( "root" )->item ( 0 ) ) );
        assert ( 0 === strpos ( $ret, '<root>' ) );
        assert ( $endsWith ( $ret, '</root>' ) );
        $full = trim ( substr ( $ret, strlen ( '<root>' ), - strlen ( '</root>' ) ) );
        $ret = '';
        // ... seems each line except the first line starts with 2 ugly spaces,
        // presumably its the <root> element that starts with no spaces at all.
        foreach ( explode ( "\n", $full ) as $line ) {
            if (substr ( $line, 0, 2 ) === '  ') {
                $ret .= substr ( $line, 2 ) . "\n";
            } else {
                $ret .= $line . "\n";
            }
        }
        $ret = trim ( $ret );
        return $ret;
    };

    // $arr = new RecursiveArrayIterator ( $arr );
    // $iterator = new RecursiveIteratorIterator ( $arr, RecursiveIteratorIterator::SELF_FIRST );
    $iterator = $arr;
    $domd = new DOMDocument ();
    $root = $domd->createElement ( 'root' );
    foreach ( $iterator as $key => $val ) {
        // var_dump ( $key, $val );
        $ele = $domd->createElement ( is_int ( $key ) ? $name_for_numeric_keys : $key );
        if (! empty ( $val ) || $val === '0') {
            if ($is_iterable_compat ( $val )) {
                $asoc = $isAssoc ( $val );
                $tmp = hhb_xml_encode ( $val, is_int ( $key ) ? $name_for_numeric_keys : $key );
                // var_dump ( $tmp );
                // die ();
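                // loadXML() is an instance method; calling it statically is an error on PHP 8
                $tmpXml = $tmp;
                $tmp = new DOMDocument ();
                @$tmp->loadXML ( '<root>' . $tmpXml . '</root>' );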
                foreach ( $tmp->getElementsByTagName ( "root" )->item ( 0 )->childNodes ?? [ ] as $tmp2 ) {
                    $tmp3 = $domd->importNode ( $tmp2, true );
                    if ($asoc) {
                        $ele->appendChild ( $tmp3 );
                    } else {
                        $root->appendChild ( $tmp3 );
                    }
                }
                unset ( $tmp, $tmp2, $tmp3 );
                if (! $asoc) {
                    // echo 'REMOVING';die();
                    // $ele->parentNode->removeChild($ele);
                    continue;
                }
            } else {
                $ele->textContent = (string) $val; // explicit cast: strict_types is on and JSON values may be int/float/bool
            }
        }
        $root->appendChild ( $ele );
    }
    $domd->preserveWhiteSpace = false;
    $domd->formatOutput = true;
    $ret = trim ( $domd->saveXML ( $root ) );
    assert ( 0 === strpos ( $ret, '<root>' ) );
    assert ( $endsWith ( $ret, '</root>' ) );
    $ret = trim ( substr ( $ret, strlen ( '<root>' ), - strlen ( '</root>' ) ) );
    // seems to be a bug with formatOutput on DOMDocuments that have used importNode with $deep=true..
    $ret = $formatXML ( $ret );
    return $ret;
}

PS: the lines require_once ('hhb_.inc.php'); and $json_raw = (new hhb_curl ( '', true ))->exec ( 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs' )->getStdOut (); just fetch the URL and put the JSON in $json_raw (using gzip-compressed transfer to speed things up, by the way). Replace them with whatever you want to use to fetch it into $json_raw. The actual curl library I used is from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php#L477

Currently it prints:

string(18) "A.S.U.I. - Trieste"
string(41) "Pronto Soccorso e Terapia Urgenza Trieste"
string(9) "121200:14"
string(10) "181400:254"
string(6) "Bianco"
string(7) "200:292"
string(5) "00:00"
string(24) "Pronto Soccorso Maggiore"
string(7) "3300:15"
string(6) "Bianco"
string(8) "6200:584"
string(5) "00:00"
string(5) "00:00"
string(8) "4100:353"
string(6) "Bianco"
string(7) "100:051"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:15"
string(8) "6402:012"
string(6) "Bianco"
string(7) "402:274"
string(5) "00:00"
string(9) "11900:202"
string(9) "11401:427"
string(6) "Bianco"
string(8) "2102:051"
string(5) "00:00"
string(7) "3300:08"
string(8) "7401:423"
string(6) "Bianco"
string(8) "8402:104"
string(5) "00:00"
string(6) "Bianco"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:04"
string(10) "121000:512"
string(6) "Bianco"
string(8) "5400:461"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(6) "Bianco"
string(5) "00:00"
string(5) "00:00"
string(9) "121200:18"
string(9) "11800:593"
string(6) "Bianco"
string(8) "6401:272"
string(5) "00:00"
string(6) "Bianco"
string(7) "1100:04"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(7) "2200:05"
string(9) "10801:102"
string(6) "Bianco"
string(8) "8201:166"
string(5) "00:00"
string(8) "3200:071"
string(7) "100:261"
string(6) "Bianco"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:00"
string(9) "151500:26"
string(10) "161301:123"
string(6) "Bianco"
string(8) "9500:434"
string(7) "1100:00"
string(7) "2200:13"
string(6) "Bianco"
string(7) "200:342"
string(5) "00:00"
string(6) "Bianco"
string(7) "1100:24"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:04"
string(8) "9700:222"
string(10) "171500:582"
string(6) "Bianco"
string(7) "200:512"
string(7) "1100:40"
string(7) "1100:22"
string(6) "Bianco"
string(8) "3100:062"
string(5) "00:00"
string(5) "00:00"
string(5) "00:00"
string(6) "Bianco"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:22"
string(8) "7500:302"
string(6) "Bianco"
string(5) "00:00"
string(5) "00:00"
string(7) "1100:06"
string(6) "Bianco"
string(7) "1100:00"
string(5) "00:00"
string(5) "00:00"

Hope that's what you were looking for; I was guessing from the "xpath" you provided.

hanshenrik
2

XPath is too complicated for your task, and overkill in general.

Just use the standard json_decode(), get the equivalent PHP object, and navigate it using standard for/while loops and regexes.

Also, I think your question is misleading: your problem is not parsing JSON (json_decode() does that automatically), it is extracting some data from it using XPath. I suggest reworking your question to show exactly what goes wrong and what your intent is.

If you need to descend into a precise JSON node (or set of nodes), why not do it with for loops and regexes?
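
A rough sketch of that loop-based approach, assuming each level (aziende, prontoSoccorsi, dipartimenti, codiciColore) is an array nested directly inside the previous one, as the JSONPath in the question suggests. The helper names findByDescrizione and waitingPatients and the inline sample data are made up for illustration; the real feed may be shaped differently.

```php
<?php
// Hypothetical helper: return the first element whose 'descrizione'
// equals $descrizione, or null when nothing matches.
function findByDescrizione(array $items, string $descrizione): ?array
{
    foreach ($items as $item) {
        if (is_array($item) && ($item['descrizione'] ?? null) === $descrizione) {
            return $item;
        }
    }
    return null;
}

// Hypothetical helper: walk the four assumed levels in order and return
// numeroPazientiInAttesa, or 'N.D.' as soon as any level is missing.
function waitingPatients(array $data, string $azienda, string $ps, string $dip, string $colore)
{
    $steps = [
        'aziende'        => $azienda,
        'prontoSoccorsi' => $ps,
        'dipartimenti'   => $dip,
        'codiciColore'   => $colore,
    ];
    $node = $data;
    foreach ($steps as $key => $descrizione) {
        $children = $node[$key] ?? null;
        if (!is_array($children)) {
            return 'N.D.';
        }
        $node = findByDescrizione($children, $descrizione);
        if ($node === null) {
            return 'N.D.';
        }
    }
    return $node['situazionePazienti']['numeroPazientiInAttesa'] ?? 'N.D.';
}

// Made-up sample mirroring the assumed structure (not real data from the service).
$json = '{"aziende":[{"descrizione":"A.S.U.I. - Trieste","prontoSoccorsi":['
      . '{"descrizione":"Pronto Soccorso e Terapia Urgenza Trieste","dipartimenti":['
      . '{"descrizione":"Pronto Soccorso Maggiore","codiciColore":['
      . '{"descrizione":"Bianco","situazionePazienti":{"numeroPazientiInAttesa":2}}]}]}]}]}';
$data = json_decode($json, true);

echo waitingPatients(
    $data,
    'A.S.U.I. - Trieste',
    'Pronto Soccorso e Terapia Urgenza Trieste',
    'Pronto Soccorso Maggiore',
    'Bianco'
), PHP_EOL;
```

Decoding with json_decode($json, true) gives plain arrays, which keeps the ?? fallbacks simple; with the default object decoding you would use property access instead.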

Gianluca Ghettini
  • Yes, my problem is about extracting some data from my JSON using xpath. I started using jsonpath in PHP and it works fine, but now I have to use more complicated expressions and some trouble is coming up. I'd rather not rewrite/refactor my business logic if I can avoid it. As you can see, the JSON I have to parse is quite long, and I have to parse not only this one but others that could be bigger, so the jsonpath approach could help me. Anyway, I can consider trying the standard approach starting from json_decode(). – Cesare Aug 21 '17 at 12:01
  • 2
    json_decode() gives you a PHP object; once you have that, you shouldn't have any problem traversing the fields and getting what you need. I don't really see the need for XPath... – Gianluca Ghettini Aug 21 '17 at 12:09
  • 1
    @GianlucaGhettini xpaths makes it easy to search through an entire document without knowing (nor even caring about) its exact structure. like, you know that somewhere is a `foobar` with a property named `baz` and the text contains `lal` ? but you're not sure where exactly the foobar element is? `//foobar[@baz and contains(text(),"lal")]` - doing the same on a native php array would be rather difficult, i welcome you to prove me wrong though. – hanshenrik Aug 27 '17 at 13:23
  • 1
    Correct, but you are over-engineering. Did you notice what the OP is trying to do? A very simple traversal, and the JSON structure is known a priori. – Gianluca Ghettini Aug 27 '17 at 22:43
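
For reference, a structure-agnostic search over a native PHP array can be done with a short recursive function. This is only a minimal sketch, assuming the JSON was decoded with json_decode($json, true); the function name findNodes and the sample data are invented for illustration and it covers a single key/value test, not full XPath semantics.

```php
<?php
// Hypothetical helper: collect every (sub-)array, at any depth, that has
// $key set to exactly $value. Matches are returned in document order.
function findNodes(array $node, string $key, $value): array
{
    $matches = [];
    if (array_key_exists($key, $node) && $node[$key] === $value) {
        $matches[] = $node;
    }
    foreach ($node as $child) {
        if (is_array($child)) {
            // Recurse into nested objects and lists alike.
            $matches = array_merge($matches, findNodes($child, $key, $value));
        }
    }
    return $matches;
}

// Made-up data: the matching nodes sit at different depths.
$data = json_decode(
    '{"a":{"list":[{"descrizione":"Bianco","n":1},{"descrizione":"Verde","n":2}]},'
    . '"b":[{"deep":{"descrizione":"Bianco","n":3}}]}',
    true
);

$hits = findNodes($data, 'descrizione', 'Bianco');
foreach ($hits as $hit) {
    echo $hit['n'], PHP_EOL;
}
```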
2

I solved it by changing the JsonPath library implementation: I now use the Skyscanner JsonPath implementation (ref. https://github.com/Skyscanner/JsonPath-PHP).

I had some trouble with the installation (I had never used Composer before), but the Skyscanner team helped me (ref. https://github.com/Skyscanner/JsonPath-PHP/issues/6), and now I have this PHP code:

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    include "./tmp/vendor/autoload.php";

    $url = 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs';

    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $jsonObject = new JsonPath\JsonObject($data);

    $jsonPathExpr = "$..aziende[?(@.descrizione==\"A.S.U.I. - Trieste\")]..prontoSoccorsi[?(@.descrizione==\"Pronto Soccorso e Terapia Urgenza Trieste\")]..dipartimenti[?(@.descrizione==\"Pronto Soccorso Maggiore\")]..codiciColore[?(@.descrizione==\"Verde\")]..situazionePazienti..numeroPazientiInAttesa";

    $r = $jsonObject->get($jsonPathExpr);

    //print json_encode($r);

    print json_encode($r[0]);
?>

In ./tmp I have what I obtained from Composer.

... that works fine, and this way I can run my JSON query, potentially without knowing its exact structure.
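
If the escaped double quotes get hard to read, one option is to assemble the expression from its parts. This is a sketch only: buildPath is a made-up helper, not part of the Skyscanner library, and it assumes the descriptions contain no double quotes.

```php
<?php
// Hypothetical helper (not part of any library): builds an expression of the
// form $..key1[?(@.descrizione=="v1")]..key2[?(...)]..leaf1..leaf2
function buildPath(array $filters, array $leaf): string
{
    $expr = '$';
    foreach ($filters as $key => $descrizione) {
        // Single-quoted format string: no PHP escaping of " or $ is needed.
        $expr .= sprintf('..%s[?(@.descrizione=="%s")]', $key, $descrizione);
    }
    foreach ($leaf as $name) {
        $expr .= '..' . $name;
    }
    return $expr;
}

$jsonPathExpr = buildPath(
    [
        'aziende'        => 'A.S.U.I. - Trieste',
        'prontoSoccorsi' => 'Pronto Soccorso e Terapia Urgenza Trieste',
        'dipartimenti'   => 'Pronto Soccorso Maggiore',
        'codiciColore'   => 'Verde',
    ],
    ['situazionePazienti', 'numeroPazientiInAttesa']
);

echo $jsonPathExpr, PHP_EOL;
```

The resulting string can then be passed to $jsonObject->get() exactly as in the code above.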

Cesare
0
<?php // Print the original JSON
define("DIRPATH", dirname($_SERVER["SCRIPT_FILENAME"]) . '/');
define("WEBPATH", 'http://' . $_SERVER['SERVER_ADDR'] . dirname($_SERVER['PHP_SELF']) . '/');
//define("WEBPORT", 'http://' . $_SERVER['SERVER_ADDR'] . ':' . $_SERVER['SERVER_PORT'] . dirname($_SERVER['PHP_SELF']) . '/');
//define("imgpath", DIRPATH . 'image/');
//$png = file_get_contents('iptv.kodi.al/images/');
$jsondata = file_get_contents('https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs');
header("Content-type: application/ld+json; charset=utf-8");
    $print = json_decode($jsondata);
    print_r($print);
?>

<?php // Print by category
define("DIRPATH", dirname($_SERVER["SCRIPT_FILENAME"]) . '/');
define("WEBPATH", 'http://' . $_SERVER['SERVER_ADDR'] . dirname($_SERVER['PHP_SELF']) . '/');
//define("WEBPORT", 'http://' . $_SERVER['SERVER_ADDR'] . ':' . $_SERVER['SERVER_PORT'] . dirname($_SERVER['PHP_SELF']) . '/');
//define("imgpath", DIRPATH . 'image/');
//$png = file_get_contents('iptv.kodi.al/images/');
$jsondata = file_get_contents('https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs');
header("Content-type: application/ld+json; charset=utf-8");
    $print = json_decode($jsondata);
    //print_r($print);
    $items = '';
    // The list starts here
    foreach ($print->{'aziende'} as $item) {
    $items .= '
' . $item->id . '
' . $item->descrizione . '
';
};
?>
<?php echo $items; ?>
t1gor
TRC4