This is a reply to the question "YQL: html table is no longer supported"; I don't have enough reputation to post it there.

I ran into the same problem yesterday.
I don't think Yahoo will bring this feature back (I suspect this is a result of Verizon buying them). I wrote a few scripts that used YQL on a site where PHP is not available, and I was disappointed when all of them stopped working after three days. I only learned about YQL a week ago (I thought it was a great feature). If I had known it would no longer be supported, I would have looked for another approach (I think a big company should warn users before removing functionality).
I had a CORS problem when working with JavaScript and needing to extract data from another site (usually I use curl to fetch HTML). So I solved the problem another way: I wrote a simple PHP XPath parser and hosted it on my own website. You can extract part of a page's HTML with a GET request made via AJAX. You could also make it return JSON; I didn't, because I didn't need that.

Here is the PHP backend code:

<?php
//gethtmldata.php backend file
header("Access-Control-Allow-Origin: https://allowsite.com");
//header('Access-Control-Allow-Origin: *'); //all
//echo 'hello';
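//read the parameters safely so a missing one does not raise a notice
$url = isset($_GET['url']) ? $_GET['url'] : '';
$xpathQuery = isset($_GET['xpath']) ? $_GET['xpath'] : '';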
/*
//diagnostic block
echo '<pre>';
echo $_SERVER['HTTP_USER_AGENT'].PHP_EOL;
echo $_SERVER['REMOTE_HOST'].PHP_EOL;
echo $_SERVER['REMOTE_ADDR'].PHP_EOL;
echo $url.PHP_EOL;
echo $xpathQuery.PHP_EOL;
echo '</pre>';
*/
//only a basic check is done here; stricter validation is needed for security (e.g. restrict $url to allowed hosts)
if( isset($_GET['url']) && isset($_GET['xpath'])){
    function check($target_url){
        $check = curl_init();
        //curl_setopt( $check, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
        //curl_setopt($check, CURLOPT_INTERFACE, "xxx.xxx.xxx.xxx");
        curl_setopt($check, CURLOPT_COOKIEJAR, 'cookiemon.txt');
        curl_setopt($check, CURLOPT_COOKIEFILE, 'cookiemon.txt');
        curl_setopt($check, CURLOPT_TIMEOUT, 40); //CURLOPT_TIMEOUT is in seconds
        curl_setopt($check, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($check, CURLOPT_URL, $target_url);
        curl_setopt($check, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        curl_setopt($check, CURLOPT_FOLLOWLOCATION, false);

        $tmp = curl_exec ($check);
        curl_close ($check);

        return $tmp;

    } 
    $html = check($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $elements = $xpath->query($xpathQuery);

    $temp_dom = new DOMDocument();
    foreach($elements as $n) $temp_dom->appendChild($temp_dom->importNode($n,true));
    $renderedHtml = $temp_dom->saveHTML();

    //wrap the extracted HTML in a <result> element
    echo '<result>' . $renderedHtml . '</result>';
}
?>

Call it from JavaScript in your project:

<!-- JS frontend (uses jQuery) -->

<div id="part-html-container"></div>
<script>
var source = "https://www.your-target-site.com/";
var xpath = '//div[contains(@class,"product-rating")]'; //standard XPath query
var clean_space = source.replace(/ /g, "%20"); //pre-encode spaces in the target URL

$.ajax({
    type: 'GET',
    url: "https://site-where-script-situated.com/embedding/gethtmldata.php?url="+encodeURIComponent(clean_space)+"&xpath="+encodeURIComponent(xpath),
    dataType: 'html',
    success: function(data){
        $('#part-html-container').html(data);
        console.log('success');
    }
});
</script>
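
If you would rather not concatenate the request URL by hand, the same call can be sketched with two small helpers. This is only an illustration under my own assumptions: the names `buildProxyUrl` and `unwrapResult` are hypothetical and not part of the script above, and the endpoint address is the same placeholder as before. The first helper builds the GET URL, the second strips the `<result>` wrapper that the backend echoes around the extracted HTML.

```javascript
// Hypothetical helper: builds the GET URL for the gethtmldata.php endpoint.
// encodeURIComponent escapes both values so they survive as query parameters.
function buildProxyUrl(endpoint, targetUrl, xpath) {
  return endpoint +
    "?url=" + encodeURIComponent(targetUrl) +
    "&xpath=" + encodeURIComponent(xpath);
}

// Hypothetical helper: removes the <result>...</result> wrapper from the
// backend response. Returns the input unchanged if the wrapper is missing.
function unwrapResult(responseText) {
  const match = /^\s*<result>([\s\S]*)<\/result>\s*$/.exec(responseText);
  return match ? match[1] : responseText;
}
```

In the `success` callback you could then pass `unwrapResult(data)` to `.html()` instead of `data`, so the wrapper element does not end up in your page.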

So, you need:
- a server or hosting with PHP;
- SSL (if your project site uses it);
- this PHP script.
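
The SSL point matters because browsers block AJAX requests from an HTTPS page to a plain HTTP endpoint (mixed content). A minimal sketch of that check, assuming a hypothetical helper name `isMixedContent` that is not part of the script above:

```javascript
// Hypothetical helper: returns true when an HTTPS page would call a
// non-HTTPS endpoint, which browsers block as mixed content.
function isMixedContent(pageUrl, endpointUrl) {
  const page = new URL(pageUrl);
  const endpoint = new URL(endpointUrl);
  return page.protocol === "https:" && endpoint.protocol !== "https:";
}
```

For example, calling it with your page address and the address of gethtmldata.php tells you whether the PHP host needs a certificate before the request can work.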

Vadim K