These are the steps I want to carry out:

  1. Get the HTML of http://www.skyscanner.es/, a flight search site.
  2. Extract only one part of that HTML: a specific "span" that contains the price.
  3. Operate on it.

This is the PHP code I am using:

    <?php
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_POST, 1);
        curl_setopt ($ch, CURLOPT_POSTFIELDS, "from=Bilbao (BIO)&to=Barcelona (BCN)&depdatetext=25/03/2013&sc_returnOrOneWay=2");
        $output = curl_exec($ch);
        curl_close($ch);
        echo $output;
    ?>

But I get a strange string like this:

     ‹¥TkoÚ0ý^‰ÿpTi“ê< t%<¤RuR»U+{}4ñ…X5qf›×Pÿûì$ZõÛ‚Äu¬sî=çú:ýÓ믣Éï‡1¤f!àáû§»Ï#ðHül‚àzr ¿n'÷wù!<Åã/x©1yëõÚ_·}©æÁä[°qY"G«–DŸæ 'ý¢Êf!2=x#CÔívKb FÊ\\ ¡àÐÿ,ùjàdf03d²Íу¤|x7&pì$)UÍ€kI®®:]yKe¸8¼;@àv2y€ª),520h’ Ö`R®!§s3i€ !×Èü~Pòm"m¶ÁXUÝDëBô)!“©dÛÝ‚ª9Ïâ°7³‰æ1ö?à¢|ÑÛø*F3z§ânQ¬ÐðÄîhši¢QñYoJ“§¹’ËŒÅÍqñôž'3Ž‚Y“»œ2ƳyBÔÉ7…îÏ®zÏÐ8I£Ý¡~Ë¿°ja‰RÅÍ››—/m!£BêkähÚ§ÌÛ~nÐEýÐýö´0¬iMw¨¨vkÎLw/ÏêeoæÒ&iA^ôÌ3 §Ë$E÷Þ9Ô=<êØ‘3{uûHµß)gºYMÏî…[1—š.³X¡ †¯Ð¡ý M\¤<³FŽÏÆ•{mŒ™ÇWö0öÆ\{ÞÎNˆ  ­bµ¿nœ\d|œÙ›SôÐöÓhøˆÊÎ0Œ•’Ê2¢a?°°ct¥ÙM'›‰ Z×û/6á~¦úië?®Š%—IÚÃIŠ%h+—@‚òÉöfRAB3Gœ"0®sA·¶Àj+Í€g+*8ûH%ƒwµ”÷°¦ú Ç\ä¦ÒåÊ·¿Aí¨îK÷m-¾vñà-ú¡ 

So I have not even gotten past the first step!

I have tried to fix it in several ways, but I still don't know what I am doing wrong. I imagine that it could be:

Can anyone help me, please?

Thanks in advance!

Edit: I've changed the title so that it is closer to the problem I have now.

Mikel

2 Answers

It doesn't matter what message is encoded in the body, because the response you're receiving is:

    HTTP/1.1 405 Method Not Allowed

which means you can't use POST.

If you read all the response headers, you'll see that one of them says:

    Allow: GET, HEAD, OPTIONS, TRACE
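
To make that check concrete, here is a minimal sketch of reading the status code and the `Allow` header out of a raw header block. `parse_response_head()` is a hypothetical helper written for this illustration, not a cURL function, and the sample header text is simply the two lines quoted above.

```php
<?php
// Sketch: split a raw header block (what you get in front of the body when
// CURLOPT_HEADER is enabled) into a status code and a map of headers.
// parse_response_head() is a hypothetical helper, not part of cURL.
function parse_response_head(string $rawHead): array
{
    $lines = preg_split('/\r\n|\n/', trim($rawHead));
    $statusLine = array_shift($lines);

    // "HTTP/1.1 405 Method Not Allowed" -> 405
    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        $status = (int) $m[1];
    }

    // Header names are case-insensitive, so store them in lower case.
    $headers = array();
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }

    return array('status' => $status, 'headers' => $headers);
}

// Sample data copied from the response quoted in this answer.
$head = "HTTP/1.1 405 Method Not Allowed\r\nAllow: GET, HEAD, OPTIONS, TRACE\r\n\r\n";
$info = parse_response_head($head);

if ($info['status'] === 405 && isset($info['headers']['allow'])) {
    echo "Allowed methods: " . $info['headers']['allow'] . "\n";
}
```

With a live request, the header part can be cut from the response with `substr($response, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE))`, and the status code alone is available through `curl_getinfo($ch, CURLINFO_HTTP_CODE)`; both have to be read before `curl_close($ch)`.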

If you remove these two lines:

    curl_setopt ($ch, CURLOPT_POST, 1);
    curl_setopt ($ch, CURLOPT_POSTFIELDS, "from=Bilbao (BIO)&to=Barcelona (BCN)");

and change:

    curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/");

into:

    curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/vuelos/bio/bcn/130325/tarifas-de-bilbao-a-barcelona-en-marzo-2013.html");

it will work.
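
If the body still looks like binary noise after switching to GET, one possible cause (an assumption on my part, not something confirmed for this site) is that the server sends it gzip-compressed. A rough sketch of detecting and decoding that case follows; `maybe_gunzip()` is a hypothetical helper, and `gzdecode()` requires PHP's zlib extension.

```php
<?php
// Sketch: detect a gzip-compressed body by its magic bytes and decode it.
// maybe_gunzip() is a hypothetical helper; gzdecode() needs the zlib extension.
function maybe_gunzip(string $body): string
{
    // gzip streams start with the two bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or not decodable): hand the body back unchanged.
    return $body;
}

// Simulate a compressed response instead of calling the real site.
$compressed = gzencode('<html><body>405</body></html>');
echo maybe_gunzip($compressed), "\n";
```

Alternatively, `curl_setopt($ch, CURLOPT_ENCODING, '');` asks cURL to advertise the encodings it supports and to decode the body itself, so no manual step is needed.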

Check out the following code:

    <?php

        // Content types and character sets to advertise in the request headers
        $accept = array(
            'type' => array('application/rss+xml', 'application/xml', 'application/rdf+xml', 'text/xml'),
            'charset' => array_diff(mb_list_encodings(), array('pass', 'auto', 'wchar', 'byte2be', 'byte2le', 'byte4be', 'byte4le', 'BASE64', 'UUENCODE', 'HTML-ENTITIES', 'Quoted-Printable', '7bit', '8bit'))
        );
        $header = array(
            'Accept: '.implode(', ', $accept['type']),
            'Accept-Charset: '.implode(', ', $accept['charset']),
        );
        $encoding = null;
        $ch = curl_init();
        // Plain GET on the results URL; the POST options stay disabled
        curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/vuelos/bio/bcn/130325/tarifas-de-bilbao-a-barcelona-en-marzo-2013.html?flt=1");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    //    curl_setopt ($ch, CURLOPT_POST, 1);
    //    curl_setopt ($ch, CURLOPT_POSTFIELDS, "from=Bilbao (BIO)&to=Barcelona (BCN)");
        // Include the response headers in the returned string
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
        $response = curl_exec($ch);
        curl_close($ch);
        if (!$response) {
            // error fetching the response
        } else {
            echo $response;
        }
    ?>
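
Once the page is fetched, step 2 of the question (pulling the price out of a "span") could be sketched roughly as below. The class name `price` and the sample markup are assumptions for illustration only, since the real markup is not shown here; `extract_span_texts()` is a hypothetical helper and needs PHP's DOM extension.

```php
<?php
// Sketch: collect the text of <span> elements that carry a given class.
// The class name "price" and the sample HTML are assumptions for
// illustration; the real page may use different markup.
function extract_span_texts(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = array();
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$sample = '<html><body><span class="price big">59 EUR</span><span class="other">x</span></body></html>';
print_r(extract_span_texts($sample, 'price'));
```

This only helps if the span is present in the HTML that cURL receives. If the page fills in prices with JavaScript after loading, the fetched markup will not contain them.
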
Nir Alfasi
  • Thank you very much! I've tried it, but I get a page without flight information. That is why I tried a POST request. You can see the result here: [example on codepad.viper-7](http://codepad.viper-7.com/COAPjA/55dev?). Any idea? Thanks again. – Mikel Feb 13 '13 at 22:57
  • If you use Tamper (a Firefox plugin), you'll see that they don't use POST at all! – Nir Alfasi Feb 13 '13 at 23:51

I thought the site was using the POST method because I was getting a page without prices.

Now I realize that the URLs were relative, so the scripts were not loaded. I've added a base tag:

    // [code before]
    $result = str_replace("<head>", "<head><base href=\"$skyScannerURL\" />", $response);
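
The plain `str_replace` only matches a literal `<head>`. A slightly more tolerant variant, as a sketch under the assumption that the opening tag might have attributes or different letter case, could look like this; `add_base_tag()` is a hypothetical helper name.

```php
<?php
// Sketch: insert a <base> tag right after the opening <head> tag, even when
// that tag has attributes or is written in upper case.
// add_base_tag() is a hypothetical helper name.
function add_base_tag(string $html, string $baseUrl): string
{
    $base = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '" />';
    // A callback keeps "$1" or "\1" inside the URL from being read as a
    // backreference, which a plain replacement string would do.
    return preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base) {
            return $m[0] . $base;
        },
        $html,
        1 // only the first <head>
    );
}

echo add_base_tag('<HEAD profile="x"><title>t</title></HEAD>', 'http://www.skyscanner.es/'), "\n";
```

The `\b` after `head` keeps the pattern from matching `<header>`.
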

Now the page has styles and tries to load something, but it enters a loop: the page keeps reloading, and the URL carries a parameter that increases each time, for example ?crty=107

The full code:

    $accept = array(
        'type' => array('application/rss+xml', 'application/xml', 'application/rdf+xml', 'text/xml'),
        'charset' => array_diff(mb_list_encodings(), array('pass', 'auto', 'wchar', 'byte2be', 'byte2le', 'byte4be', 'byte4le', 'BASE64', 'UUENCODE', 'HTML-ENTITIES', 'Quoted-Printable', '7bit', '8bit'))
    );
    $header = array(
        'Accept: '.implode(', ', $accept['type']),
        'Accept-Charset: '.implode(', ', $accept['charset']),
    );
    $encoding = null;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/vuelos/bio/bcn/130325/tarifas-de-bilbao-a-barcelona-en-marzo-2013.html?flt=1");
    //curl_setopt($ch, CURLOPT_URL, "http://www.skyscanner.es/flights/bio/bcn/130325/airfares-from-bilbao-to-barcelona-in-march-2013.html?flt=1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $response = curl_exec($ch);
    curl_close($ch);
    if (!$response) {
        // error fetching the response
    } else {
        $skyScannerURL = 'http://www.skyscanner.es/';
        $result = str_replace("<head>", "<head><base href=\"$skyScannerURL\" />", $response);
        echo $result;
    }

You can see it online here: codepad.viper-7.com

Obviously something is not working properly. Thanks again, everyone.

Mikel