
Update: the site I'm scraping has provided me with another way to get the data I need, so I am no longer having this problem. However, I am still interested in a solution for educational reasons.


I want to use cURL to pass data to a JSON server via POST.

I'm not sure how to format the variables for the cURL request.

This doesn't work (returns null):

define('POSTVARS', 'json='.urlencode('{"cid":"2623","strQuery":"","strValues":"undefined","currentPage":"0","pageSize":"-1","pageSort":"-1","countryId":"2","maxResultCount":""}'));

Nor does this:

define('POSTVARS', "json=%7B'cid'%3A'2623'%2C%20'strQuery'%3A''%2C%20'strValues'%3A'undefined'%2C%20'currentPage'%3A'0'%2C%20'pageSize'%3A'-1'%2C'pageSort'%3A'-1'%2C'countryId'%3A'2'%2C'maxResultCount'%3A''%7D=");

As per "Post JSON using Curl", I am setting the header with:

curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));

My script works for regular POST variables, but not when I try to pass data to a JSON server, and I am wondering if I have misformatted the data or must provide some other additional parameter.
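For illustration, here is a minimal sketch (not part of the original script) of the difference between the two body formats in play. The `json=` prefix combined with `urlencode()` produces a form-style body, while a `Content-Type: application/json` header announces a raw JSON document. The variable names `$params`, `$formBody`, and `$jsonBody` are hypothetical, and what this particular server expects is an assumption, not something established here.

```php
<?php
// Hypothetical array holding the same fields as the payload in the question.
$params = array(
    'cid'            => '2623',
    'strQuery'       => '',
    'strValues'      => 'undefined',
    'currentPage'    => '0',
    'pageSize'       => '-1',
    'pageSort'       => '-1',
    'countryId'      => '2',
    'maxResultCount' => '',
);

// Raw JSON body: the whole request body is one JSON document.
$jsonBody = json_encode($params);

// Form-style body: a single form field named "json" holding URL-encoded JSON.
$formBody = 'json=' . urlencode($jsonBody);

// A JSON parser can read the raw body but not the form-style one.
$rawParses  = is_array(json_decode($jsonBody, true));
$formParses = json_decode($formBody, true) !== null;

echo $jsonBody, "\n";
echo 'raw body parses as JSON: ', var_export($rawParses, true), "\n";
echo 'form body parses as JSON: ', var_export($formParses, true), "\n";
```

Under that assumption, `$jsonBody` rather than `$formBody` is the string to hand to `CURLOPT_POSTFIELDS` when the JSON content type is set.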

Full code of my attempt (formatting is kind of broken below, so I also pasted the full script here: https://docs.google.com/document/d/1hokE6-oMtcs3MBgPUzPwJvZIWNABc3XjOO2IzbpxDF4/edit?hl=en&authkey=CKXcnv8C ):



function get_url( $url, $javascript_loop = 0, $timeout = 5 ) {

$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init(POSTURL);
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_ENCODING, "" );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false ); # required for https urls
curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );

$postvars=<<<HEREDOC
{'cid':'2623', 'strQuery':"", 'strValues':'undefined', 'currentPage':'0', 'pageSize':'-1','pageSort':'-1','countryId':'2','maxResultCount':''}

HEREDOC;

curl_setopt($ch, CURLOPT_POST      ,1);
curl_setopt($ch, CURLOPT_POSTFIELDS    ,$postvars);
curl_setopt($ch, CURLOPT_HEADER      ,0);  // DO NOT RETURN HTTP HEADERS 

$arr = array();
array_push($arr, 'Content-Type: application/json; charset=utf-8');
curl_setopt($ch, CURLOPT_HTTPHEADER, $arr);

$content = curl_exec( $ch );
$response = curl_getinfo( $ch );
curl_close ( $ch );

if ($response['http_code'] == 301 || $response['http_code'] == 302)
{
    ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");

    if ( $headers = get_headers($response['url']) )
    {
        foreach( $headers as $value )
        {
            if ( substr( strtolower($value), 0, 9 ) == "location:" )
                return get_url( trim( substr( $value, 9, strlen($value) ) ) );
        }
    }
}

if (    ( preg_match("/>[[:space:]]+window\.location\.replace\('(.*)'\)/i", $content, $value) || preg_match("/>[[:space:]]+window\.location\=\"(.*)\"/i", $content, $value) ) &&
        $javascript_loop < 5
)
{
    return get_url( $value[1], $javascript_loop+1 );
}
else
{
    return array( $content, $response );
}

}

// set url
$url="http://us.asos.com/services/srvWebCategory.asmx/GetWebCategories";

$output = get_url($url);

print_r($output);
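As a side check on the payload itself, the sketch below tests whether a string is strictly valid JSON. The heredoc in the script uses single-quoted keys and values, which PHP's `json_decode()` rejects, whereas the double-quoted variant parses. The helper name `is_valid_json` is hypothetical. Whether the server is equally strict, or whether this has anything to do with the response the script gets, is not established here.

```php
<?php
// Hypothetical helper: true when $text is a syntactically valid JSON document.
function is_valid_json($text) {
    json_decode($text);
    return json_last_error() === JSON_ERROR_NONE;
}

// Payload as written in the heredoc: mostly single-quoted.
$singleQuoted = <<<'JSON'
{'cid':'2623', 'strQuery':"", 'strValues':'undefined', 'currentPage':'0', 'pageSize':'-1','pageSort':'-1','countryId':'2','maxResultCount':''}
JSON;

// Same fields with double quotes throughout.
$doubleQuoted = <<<'JSON'
{"cid":"2623","strQuery":"","strValues":"undefined","currentPage":"0","pageSize":"-1","pageSort":"-1","countryId":"2","maxResultCount":""}
JSON;

echo 'single-quoted payload valid: ', var_export(is_valid_json($singleQuoted), true), "\n";
echo 'double-quoted payload valid: ', var_export(is_valid_json($doubleQuoted), true), "\n";
```

Building the body with `json_encode()` from a PHP array avoids hand-quoting mistakes of this kind.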

jela
  • are you calling curl_setopt with CURLOPT_POSTFIELDS? – Dereleased Feb 10 '11 at 21:31
  • @ceejayoz yes, that's where I got the idea to convert the data into querystring format (from Jordan's response), but without success, so there must be something else I am doing wrong. @Dereleased yes, my script works for sending data via post to a web form, but not to a JSON server, so it appears my problem is associated with how I am formatting the variables, or something else specific to posting to a JSON server. – jela Feb 10 '11 at 21:36
  • Have you tried json_encode for your data? – DeaconDesperado Feb 15 '11 at 18:22
  • @DeaconDesperado my data is already in the format {"cid":"2623","strQuery":"","strValues":"undefined","currentPage":"0","pageSize":"-1","pageSort":"-1","countryId":"2","maxResultCount":""} so unless I have made a mistake, it seems already to be encoded properly? – jela Feb 16 '11 at 22:13
  • Could we see a full code sample? – A. R. Younce Feb 23 '11 at 17:19
  • @A. R. Younce sure -- I have now posted it above, also, to preserve formatting (doesn't look so great above), I posted the script here https://docs.google.com/document/d/1hokE6-oMtcs3MBgPUzPwJvZIWNABc3XjOO2IzbpxDF4/edit?hl=en&authkey=CKXcnv8C note that as I added to the top of the OP, the site I'm scraping helpfully provided me with another way to get the data I require, so I am no longer in need of a solution for practical purposes, but I would still like to educate myself on what I have done wrong, especially if it is something stupid, which I imagine to be the case. – jela Feb 25 '11 at 20:49
  • Running the code myself the script gets a 401 (Unauthorized) response from the server. Is this what you were encountering? If you had the error condition posted before I don't see it now. – A. R. Younce Feb 25 '11 at 23:59
  • yes, the output I get from the script is: `Array ( [0] => {"Message":"There was an error processing the request.","StackTrace":"","ExceptionType":""} [1] => Array ( [url] => http://us.asos.com/services/srvWebCategory.asmx/GetWebCategories [content_type] => application/json;charset=utf-8 [http_code] => 401` The request works when viewing a web page that requests the same data, for example http://us.asos.com/New-clothing-The-latest-fashion-clothing/rsraf/?cid=2623&pge=0&pgesize=-1&sort=-1#parentID=-1&pge=0&pgeSize=-1&sort=-1, but not when posting the variables directly – jela Feb 26 '11 at 16:28

0 Answers