Before anyone jumps in, I would like to say that I have already read:
- Any idea on how to scrape pages which are behind __doPostBack('...');?
- DotNetNuke, PHP, Simulating a remote postback using curl
- Scraping HTML with JavaScript postbacks
- cURL post data to asp.net page
- http://techclimber.blogspot.com.es/2009/03/php-curl-and-aspnet.html
This is my function:
function UolgetHtmlfromAjaxCallback($a_Params, $url) {
    $EVENTTARGET = $this->UolgetAtributoEventTarget($a_Params['s_EventTarget']);
    $s_smMaster = 'ctl00$cphSite$upModelo|'.$entries['target'];
    $VIEWSTATE = urlencode($a_Params['s_ViewState']);
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => 'ct100%24smMaster='.urldecode($s_smMaster).
            '&__EVENTTARGET='.urlencode($EVENTTARGET['target']).
            '&__EVENTARGUMENT='.urlencode('').
            '&__EVENTVALIDATION='.urlencode($s_Eventvalidation).
            '&__VIEWSTATE='.$VIEWSTATE.
            '&ct100%24txtBuscaNome='.$a_Params['ctl00_txtBuscaNome'].
            '&__ASYNCPOST=true'
    );
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $result = curl_exec($ch);
    return $result;
}
The $VIEWSTATE is okay, and $EVENTTARGET is an array with these values:
Array (
    [id] => ctl00_cphSite_fichaTecnicaEditorial_rptCarrosFichaTecnica_ctl00_lbtnFichaTecnica
    [target] => ctl00$cphSite$fichaTecnicaEditorial$rptCarrosFichaTecnica$ctl00$lbtnFichaTecnica
)
I'm trying to use this code to scrape this website:
http://comparecar.uol.com.br/Modelo/Volvo-Xc60
I get the page back, but not the car's information.
Edit: working with Tamper Data, I found that the actual POST uses different parameters:
ctl00%24smMaster
__EVENTTARGET
__EVENTARGUMENT
__VIEWSTATE
__EVENTVALIDATION
ctl00%24txtBuscaNome
__ASYNCPOST
The request body also ends with a trailing "=".
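For comparison, here is a minimal sketch of how the fields captured above might be assembled so that every name and value is URL-encoded exactly once. Note that the capture shows `ctl00` (with a lowercase L), while my function posts `ct100` and calls `urldecode()` on the smMaster value. The helper name `buildPostbackBody` and the placeholder values are assumptions for illustration only; the real `__VIEWSTATE` and `__EVENTVALIDATION` would have to come from the page itself, and the trailing "=" seen in the capture is not reproduced here.

```php
<?php
// Hypothetical helper (not part of the original function): builds the
// URL-encoded body for the async postback from a name => value map.
function buildPostbackBody(array $fields): string
{
    // http_build_query() encodes both keys and values once,
    // so '$' becomes %24 and '|' becomes %7C.
    return http_build_query($fields, '', '&', PHP_QUERY_RFC1738);
}

$target = 'ctl00$cphSite$fichaTecnicaEditorial$rptCarrosFichaTecnica$ctl00$lbtnFichaTecnica';

$body = buildPostbackBody([
    'ctl00$smMaster'     => 'ctl00$cphSite$upModelo|' . $target,
    '__EVENTTARGET'      => $target,
    '__EVENTARGUMENT'    => '',
    '__VIEWSTATE'        => 'PLACEHOLDER_VIEWSTATE',       // assumption: scraped from the page
    '__EVENTVALIDATION'  => 'PLACEHOLDER_EVENTVALIDATION', // assumption: scraped from the page
    'ctl00$txtBuscaNome' => '',
    '__ASYNCPOST'        => 'true',
]);

echo $body, PHP_EOL;
```

The resulting string could then be passed as `CURLOPT_POSTFIELDS` in place of the hand-concatenated one, which avoids mixing `urlencode()`, `urldecode()`, and unencoded values in the same body.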