A client asked me to scrape a website using PHP cURL. I wrote the script and it works fine on my localhost, but when I gave it to the client, it did not work on his localhost.
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
print "Cascading https://www.autotrader.ca/cars/on/toronto/?rcp=15&rcs=0&prx=100&prv=Ontario&loc=toronto%2C%20on&hprc=True&wcp=True&sts=New-Used&inMarket=basicSearch&mdl=Accent&make=Hyundai&scrladid=11543266:<p>";
$array = [];
$array[] = "/a/hyundai/accent/oshawa/ontario/19_11543266_/?showcpo=ShowCpo&ncse=no&orup=1_15_340&sprx=100";
$array[] = "/a/hyundai/accent/cambridge/ontario/5_48590586_20200220145456261/?showcpo=ShowCpo&ncse=no&orup=2_15_340&sprx=100";
$array[] = "/a/hyundai/accent/mississauga/ontario/19_11536424_/?showcpo=ShowCpo&ncse=no&orup=3_15_340&sprx=100";
foreach ($array as $key=>$value)
{
$scrape = "https://www.autotrader.ca".$array[$key];
print "Scraping $scrape<p>";
echo "<br>";
$user_agent = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36';
$headers = [
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding: gzip, deflate, br',
'accept-language: en-US,en;q=0.9',
'cache-control: max-age=0',
'sec-fetch-dest: document',
'sec-fetch-mode: navigate',
'sec-fetch-site: none',
'sec-fetch-user: ?1',
'upgrade-insecure-requests: 1',
'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36',
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $scrape);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 100);
curl_setopt($ch, CURLOPT_ENCODING, ''); // expects a string; '' enables every encoding this cURL build can decode
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_VERBOSE, true);
// curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Length: 0'));
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
$contents = curl_exec($ch);
if ($contents === FALSE){
echo "Error : ".curl_error($ch);
echo "<br>";
print "contents returned for $key = FALSE<br>";
curl_close($ch);
continue; // nothing to parse for this listing
}
curl_close($ch);
// echo $contents;
$start_pos = strpos($contents, "<title>", 0);
$end_pos = strpos($contents, "</title>", 0);
// Length must exclude the 7 characters of "<title>", otherwise part of "</title>" is included.
$title = substr($contents, $start_pos+7, $end_pos-$start_pos-7);
print "Listing $key: $title<p>";
echo "<br>";
echo "<br>";
}
He also told me that he had previously been scraping this website with some method other than cURL, and he thinks the site has restricted requests from him to their server. Note, however, that he can still visit the website in his browser. I also checked that he gets a correct response if he replaces the URL in the cURL call with a Google URL.
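To narrow down whether this is blocking on the server side or a difference between the two cURL builds, a small diagnostic like the one below could be run on both machines and the output compared. This is only a sketch: the function names (`curl_build_summary`, `classify_response`, `probe_url`) and the response categories are hypothetical helpers made up for illustration, not part of the original script, and the 403/429 check is an assumption about how a block might show up.

```php
<?php
// Diagnostic sketch; every function name below is a hypothetical helper.

/** Summarise the local PHP/libcurl build so two machines can be compared. */
function curl_build_summary(): array
{
    $v = curl_version();
    // CURL_VERSION_BROTLI only exists on PHP >= 7.3, hence the defined() guard.
    $brotli = defined('CURL_VERSION_BROTLI')
        && ($v['features'] & CURL_VERSION_BROTLI) !== 0;
    return [
        'php'     => PHP_VERSION,
        'libcurl' => $v['version'],
        'ssl'     => $v['ssl_version'],
        'brotli'  => $brotli, // false means "br" responses cannot be decoded
    ];
}

/** Rough classification of one response; the categories are illustrative. */
function classify_response(int $status, string $body): string
{
    if ($status === 0) {
        return 'no HTTP response (DNS, TLS or connection failure)';
    }
    if ($status === 403 || $status === 429) {
        return 'refused by the server (possible IP or bot blocking)';
    }
    if ($status >= 200 && $status < 300 && stripos($body, '<title>') === false) {
        return 'success status but no <title> (undecoded body or challenge page?)';
    }
    return "HTTP $status";
}

/** Fetch one URL and report status, final URL, cURL error and a verdict. */
function probe_url(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, ''); // '' = every encoding this build can decode
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [
        'status'    => $status,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'     => curl_error($ch),
        'verdict'   => classify_response($status, $body === false ? '' : $body),
    ];
}
```

Calling `print_r(curl_build_summary());` and `print_r(probe_url('https://www.autotrader.ca/'));` on each machine would show whether the two setups differ in libcurl version or brotli support, and whether the client's machine receives an error status rather than a normal page.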