
A client asked me to scrape a website using PHP cURL. I wrote the script and it worked fine on my localhost, but when I gave it to the client, it did not work on his localhost.

<?php

ini_set('display_errors', 'On');
error_reporting(E_ALL);

print "Cascading https://www.autotrader.ca/cars/on/toronto/?rcp=15&rcs=0&prx=100&prv=Ontario&loc=toronto%2C%20on&hprc=True&wcp=True&sts=New-Used&inMarket=basicSearch&mdl=Accent&make=Hyundai&scrladid=11543266:<p>";

$array = [];
$array[] = "/a/hyundai/accent/oshawa/ontario/19_11543266_/?showcpo=ShowCpo&ncse=no&orup=1_15_340&sprx=100";
$array[] = "/a/hyundai/accent/cambridge/ontario/5_48590586_20200220145456261/?showcpo=ShowCpo&ncse=no&orup=2_15_340&sprx=100";
$array[] = "/a/hyundai/accent/mississauga/ontario/19_11536424_/?showcpo=ShowCpo&ncse=no&orup=3_15_340&sprx=100";

foreach ($array as $key=>$value)
{
    $scrape = "https://www.autotrader.ca" . $value;
    print "Scraping $scrape<p>";
    echo "<br>";

    $user_agent = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36';

    $headers = [
        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding: gzip, deflate, br',
        'accept-language: en-US,en;q=0.9',
        'cache-control: max-age=0',
        'sec-fetch-dest: document',
        'sec-fetch-mode: navigate',
        'sec-fetch-site: none',
        'sec-fetch-user: ?1',
        'upgrade-insecure-requests: 1',
        'user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Mobile Safari/537.36',
    ];
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $scrape);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 100);
    curl_setopt($ch, CURLOPT_ENCODING, ''); // '' enables decoding of every encoding this cURL build supports
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    // curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Length: 0'));
    curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');

    $contents = curl_exec($ch);
    
    if ($contents === FALSE){
        echo "Error : ".curl_error($ch);
        echo "<br>";
        print "contents returned for $key = FALSE<br>";
    }

    curl_close($ch);
    
    // echo $contents;

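    $start_pos = strpos($contents, "<title>");
    $end_pos = strpos($contents, "</title>");
    // "<title>" is 7 characters long, so the title text is $end_pos - $start_pos - 7 characters
    $title = ($start_pos !== false && $end_pos !== false)
        ? substr($contents, $start_pos + 7, $end_pos - $start_pos - 7)
        : "(no title found)";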
    
    print "Listing $key: $title<p>";
    echo "<br>";
    echo "<br>";
}

He also told me that he had been scraping the website before, not with cURL but with some other method, and he thinks they have since restricted his requests to their server. Note, however, that he can still visit the website in his browser. I also checked that he gets a correct response if he replaces the URL in the cURL script with a Google URL.
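One way to test the client's theory is to look at the HTTP status code the server returns, not only at whether curl_exec() returned FALSE: a request the server refuses often completes at the network level but comes back with a status such as 403 or 429. Below is a minimal sketch of such a probe; the function name probe_url() and its return array are my own illustration, not part of the script above.

```php
<?php
// Minimal diagnostic sketch (hypothetical helper, not from the original script):
// fetch a URL and report the HTTP status code alongside any cURL transport error,
// so that a refusal by the server can be told apart from a local cURL failure.
function probe_url(string $url, int $timeout = 15): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $body = curl_exec($ch);
    $result = [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no HTTP response was received
        'error'  => curl_error($ch),                       // empty string if the transfer succeeded
        'bytes'  => is_string($body) ? strlen($body) : 0,  // size of the returned body
    ];
    curl_close($ch);
    return $result;
}
```

Run on the client's machine against both the Google URL and one of the autotrader.ca listing URLs, a 200 for Google next to a 403 or 429 for autotrader.ca would support the idea of server-side blocking, while a non-empty `error` with `status` 0 would point to a local cURL or network problem instead.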


The most likely issue here is that your client's installation of PHP does not have the php-curl extension installed or enabled. How to enable it depends on your OS and on how PHP was installed, but here are a few common situations:

For Ubuntu or other Debian based Linux distributions:

apt-get install php7.4-curl
systemctl restart apache2

In the first command, replace '7.4' with the version of PHP you are actually using.

For WAMP on Windows, see the Stack Overflow question "How to enable curl in Wamp server".

For XAMPP on Windows, see the Stack Overflow question "How to enable cURL in PHP / XAMPP".
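Before (or after) changing anything, the client can confirm whether the extension is actually loaded. The sketch below is a suggestion rather than part of the original answer; the helper name curl_status() is hypothetical, while extension_loaded(), curl_version() and php_ini_loaded_file() are standard PHP functions. It is best run through the web server rather than the command line, because the CLI and the web server can load different php.ini files.

```php
<?php
// Sketch: report whether the cURL extension is loaded in the PHP build running
// this script, and which php.ini was used (hypothetical helper name).
function curl_status(): string
{
    if (!extension_loaded('curl')) {
        // php_ini_loaded_file() returns false when no php.ini was loaded
        return 'cURL extension NOT loaded (php.ini: ' . (php_ini_loaded_file() ?: 'none') . ')';
    }
    $v = curl_version(); // array with keys such as 'version', 'ssl_version', 'protocols'
    return 'cURL ' . $v['version'] . ' with ' . $v['ssl_version'];
}

echo curl_status(), PHP_EOL;
```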


I ran it behind a proxy and it works fine. I simplified the code and corrected a few small mistakes.

Try the following, and don't forget to comment out or edit the CURLOPT_PROXY line.

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

$array = [
    "/a/hyundai/accent/oshawa/ontario/19_11543266_/?showcpo=ShowCpo&ncse=no&orup=1_15_340&sprx=100",
    "/a/hyundai/accent/cambridge/ontario/5_48590586_20200220145456261/?showcpo=ShowCpo&ncse=no&orup=2_15_340&sprx=100",
    "/a/hyundai/accent/mississauga/ontario/19_11536424_/?showcpo=ShowCpo&ncse=no&orup=3_15_340&sprx=100"
];

foreach ($array as $key => $value) {
    $scrape = "https://www.autotrader.ca" . $value;
    echo "Scraping " . $scrape . "<br>\n";
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $scrape);
    curl_setopt($ch, CURLOPT_PROXY, "http://<proxy_url>:80"); // Comment if not behind a proxy
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
    $contents = curl_exec($ch);

    if (curl_errno($ch)) {
        echo "Error : " . curl_error($ch) . "<br>\n";
        curl_close($ch); // release the handle before leaving the loop
        break;
    }
    curl_close($ch);

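    // Take the text between <title> and </title>; fall back if the page has no title
    $parts = explode("<title>", $contents, 2);
    $title = isset($parts[1]) ? explode("</title>", $parts[1], 2)[0] : "(no title found)";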

    echo "Listing " . $key . ": " . $title . "<br>\n";
    echo "<br>\n";
    echo "<br>\n";
}
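If the script has to run both with and without a proxy, one option is to set CURLOPT_PROXY only when a proxy is configured instead of commenting the line in and out. This is a sketch under my own assumptions: the helper make_handle() and the environment variable name SCRAPER_PROXY are hypothetical choices, and neither PHP nor cURL reads SCRAPER_PROXY by itself.

```php
<?php
// Sketch: build a cURL handle and apply CURLOPT_PROXY only when a proxy is given.
// make_handle() and SCRAPER_PROXY are hypothetical names used for illustration.
function make_handle(string $url, ?string $proxy = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    if ($proxy !== null && $proxy !== '') {
        curl_setopt($ch, CURLOPT_PROXY, $proxy); // e.g. "http://<proxy_url>:80"
    }
    return $ch;
}

// getenv() returns false when the variable is not set, so no proxy is used then
$proxy = getenv('SCRAPER_PROXY');
$ch = make_handle('https://www.autotrader.ca/', $proxy === false ? null : $proxy);
// ... then call curl_exec($ch) and curl_close($ch) as before
```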