
I am trying to get the HTML source of a URL using cURL.

The code below works perfectly on localhost, but it returns nothing when moved to the server:

function get_html_from_url($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_HEADER         => false,   // keep response headers out of the body
        CURLOPT_FOLLOWLOCATION => false,
        CURLOPT_ENCODING       => "",      // enables automatic decompression of the response
        CURLOPT_USERAGENT      => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3B48b Safari/419.3",
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_CONNECTTIMEOUT => 30,      // seconds allowed to establish the connection
        CURLOPT_HTTPHEADER     => array(
            "Host: host.com",              // must match the host in $url
            "Upgrade-Insecure-Requests: 1",
            // this header replaces the CURLOPT_USERAGENT value above
            "User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Mobile Safari/537.36",
            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Encoding: gzip, deflate",
            "Accept-Language: en-US,en;q=0.9",
            "Cookie: JSESSIONID=SESSSIONID",
            "Connection: close"
        ),
        CURLOPT_TIMEOUT        => 30,      // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_SSL_VERIFYPEER => false,   // skips certificate verification (insecure)
    );
    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );   // false on failure
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    // return the transfer info together with the error details and the body
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

On the server I get a timeout error, and increasing the timeout did not help.
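Since get_html_from_url() already returns curl_errno() and the curl_getinfo() timings, those values can narrow down where the request stalls. A minimal sketch, assuming the array shape that function returns; diagnose_fetch() is a hypothetical helper, and 6, 7 and 28 are libcurl's error codes for a failed DNS lookup, a failed connection and a timeout:

```php
<?php
// Hypothetical helper: summarises the array returned by get_html_from_url(),
// i.e. curl_getinfo() output plus the 'errno' and 'errmsg' keys it adds.
function diagnose_fetch(array $result): string
{
    $errno = $result['errno'] ?? 0;

    switch ($errno) {
        case 0:
            return 'OK: HTTP ' . ($result['http_code'] ?? 0);
        case 6:  // libcurl CURLE_COULDNT_RESOLVE_HOST
            return 'DNS lookup failed';
        case 7:  // libcurl CURLE_COULDNT_CONNECT
            return 'Connection refused or host unreachable';
        case 28: // libcurl CURLE_OPERATION_TIMEDOUT
            // connect_time stays 0 when the TCP connection was never completed
            if (($result['connect_time'] ?? 0) == 0) {
                return 'Timed out before connecting';
            }
            return 'Connected, but timed out waiting for the response';
        default:
            return 'cURL error ' . $errno . ': ' . ($result['errmsg'] ?? '');
    }
}
```

Usage: echo diagnose_fetch(get_html_from_url($url)); on both machines. If the server reports a timeout before connecting, the TCP connection itself is never established (for example because of a firewall, or because the remote site does not answer the server's IP), so a longer CURLOPT_TIMEOUT is unlikely to help.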

Thanks.

mmvsbg
Khan Sharukh

3 Answers


You could run a test using file_get_contents() like so:

$html = file_get_contents('http://example.com');   // false on failure
echo $html;

But cURL is the better tool for this. I would also check what outbound network access the server has.
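One quick way to check network access from the server, with cURL taken out of the picture, is to try a plain TCP connection. A minimal sketch, assuming PHP's fsockopen(); can_connect() is a hypothetical helper, and example.com/443 in the usage line are placeholders for the real host and port:

```php
<?php
// Hypothetical helper: returns true if a plain TCP connection to $host:$port
// can be opened from the machine running this script within $timeout seconds.
function can_connect(string $host, int $port, float $timeout = 5.0): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() emits on failure;
    // the boolean return value reports the outcome instead
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}
```

For example, run var_dump(can_connect('example.com', 443)); on both localhost and the server. A false result only on the server points at outbound network restrictions rather than at the cURL options.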

Erdss4

Here is some sample code that fetches data from a remote URL and stores it in a file. I hope it helps.

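function scrapper()
{
    $url = "https://www.google.com/";

    $curl = curl_init();

    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => 1,   // return the response as a string
        CURLOPT_URL => $url
    ));

    $response = curl_exec($curl);   // false on failure
    if ($response === false) {
        $error = curl_error($curl);
        curl_close($curl);
        die("cURL error: " . $error);
    }
    curl_close($curl);

    return $response;
}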

$scrap_data = scrapper();

$myfile = fopen("scrap_data.txt", "w") or die("Unable to open file!");
fwrite($myfile, $scrap_data);
fclose($myfile);

echo "Scraped data saved to file";
Suresh
  • Hi @Suresh, my script works fine on localhost but not on the server; that is the issue I have. Thanks – Khan Sharukh Sep 24 '18 at 13:08
  • I get a "connection timed out" error, and the issue persists even when I increase the timeout – Khan Sharukh Sep 24 '18 at 13:14
  • Thanks @suresh, the code works fine on the server but not locally – Suraj Rathod Jul 30 '20 at 06:54
  • @SurajRathod Put the following code at the top of your PHP script and run it. It will show you the exact error you are getting at execution time: ```error_reporting(E_ALL);set_time_limit(0);ini_set('display_errors', '1');ini_set('memory_limit', '-1');``` – Suresh Jul 31 '20 at 07:48

If I understood your requirement correctly, the following script should get you there. It uses htmlspecialchars() so that the fetched HTML is displayed as source text instead of being rendered by the browser.

<?php
function get_content($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_USERAGENT      => "Mozilla/5.0",
    );
    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $htmlContent = curl_exec( $ch );
    curl_close( $ch );
    return $htmlContent;
}
$link = "https://stackoverflow.com/questions/52477020/get-html-from-a-url-using-curl-in-php"; 
$response = get_content($link);
echo htmlspecialchars($response);
?>

The link I've used within the script is just a placeholder. Feel free to replace that with the one you are after.
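For reference, this is what htmlspecialchars() does to a small piece of markup (the sample string is made up):

```php
<?php
// htmlspecialchars() turns &, <, > and " (plus ' with ENT_QUOTES) into
// entities, so the browser shows the markup as text instead of rendering it.
$html = '<p class="intro">Fish & Chips</p>';
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
echo $escaped; // prints &lt;p class=&quot;intro&quot;&gt;Fish &amp; Chips&lt;/p&gt;
```

Wrapping the escaped output in <pre></pre> additionally preserves the page's line breaks when viewed in a browser.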

SIM
  • Hi @SIM, I am able to fetch the HTML successfully on localhost but not on the server. Is it because of the IP address, and if so, how can we resolve this? – Khan Sharukh Sep 24 '18 at 13:09