I'm trying to read a page from my own site (the same server the script runs on) using PHP. I came across this good discussion and decided to use the cURL method it suggests:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return web page
        CURLOPT_HEADER         => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // timeout on connect
        CURLOPT_TIMEOUT        => 120,   // timeout on response
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

// Now get the webpage
$data = get_web_page( "https://www.google.com/" );

// Display the data (optional)
echo "<pre>" . $data['content'] . "</pre>";
So, for my case, I called get_web_page like this:
$target_url = "http://" . $_SERVER['SERVER_NAME'] . "/press-release/index.html";
$page = get_web_page($target_url);
The thing I can't fathom is that it worked on all of my test servers except one. I've verified that cURL is available on the server in question. Also, setting `$target_url = "http://www.google.com"` worked fine there, so I'm fairly sure the problem isn't the cURL library itself.
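Since the function merges curl_errno(), curl_error(), and curl_getinfo() into the array it returns, dumping those fields on the failing server should narrow things down. A rough sketch, reusing the $page returned by the call above (the 'errno'/'errmsg' keys are set by the function; 'http_code' and 'url' come from curl_getinfo()):

// Rough diagnostic sketch: inspect what get_web_page() reported.
if ( $page['errno'] !== 0 ) {
    // cURL-level failure (DNS, connect timeout, SSL, etc.)
    echo "cURL error {$page['errno']}: {$page['errmsg']}";
} else {
    // Transfer completed; check what the server actually answered.
    echo "HTTP status: " . $page['http_code'] . " for " . $page['url'];
}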
Could it be that some servers block requests to themselves from this kind of script? Or have I just missed something here?
Thanks in advance.