13

I have been using file_get_contents to grab the contents of a site for years.

Recently, they updated their URL to HTTPS and file_get_contents stopped working.

I've read previous questions and tried marked solutions, but nothing has worked.

For example, I tried this, and it returned the following:

openssl: yes
http wrapper: yes
https wrapper: yes
wrappers: array ( 0 => 'https', 1 => 'ftps', 2 => 'compress.zlib', 3 => 'compress.bzip2', 4 => 'php', 5 => 'file', 6 => 'data', 7 => 'http', 8 => 'ftp', 9 => 'zip', )
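For reference, a check that produces this kind of report could look roughly like the sketch below. The helper name `collectHttpsSupport()` is made up for illustration; only `stream_get_wrappers()` and `extension_loaded()` are real PHP functions, and the linked snippet may have differed in detail.

```php
<?php
// Report whether the pieces needed for https:// URLs are present.
// collectHttpsSupport() is a hypothetical helper name, not part of PHP.
function collectHttpsSupport()
{
    $wrappers = stream_get_wrappers();

    return array(
        'openssl'       => extension_loaded('openssl'),
        'http wrapper'  => in_array('http', $wrappers, true),
        'https wrapper' => in_array('https', $wrappers, true),
        'wrappers'      => $wrappers,
    );
}

// Print one line per entry, in the same spirit as the report above.
foreach (collectHttpsSupport() as $name => $value) {
    if (is_array($value)) {
        echo $name, ': ', implode(', ', $value), "\n";
    } else {
        echo $name, ': ', ($value ? 'yes' : 'no'), "\n";
    }
}
```

Note that a "yes" on every line only shows that the wrappers are registered. It says nothing about which SSL/TLS protocol versions the underlying OpenSSL library can negotiate.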

So then I tried this solution with file_get_contents, to no avail.

I then tried this solution with cURL to ignore certificate verification altogether, also to no avail.

No matter which solution I try, nothing is returned.

I have not added extension=php_openssl.dll and allow_url_include = On to php.ini as per this, because this particular site is on a shared host and the hosting company does not allow php.ini to be edited. Both settings may already be enabled by default.

I tried other HTTPS sites, and some work and some do not, and I'm not sure why.

I tried from a different Server (and different IP) on the same web host, and it also did not work with the target HTTPS site.

How can I debug and fix this?

UPDATE:

phpinfo shows:

curl
cURL support: enabled
cURL Information: libcurl/7.36.0 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5 libssh2/1.8.0

openssl
OpenSSL support: enabled
OpenSSL Version: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

ProgrammerGirl
  • Can you please add the specific URL you try to fetch so that we can verify your issue? Nothing we can do without specific information. – arkascha Feb 11 '17 at 11:01
  • maybe they block requests with no user-agent string, its unfortunately becoming common practice. and @arkascha is correct, you're not providing enough information. very specifically, what HTTP headers does the browser send when you get a valid response? and by comparison, what HTTP headers does curl send when it gets an empty response? use your browser's developer tools to find out what the browser sends. use CURLOPT_VERBOSE to find out what curl sends – hanshenrik Feb 11 '17 at 11:42
  • Here is the URL (please do not post it): http://i.imgur.com/85wsoLI.jpg – ProgrammerGirl Feb 11 '17 at 11:50
  • @ProgrammerGirl what is the output of ```1, CURLOPT_STDERR=>$fp, CURLOPT_FILE=>$fp ))){ throw new \RuntimeException('curl_setopt_array failed. '.curl_error($ch)); } var_dump(curl_exec($ch)); curl_close($ch); rewind($fp); var_dump(stream_get_contents($fp)); ``` (just fix the url ofc) – hanshenrik Feb 11 '17 at 12:28
  • Here are the results: `bool(false) string(316) "* Hostname was NOT found in DNS cache * Trying 69.[IP REMOVED]... * Connected to www.[Domain Removed].com (69.[IP REMOVED]) port 443 (#0) * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * Unknown SSL protocol error in connection to www.[Domain Removed].com:443 * Closing connection 0 "` – ProgrammerGirl Feb 11 '17 at 13:01
  • @ProgrammerGirl then it's an ssl error indeed. shame on you for not checking the return value of curl_exec. (assuming CURLOPT_RETURNTRANSFER) when it returns an empty string, it indeed returns nothing, but with this, it will return bool(false) - that's not `nothing` , that's an error.. anyway, i've seen this many times before, the solution is to upgrade the version of OpenSSL/GnuTLS on the client – hanshenrik Feb 12 '17 at 00:18
  • Do you mean my shared web host has to upgrade something? Also, if it's a matter of an incorrect version of OpenSSL/GnuTLS, then why isn't bypassing SSL (`curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);` working either? – ProgrammerGirl Feb 12 '17 at 12:11
  • yes they have to upgrade something. and it's because they're still using the SSL protocol, it's just that the ssl certificate is not being verified – hanshenrik Feb 12 '17 at 20:55
  • OK, but then why isn't it working when SSL is bypassed using `curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);`? – ProgrammerGirl Feb 13 '17 at 14:46
  • Have you try with another url ? if not please try with another url if you faced same issue then please check firewall if active then please turn off that and check. – Sourabh Feb 20 '17 at 13:21

5 Answers

5

FINAL ANSWER

If your hosting provider will not upgrade OpenSSL to a version that supports TLS 1.2, you should seriously consider another provider. You should test your server with the "SSL SERVER TEST" link below. Your server likely has SSL security vulnerabilities.

The server you are trying to connect to supports only TLS 1.2 and TLS 1.1.
It does not support TLS 1.0, SSL 3, or SSL 2.

When an SSL request is made, curl presents the host server with a list of the cipher suites it can use, as part of the handshake. The server then picks one cipher suite from the list presented by curl.

The host you are trying to connect to supports these cipher suites:

TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)  
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)  
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (0x9f) 
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x9e)  
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (0xc028)  
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)  
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 (0x6b)  
TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x39) 
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (0xc027) 
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)  
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 (0x67)  
TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x33) 
TLS_RSA_WITH_AES_256_GCM_SHA384 (0x9d) 
TLS_RSA_WITH_AES_128_GCM_SHA256 (0x9c) 
TLS_RSA_WITH_AES_256_CBC_SHA256 (0x3d) 
TLS_RSA_WITH_AES_256_CBC_SHA (0x35) 
TLS_RSA_WITH_AES_128_CBC_SHA256 (0x3c) 
TLS_RSA_WITH_AES_128_CBC_SHA (0x2f) 

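Your OpenSSL build was released in July 2008, and TLS 1.2 was published the following month, August 2008, so it cannot speak TLS 1.2. The best you could hope for is TLS 1.1, and even that is doubtful: TLS 1.1 and TLS 1.2 support only arrived in OpenSSL 1.0.1, so the 0.9.8 series most likely tops out at TLS 1.0.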
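A quick way to check this from PHP is sketched below. It relies on TLS 1.1 and TLS 1.2 support having been introduced in OpenSSL 1.0.1, whose version numbers start at 0x10001000. The helper name `opensslCanSpeakTls12()` is made up for illustration.

```php
<?php
// TLS 1.1 and TLS 1.2 support arrived in OpenSSL 1.0.1, whose version
// numbers start at 0x10001000. opensslCanSpeakTls12() is a hypothetical helper.
function opensslCanSpeakTls12($versionNumber)
{
    return $versionNumber >= 0x10001000;
}

// OPENSSL_VERSION_NUMBER is only defined when the openssl extension is loaded.
if (defined('OPENSSL_VERSION_NUMBER')) {
    echo OPENSSL_VERSION_TEXT, ': TLS 1.2 ',
        opensslCanSpeakTls12(OPENSSL_VERSION_NUMBER) ? 'possible' : 'not possible', "\n";
} else {
    echo "The openssl extension is not loaded.\n";
}
```

Keep in mind that this constant describes the library PHP's openssl extension was built against. libcurl may be linked against a different OpenSSL build (the phpinfo output in the question shows 0.9.8b for curl and 0.9.8e for openssl), so check the cURL section of phpinfo() as well.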

POSSIBLE TEMPORARY FIX until you upgrade

I do not have a high level of confidence this will work for you

You should test your own server's SSL with something like this SSL SERVER TEST

If your server supports TLS1.1 then you can try the following. I cannot test this because I do not have the same version of curl as you on the old server with your version of openSSL.

Use CURLOPT_SSLVERSION to pin the protocol to TLS 1.1, and CURLOPT_SSL_CIPHER_LIST to restrict the cipher suites curl offers to the host server:

curl_setopt($ch, CURLOPT_SSL_CIPHER_LIST, 'TLSv1');
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_1);

If that does not work, then try:

curl_setopt($ch, CURLOPT_SSL_CIPHER_LIST, 'DEFAULT');
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_1);

BOTTOM LINE

For more reasons than this issue, you need to upgrade your openSSL.

-------------------------------------------------------------------------

 

PREVIOUS TROUBLESHOOTING BELOW THIS POINT

The first thing I do is turn off JavaScript in the browser. If I can retrieve the page with a browser without JavaScript, I KNOW I can get it with PHP.

I build the request to look exactly like it does in the browser. I go to the Network tab of the Inspector, choose Edit the Request Header, then copy it and paste it into my code.

$request = array();
$request[] = 'Host: example.com';
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36';
$request[] = 'DNT: 1';
$request[] = 'Origin: https://example.com';
$request[] = 'Referer: https://example.com/entry/login';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';

Initialize curl

$url = 'https://example.com/entry/login';
$ch = curl_init($url);

Add the request parameters

curl_setopt($ch, CURLOPT_HTTPHEADER, $request);

Tell curl to include the headers

curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);

Return the response

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

Follow redirects

Redirects may be a trap. You may have to NOT follow them and analyze the response instead. Often the redirects are there to set cookies.

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIESESSION , true );

Let curl handle compression

curl_setopt($ch, CURLOPT_ENCODING,"");

Set timeout parameters

curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);

Make the Request and get Response

The following will get everything you need to know about the requests. The $info will also have all the redirect headers too. If redirects were made the $responseHeader will have all the response headers.

UPDATE: New Fully Tested Code

This may not matter because this also works on my machine:

echo file_get_contents($url);

If curl fails, this code should give you a reason WHY it failed.

Change the url. This one belongs to a client.

<?php
header('content-type: text/plain');

$url = 'https://amxemr.com';
$ch = curl_init($url);

// Troubleshooting only: skip certificate verification.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, "");           // accept every encoding curl supports
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FAILONERROR, true);      // treat HTTP status >= 400 as an error
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);           // include response headers in $data

$data = curl_exec($ch);
if ($data === false || curl_errno($ch)) {
    echo 'Retreive Base Page Error: ' . curl_error($ch);
}
else {
  $info = rawurldecode(var_export(curl_getinfo($ch), true));

  // Split the response headers (cookies included) from the body.
  $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
  $responseHeader = substr($data, 0, $skip);
  $data = substr($data, $skip);
  echo "HEADER: $responseHeader\n";
  echo "\n\nINFO: $info\n\nDATA: $data";
}
curl_close($ch);
?>

If the above did not work run phpinfo()

<?php
phpinfo();
?>  

There should be a curl section and openSSL.

--------------------------------------------------------------------

UPDATE TWO

Good News

I know the problem and I was able to replicate the errors you got.

Retreive Base Page Error: 
Unknown SSL protocol error in connection to www.xxxx.com:443 

NOTE xxx was the site from the link you gave me, you can delete that message now.

Funny thing, I have one server I do not update. And by luck, it had the same version of openSSL from July 2008.

You need to upgrade your OpenSSL. file_get_contents() failed on this server too. It worked with a February 2013 version of OpenSSL as well as a June 2014 version.

I cannot say whether anything else needs to be upgraded. The functions that use OpenSSL may (or may not) need to be upgraded as well.

I go with the adage if it ain't broke don't fix it. I do believe some upgrades are actually down grades. I'm still on XP. But it's broke and you need to fix it.

At least it's not a shot in the dark fix. I am confident you have to upgrade. It was a methodical troubleshooting procedure that was able to duplicate your errors. You can go back to using file_get_contents() too.

Misunderstood
  • Thanks, but I enabled Error Reporting in PHP and copied & pasted your entire code above with my target URL, but it returned a blank page. No errors were reported. Keep in mind that I'm connecting to an `HTTPS` site. I then tried the URL of this SO page (which is not HTTPS), and it also returned a blank page with no errors. So there seems to be a problem somewhere in your code. How can I fix this? – ProgrammerGirl Feb 12 '17 at 12:09
  • Sorry, Change `$data .= 'Retreive Base Page Error: ' . curl_error($ch); ` TO: `echo 'Retreive Base Page Error: ' . curl_error($ch);` – Misunderstood Feb 16 '17 at 00:57
  • And add some text for the variables, `echo "HEADER: $responseHeader\nINFO: \n"; var_export($info); echo "\n\nDATA: $data";` – Misunderstood Feb 16 '17 at 00:59
  • @Misunderstood: I tried your new code on the target URL and got the following error: `Retreive Base Page Error: Unknown SSL protocol error in connection to www.[Domain Removed].com:443`. Here is the target URL (please do not post it): i.imgur.com/85wsoLI.jpg . How can I fix this so it works on the target URL? – ProgrammerGirl Feb 17 '17 at 02:56
  • @Misunderstood: Also, I edited my question to add the `phpinfo()` data. – ProgrammerGirl Feb 17 '17 at 02:59
  • And that was what lead to the answer. See my UPDATE TWO. – Misunderstood Feb 17 '17 at 06:16
  • @AnthonyRutledge sorry, I have been spreading myself too thin. Besides I have not done anything with routers since the early 1980s. Back in the days when an Ethernet adapter was 3' x 2' x 1', and that is not a typo, that is feet, not inches. I worked for the Worlds Largest Ethernet manufacturer, Ungermann-Bass. Ralph Ungermann was the Intel engineer behind the Intel 4040/8080 processor and when on to found Zilog and developed the Z80. – Misunderstood Feb 17 '17 at 06:40
  • The problem is that this is on a shared web host, so getting them to upgrade won't be easy and I will be at their mercy. Isn't there any possible way to bypass SSL completely and ignore any certificates and just pull the data? – ProgrammerGirl Feb 17 '17 at 15:15
  • HTTPS data when transmitted is encrypted. You must be able to decrypt it and at the same time nobody but you can decrypt to eliminate eavesdropping. A method to bypass this would open doors to all kinds of security risks. It is unlikely such a scheme would ever be implemented. Your problem has nothing to do with certificates. You have an outdated library of SSL decryption routines. The site sends you the data utilizing a encryption protocol that is not supported by you openSSL version. – Misunderstood Feb 17 '17 at 17:24
  • I have run out of time for now. Look into CURLOPT_SSL_CIPHER_LIST. When an SSL request is made curl presents a list of ciphers. The server then picks which cypher protocol to use based on the list presented. – Misunderstood Feb 17 '17 at 18:31
  • I tried CURLOPT_SSL_CIPHER_LIST today but could not test because I have a different version of curl than you. I added a final answer to my posted answer. You need to upgrade is the answer. There may be a temporary fix but it is doubtful. Don't forget to check my answer. – Misunderstood Feb 18 '17 at 19:21
  • I could not get it to work, so I have asked the web host to either upgrade the SSL version of this shared server or to migrate the site to a different server with a newer version of SSL. – ProgrammerGirl Feb 21 '17 at 15:02
0

if by nothing, you mean an empty response body, it doesn't sound like an httpS issue. if it was, then curl_exec would complain, curl_exec() would return bool(false) , and curl_error() would indicate an SSL problem.
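to make the difference between "nothing" and "error" concrete, here is a minimal sketch. `describeCurlResult()` is a made-up helper name; feed it the return value of curl_exec() (with CURLOPT_RETURNTRANSFER enabled) and the string from curl_error(). the error text in the demo call is just a sample.

```php
<?php
// describeCurlResult() is a hypothetical helper; pass it the return value of
// curl_exec() (with CURLOPT_RETURNTRANSFER enabled) and the curl_error() string.
function describeCurlResult($result, $error)
{
    if ($result === false) {
        // bool(false) means the transfer itself failed (DNS, TLS, timeout, ...).
        return 'transfer failed: ' . $error;
    }
    if ($result === '') {
        // An empty string is a successful transfer with no body.
        return 'transfer succeeded, but the body is empty';
    }
    return 'transfer succeeded, ' . strlen($result) . ' bytes received';
}

echo describeCurlResult(false, 'Unknown SSL protocol error in connection to example.com:443'), "\n";
echo describeCurlResult('', ''), "\n";
```

the strict `===` comparison matters here, because `false == ''` is true in PHP and a loose check would lump both cases together.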

How can I debug and fix this?

investigate the request sent by your browser when you get a valid response (use your browser's developer tools for this, for example the "Network" tab in Google Chrome's Ctrl+Shift+I), then compare it with the request sent by curl when you get an invalid response (use CURLOPT_VERBOSE for this), and 1 by 1, add all the headers the browser sends.

for example, you'll notice that libcurl sends no user-agent header, while your browser sends something like user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36 , so add that header. you'll also notice that libcurl by default sends Accept: */* , while your browser sends Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 - so fix that, make curl send the same headers.

keep doing this, until the 2 requests are indistinguishable, and along the way, you'll find the difference that makes curl blocked.

my bet is on the user-agent header.
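the comparison can be partly automated. below is a rough sketch, assuming you have pasted the browser's request headers and curl's request headers into two arrays of raw "Name: value" lines. `missingHeaders()` is a made-up helper name and the sample headers are placeholders.

```php
<?php
// missingHeaders() is a hypothetical helper: given the raw header lines the
// browser sent and the ones curl sent, list the header names curl left out.
function missingHeaders(array $browserLines, array $curlLines)
{
    // Extract lower-cased header names from "Name: value" lines.
    $names = function (array $lines) {
        $out = array();
        foreach ($lines as $line) {
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $out[] = strtolower(trim($parts[0]));
            }
        }
        return $out;
    };

    return array_values(array_diff($names($browserLines), $names($curlLines)));
}

// Placeholder data for illustration only.
$browser = array(
    'Host: example.com',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64)',
    'Accept: text/html,application/xhtml+xml',
    'Accept-Language: en-US,en;q=0.8',
);
$curl = array(
    'Host: example.com',
    'Accept: */*',
);

print_r(missingHeaders($browser, $curl));
```

this only compares header names. headers that both sides send with different values (like Accept above) still need to be compared by eye.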

hanshenrik
  • Based on your answer, I tried this `cURL` solution (http://stackoverflow.com/a/9571305/869849) which supplies HTTP Headers very similar to a browser, with `CURLOPT_VERBOSE` enabled, but it returned no results, and no verbose information on the page. Any ideas? – ProgrammerGirl Feb 11 '17 at 12:24
  • @ProgrammerGirl i suspect that you just didn't check php's error logs. CURLOPT_VERBOSE's output, by default, goes to php's error log/stderr, not stdout/browser. redirect it with CURLOPT_STDERR - but did you check php's error logs? – hanshenrik Feb 11 '17 at 12:35
  • The solution I linked to in my comment above echo's the `CURLOPT_VERBOSE` results to the page. I just checked the error log as per your request and there is nothing related to `cURL` there. What else can I try? – ProgrammerGirl Feb 11 '17 at 12:58
-1

Sometimes it helps to skip validation of the certificate and host name and simply rely on the encryption that SSL provides.

$context = stream_context_create(
    array('http' => array(
            'follow_location' => true
        ),
        'ssl' => array(
            'verify_peer' => false, 
            'verify_peer_name' => false
        )
    )
);

$content = @file_get_contents($file, FALSE, $context);
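
One caveat: the `@` operator hides the warning that explains why the request failed. A minimal sketch of a wrapper that keeps the page quiet but still reports the reason is shown below. `fetchOrExplain()` is a made-up helper name, and `error_clear_last()` assumes PHP 7.0 or later.

```php
<?php
// fetchOrExplain() is a hypothetical helper. It returns array($content, $error):
// $content is null on failure and $error is null on success.
function fetchOrExplain($url, $context = null)
{
    error_clear_last();                       // requires PHP 7.0+
    $content = @file_get_contents($url, false, $context);

    if ($content === false) {
        // The suppressed warning is still recorded and can be read back.
        $last = error_get_last();
        return array(null, $last ? $last['message'] : 'unknown error');
    }
    return array($content, null);
}

// Command-line usage: php fetch.php https://example.com/
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    list($content, $error) = fetchOrExplain($argv[1]);
    echo $error === null ? strlen($content) . " bytes\n" : "Failed: $error\n";
}
```

On a failed handshake, the message usually names the SSL problem, which is more useful than an empty page.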
powtac
  • Unfortunately, that did not work. I set `$file` equal to the URL and then added `echo $content;` at the end but it didn't display anything. Any other ideas? – ProgrammerGirl Feb 16 '17 at 01:44
-1

Does the HTTPS site have a self-signed certificate? Can you provide the domain names of some of the sites that work and some that don't?

Have you tried using "allow_self_signed" => true in the stream context configuration?

So it becomes:

$arrContextOptions=array(
    "ssl"=>array(
        "verify_peer"=>false,
        "verify_peer_name"=>false,
        "allow_self_signed"=>true,
    ),
);  

$response = file_get_contents($url, false, stream_context_create($arrContextOptions));
peiiion
  • I tried this with the target URL, but it did not work. Here is the target URL (please do not post it): i.imgur.com/85wsoLI.jpg . Your code also didn't work with Twitter, even though EaBangalore's answer does work with Twitter (but not with the target URL in question). Any other ideas? – ProgrammerGirl Feb 17 '17 at 03:06
  • The domain and certificate look good. The HTML is loading insecure content over HTTP when rendering though, but it shouldn't be a problem for you. I think the problem is that the client (your PHP and the environment) you're using is outdated and may not support newer SSL protocols. What PHP version are you using and on which OS/Version? – peiiion Feb 18 '17 at 13:35
-1

As it looks like a problem with SSL version you could set CURL to ignore it using CURLOPT_SSL_VERIFYPEER.

Here is a script working with the url you posted

$url = 'https://XXX/YYY/view-all';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
print_r($response);
Iñaki Soria