
I have been trying to fetch the content of a web page using PHP. I have tried both file_get_contents() and curl, but neither works with the link I want. My curl code is as follows:

function request($url){
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, $url);
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
   $res = curl_exec($curl);
   curl_close($curl);
   return $res;
}
echo request("http://...");

This code prints something like ""

This code works with sites like Google but not with the URL I want. However, when I request the same URL with the curl terminal command, it works. What could be the problem?

Here is the curl -I output:

HTTP/1.1 200 OK
Date: Mon, 09 Jun 2014 23:47:43 GMT
Server: Apache
Set-Cookie: PHPSESSID=m7fs1ikt47epgoiekg68nnq064; path=/; domain=.sozlukspot.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Powered-By: PleskLin
Connection: close
Content-Type: text/html
Giacomo1968
OguzGelal
  • What is the output of `curl -I` to the problem URL? It could be GZipped content if you see `Content-Encoding: gzip` in the header. – Giacomo1968 Jun 09 '14 at 23:45
  • @meda I checked that one out, didn't help – OguzGelal Jun 09 '14 at 23:46
  • @JakeGould added the output to the question. Doesn't seem to be gzipped – OguzGelal Jun 09 '14 at 23:49
  • @OguzGelal Posted an answer. Think my solution works. You need to set the user agent. As for the `` my guess is the server is just badly configured & that `` could be part of a garbled error message page. I mean look at the expires header: `Expires: Thu, 19 Nov 1981 08:52:00 GMT` What is that? Expires 1981? Anyway, check out my answer & see if that works for you. – Giacomo1968 Jun 10 '14 at 00:52

2 Answers


Try this. I reworked your function with a standard set of curl options I set for getting content from remote URLs. I believe it should work.

I believe the issue is that the site you are trying to connect to refuses requests that don’t have a proper user agent set, whether they identify as plain curl or send a blank user agent. PHP's curl extension sends no User-Agent header unless you set one, while the curl command-line tool sends curl/<version> by default, which may be why the terminal command works. So setting CURLOPT_USERAGENT in curl should work. I have it set to a fairly generic Mozilla/5.0 string here, but change it to whatever other agent you need.

function request($url){
    // Fetch $url with curl and return the response body (false on failure).
    $curl_timeout = 5; // Connection timeout in seconds.
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Certificate checks are disabled; this only affects https URLs and is unsafe in production.
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $curl_timeout);
    // Send a browser-like user agent; some servers reject requests without one.
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    // Follow HTTP redirects.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    $res = curl_exec($curl);
    curl_close($curl);
    return $res;
}
echo request("http://...");
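
If you want to confirm that the user agent really is the cause, it may help to look at what curl reports rather than only echoing the body. The following is a minimal sketch, not a definitive implementation: the function name request_with_diagnostics and the keys of the returned array are my own assumptions, and it uses only curl_setopt_array(), curl_exec(), curl_getinfo() and curl_error() from PHP's curl extension.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration):
// performs the request and returns the body together with basic diagnostics.
function request_with_diagnostics(string $url, string $userAgent): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,      // seconds
        CURLOPT_USERAGENT      => $userAgent,
    ]);

    $body = curl_exec($curl);             // false when the transfer fails

    // The handle is released when $curl goes out of scope at the end of the function.
    return [
        'ok'     => $body !== false,
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE), // 0 if no HTTP response was received
        'error'  => curl_error($curl),                       // empty string when there was no error
        'body'   => $body === false ? '' : $body,
    ];
}
```

Calling it once with an empty user agent and once with the Mozilla/5.0 string above, then comparing the 'status' and 'body' values, should show whether the server treats the two requests differently.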
Giacomo1968

The site you are accessing is returning a byte order mark (BOM) at the start of the response. This usually indicates that the content is Unicode-encoded. Try changing the last line in your function to:

return utf8_decode($res);
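
As an alternative sketch, assuming the stray characters really are a UTF-8 byte order mark (the bytes EF BB BF) at the very start of the response, you could strip just those three bytes and leave the rest of the content untouched. The helper name strip_utf8_bom is hypothetical. Also note that utf8_decode() converts the whole string to ISO-8859-1 and is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes; the rest of the string is left as-is.
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }

    return $text;
}
```

In the function from the question, the last line would then become return strip_utf8_bom($res); after first checking that $res is not false.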
Len_D
  • Not a bad idea. But the vast majority of websites out there, including Google, return UTF-8 content. In this case it could be bad system configuration for non-web browser requests coming from tools like `curl`. – Giacomo1968 Jun 10 '14 at 00:50