
I am using cURL and file_get_contents to find out the basic difference between a server-side request for a page and an organic browser request.

I am requesting a phpinfo() page both ways and the output differs between the two cases.

For example, when I request the page from a browser, phpinfo() shows this: `_SERVER["HTTP_CACHE_CONTROL"] no-cache`. That entry is missing when I request the same page through PHP.

My CURL:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
    curl_setopt($ch, CURLOPT_INTERFACE, $testIP); // outgoing IP address to bind to
    $output = curl_exec($ch);
    curl_close($ch);
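A browser sends a number of request headers (Accept, Accept-Language, and Cache-Control on a forced reload) that cURL does not send by default, which is why `_SERVER["HTTP_CACHE_CONTROL"]` never appears for the scripted request. A minimal sketch of adding them explicitly with `CURLOPT_HTTPHEADER` (the URL is the placeholder from the question, and the exact header set is an assumption based on what Firefox typically sends):

```php
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0");
// Extra headers a typical browser sends; each one becomes an
// $_SERVER["HTTP_*"] entry in phpinfo() on the target server.
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
    'Cache-Control: no-cache',  // appears as _SERVER["HTTP_CACHE_CONTROL"]
));
$output = curl_exec($ch);
curl_close($ch);
```

With these headers set, the phpinfo() output from the cURL request should list the same `HTTP_*` entries as the browser request.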

My file_get_contents:

    $opts = array(
        'socket' => array('bindto' => 'xxx.xx.xx.xx:0'),
        // 'method', 'user_agent' and 'header' belong under the 'http' wrapper;
        // the trailing space in the original 'user_agent ' key is why it didn't work
        'http' => array(
            'method'     => 'GET',
            'user_agent' => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0",
            'header'     => "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        ),
    );
    $context = stream_context_create($opts);
    $output  = file_get_contents("http://www.example.com/phpinfo.php", false, $context);

My goal: To make a PHP request look identical to a browser request.

  • Why do you want your requests to look like browser requests? – Halcyon Jan 10 '15 at 13:40
  • @Halcyon This is to prevent servers from treating my requests differently from a regular browser request. – Kolkata Calcutta Jan 10 '15 at 13:43
  • Ok, pretend I ask you _'why?'_ 5 times, then decide if that's really what you want to do. – Halcyon Jan 10 '15 at 13:44
  • @Halcyon I need help to complete this project. I'll ask 'why' to my boss after the project is complete. – Kolkata Calcutta Jan 10 '15 at 13:46
  • The problem is that there isn't a good solution. And the reason there isn't a good solution is because you shouldn't be doing this. If you're writing a crawler people are going to want to know you're crawling their site. It's deceitful to say you are someone or something you are not. – Halcyon Jan 10 '15 at 13:49
  • See also http://stackoverflow.com/questions/2107759/php-file-get-contents-and-headers – Hagen von Eitzen Jan 10 '15 at 14:00
  • @Halcyon before you call the police I must assure you that this is an intranet project where our NY based server will interact with our Seattle based server. Many thanks for your help. – Kolkata Calcutta Jan 10 '15 at 14:07
  • @Halcyon a browser is unable to save many pages but a PHP request can. That's our primary purpose i.e. to save data which we can't do manually. Thank you once again for your help. – Kolkata Calcutta Jan 10 '15 at 14:09
  • @HagenvonEitzen that's a great reference. Quite helpful. Thanks for the support. – Kolkata Calcutta Jan 10 '15 at 14:10
  • I'm perplexed. Why would you do it this way? You can design a web service API and do away with all this nonsense. – Halcyon Jan 10 '15 at 14:10

1 Answer


One possible way for a server to detect that you are PHP code and not a browser is to check your cookies. With PHP cURL, make one request to the server, then inject the cookie you receive into your next request. Check here: http://docstore.mik.ua/orelly/webprog/pcook/ch11_04.htm Another way a server can tell you are a robot (PHP code) is to check the Referer HTTP header. You can learn more here: http://en.wikipedia.org/wiki/HTTP_referer
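Both checks can be covered with cURL's cookie-jar options and `CURLOPT_REFERER`. A sketch, assuming the question's placeholder URL and a hypothetical cookie-jar path:

```php
<?php
$cookieFile = '/tmp/cookies.txt';  // hypothetical path for the cookie jar

$ch = curl_init("http://www.example.com/phpinfo.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR,  $cookieFile);  // save cookies the server sets
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // send saved cookies back
curl_setopt($ch, CURLOPT_REFERER, "http://www.example.com/");  // plausible Referer
$first = curl_exec($ch);

// A second request on the same handle replays the cookies from the jar,
// so the server sees a returning "browser" rather than a fresh client.
$second = curl_exec($ch);
curl_close($ch);
```

Reusing one handle for both requests keeps the cookie state and connection alive between them.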

Majid Abdolhosseini