Here's the URL: https://www.grammarly.com

I'm trying to fetch HTTP headers by using the native get_headers() function:

$headers = get_headers('https://www.grammarly.com');

The result is

HTTP/1.1 400 Bad Request
Date: Fri, 27 Apr 2018 12:32:34 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 52
Connection: close

But if I do the same with the curl command-line tool, the result is different:

curl -sI https://www.grammarly.com/

HTTP/1.1 200 OK
Date: Fri, 27 Apr 2018 12:54:47 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 25130
Connection: keep-alive

What is the reason for this difference in responses? Is it some kind of poorly implemented security feature on Grammarly's server side, or something else?

Limon Monte
  • It looks like a "poorly implemented security feature", because setting a user agent in `get_headers` produces `HTTP/1.1 302 Found`. Try inspecting your curl request headers: there is a chance that a default user agent is set, and that what you are getting is the final response (after all redirects). – bigwolk Apr 27 '18 at 13:38
  • It's probably returning the 400 response because no User-Agent header is sent with the `get_headers()` request. – Anthony Apr 27 '18 at 14:20

2 Answers

This is because get_headers() uses the default stream context, which means that almost no HTTP headers are sent with the request, and most remote servers will be fussy about that. The missing header most likely to cause issues is User-Agent. You can set it before calling get_headers() by using stream_context_set_default(). Here's an example that works for me:

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 400 Bad Request

stream_context_set_default(
    array(
        'http' => array(
            'user_agent'=>"php/testing"
        ),
    )
);

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 200 OK
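
If changing the default context for every later stream call is undesirable, a per-call context may be an alternative. The following is a minimal sketch, assuming PHP 7.1 or later (where get_headers() accepts a context as its third argument); `make_ua_context` and `get_headers_with_agent` are hypothetical helper names, not part of PHP.

```php
<?php
// Hypothetical helper: builds a one-off stream context that carries a
// User-Agent header, leaving the default stream context untouched.
function make_ua_context(string $userAgent)
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
}

// Hypothetical helper: fetches the response headers for $url using that
// context. Returns an array of header lines, or false on failure.
function get_headers_with_agent(string $url, string $userAgent = 'php/testing')
{
    // The third argument (the context) is assumed to be available,
    // which requires PHP 7.1 or later.
    return get_headers($url, false, make_ua_context($userAgent));
}
```

Calling `get_headers_with_agent('https://www.grammarly.com')` would then send the User-Agent for that one request only, without affecting other code that relies on the default context.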
Anthony
  • Just discovered from http://php.net/manual/en/context.http.php that the http context has a pre-defined "user_agent" option to set the User-Agent header. Updated answer to reflect this. – Anthony Apr 27 '18 at 16:02
  • correction, the vast majority of servers won't care, but a few servers do. – hanshenrik Apr 28 '18 at 07:06

Just use PHP's cURL functions for it:

function getMyHeaders($url)
{
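    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the response instead of printing it
        CURLOPT_HEADER         => true,     // include the response headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_USERAGENT      => "spider", // send a User-Agent header
        CURLOPT_AUTOREFERER    => true,     // set the Referer header when following redirects
        CURLOPT_SSL_VERIFYPEER => false,    // skips certificate verification (insecure)
        CURLOPT_NOBODY         => true      // send a HEAD request, so no body is returned
    );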
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}
print_r(getMyHeaders('https://www.grammarly.com'));
Evgeny Ruban
  • Something to consider: `CURLOPT_NOBODY` changes the request method from `GET` to `HEAD`. A good server/web-app should handle `HEAD` the same way as `GET` and simply send only the headers, but many web-apps don't support it properly and thus return different headers for a `HEAD` request than for a `GET` request. – Anthony Apr 27 '18 at 15:08
  • @Anthony thanks for your comment. You're right. I also tried changing the default stream context as in your answer, but I used the `'ssl'` option and it didn't work :) – Evgeny Ruban Apr 27 '18 at 15:17
  • I think the `ssl` context has its own set of options and doesn't "inherit" the options of the `http` context. So if you wanted to set the user agent for all http requests and set verify-peer to false for all https requests, similar to your curl example, it would look something like: `stream_context_set_default( array( 'http' => array( 'user_agent'=>"spider" ), 'ssl' => array( 'verify_peer'=> false ), ) )` based on http://php.net/manual/en/context.ssl.php – Anthony Apr 27 '18 at 16:05
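
To check whether a given server really answers `HEAD` and `GET` differently, as the comments above suggest can happen, one option is to build two stream contexts that differ only in the `method` option of the `http` context and compare the first status line of each response. This is a minimal sketch, assuming PHP 7.1 or later; `make_method_context` and `status_line` are hypothetical helper names.

```php
<?php
// Hypothetical helper: builds a stream context that sends the given
// HTTP method together with a User-Agent header.
function make_method_context(string $method, string $userAgent = 'php/testing')
{
    return stream_context_create([
        'http' => [
            'method'     => $method,
            'user_agent' => $userAgent,
        ],
    ]);
}

// Hypothetical helper: returns the first status line the server sends
// for $url with the given method, or null if the request fails.
function status_line(string $url, string $method)
{
    $headers = get_headers($url, false, make_method_context($method));
    return $headers === false ? null : $headers[0];
}
```

Comparing `status_line($url, 'HEAD')` with `status_line($url, 'GET')` for the same URL would show whether the two methods are treated differently. If the server redirects, get_headers() also returns the headers of the later responses, so index 0 is the status line of the first response only.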