I want to build a web crawler to gather statistics on the most popular server software among Bulgarian sites, such as Apache, nginx, etc. Here is what I came up with:

import requests
r = requests.get('http://start.bg')
print(r.headers)

This prints the following:

{'Debug': 'unk', 
'Content-Type': 'text/html; charset=utf-8', 
'X-Powered-By': 'PHP/5.3.3', 
'Content-Length': '29761', 
'Connection': 'close', 
'Set-Cookie': 'fbnr=1; expires=Sat, 13-Feb-2016 22:00:01 GMT; path=/; domain=.start.bg', 
'Date': 'Sat, 13 Feb 2016 13:43:50 GMT', 
'Vary': 'Accept-Encoding', 
'Server': 'Apache/2.2.15 (CentOS)', 
'Content-Encoding': 'gzip'}

Here you can easily see that it runs on Apache/2.2.15, and you can get this value with r.headers['Server']. I tried this with several Bulgarian websites, and they all had the Server key.
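Since indexing with r.headers['Server'] raises a KeyError when a site omits the header, a lookup with .get() is safer for a crawler. The following is a minimal sketch, not a definitive implementation; the helper name server_software is hypothetical, and it assumes the product name is whatever precedes the first "/" or space in the header value.

```python
def server_software(headers):
    """Return the product name from a Server header, or None if it is absent.

    `headers` can be any mapping, e.g. r.headers from requests (which is
    case-insensitive) or a plain dict (which is case-sensitive).
    """
    value = (headers.get('Server') or '').strip()
    if not value:
        return None
    # 'Apache/2.2.15 (CentOS)' -> 'Apache'; 'nginx' -> 'nginx'
    parts = value.split('/', 1)[0].split()
    return parts[0] if parts else None


# Shortened copies of the header dictionaries shown in this question.
with_server = {'Server': 'Apache/2.2.15 (CentOS)', 'Content-Encoding': 'gzip'}
without_server = {'X-Server': 'web03a', 'Content-Encoding': 'gzip'}

print(server_software(with_server))
print(server_software(without_server))
```

With a live response, the same helper would be called as server_software(r.headers).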

However, when I request the headers of a more sophisticated website, such as www.teslamotors.com, I get the following info:

{'Content-Type': 'text/html; charset=utf-8', 
'X-Cache-Hits': '9', 
'Cache-Control': 'max-age=0, no-cache, no-store', 
'X-Content-Type-Options': 'nosniff', 
'Connection': 'keep-alive', 
'X-Varnish-Server': 'sjc04p1wwwvr11.sjc05.teslamotors.com', 
'Content-Language': 'en', 
'Pragma': 'no-cache', 
'Last-Modified': 'Sat, 13 Feb 2016 13:07:50 GMT', 
'X-Server': 'web03a', 
'Expires': 'Sat, 13 Feb 2016 13:37:55 GMT', 
'Content-Length': '10290', 
'Date': 'Sat, 13 Feb 2016 13:37:55 GMT', 
'Vary': 'Accept-Encoding', 
'ETag': '"1455368870-1"', 
'X-Frame-Options': 'SAMEORIGIN', 
'Accept-Ranges': 'bytes', 
'Content-Encoding': 'gzip'}

As you can see, there isn't any ['Server'] key in this dictionary (although there are X-Server and X-Varnish-Server keys, and I'm not sure what they mean, but their values are not a server name like Apache).

So I'm thinking there must be another request I could send that would yield the desired server information, or perhaps they have their own specific server software (which sounds plausible for Facebook). I also tried other .com websites, such as https://spotify.com, and it does have a ['Server'] key.

So is there a way to find the info about the servers Facebook and Tesla Motors use?

Boyan Kushlev

1 Answer

That has nothing to do with Python. Most well-configured web servers will not return identifying information in the Server HTTP header because of the security implications.

No sane developer would want to let you know that they are running an unpatched version of product xxx.
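For the statistic you want to build, this means that sites hiding the header need a bucket of their own rather than being treated as an error. A rough sketch, assuming you have already collected one header dictionary per site; the function name tally_servers and the 'undisclosed' label are made up for illustration:

```python
from collections import Counter


def tally_servers(header_sets):
    """Count server products across an iterable of header mappings.

    Responses without a usable Server header are counted as 'undisclosed'.
    """
    counts = Counter()
    for headers in header_sets:
        value = (headers.get('Server') or '').strip()
        # Keep only the product name: 'Apache/2.2.15 (CentOS)' -> 'Apache'
        parts = value.split('/', 1)[0].split()
        counts[parts[0] if parts else 'undisclosed'] += 1
    return counts


# Header dictionaries shortened from the two responses in the question.
responses = [
    {'Server': 'Apache/2.2.15 (CentOS)', 'X-Powered-By': 'PHP/5.3.3'},
    {'X-Server': 'web03a', 'Cache-Control': 'max-age=0, no-cache, no-store'},
]

counts = tally_servers(responses)
print(counts.most_common())
```

In a real crawl, the input could come from something like (requests.get(url, timeout=10).headers for url in urls), where urls is your own list of sites.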

sorin