I want to write a web crawler that gathers statistics on the most popular server software among Bulgarian sites, such as Apache, nginx, etc. Here is what I came up with:
import requests
r = requests.get('http://start.bg')
print(r.headers)
This returns the following:
{'Debug': 'unk',
'Content-Type': 'text/html; charset=utf-8',
'X-Powered-By': 'PHP/5.3.3',
'Content-Length': '29761',
'Connection': 'close',
'Set-Cookie': 'fbnr=1; expires=Sat, 13-Feb-2016 22:00:01 GMT; path=/; domain=.start.bg',
'Date': 'Sat, 13 Feb 2016 13:43:50 GMT',
'Vary': 'Accept-Encoding',
'Server': 'Apache/2.2.15 (CentOS)',
'Content-Encoding': 'gzip'}
Here you can easily see that it runs on Apache/2.2.15, and you can get this result simply with r.headers['Server']. I tried that with several Bulgarian websites and they all had the Server key.
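A minimal sketch of how the per-site lookup could feed the statistic, assuming the response headers for each site have already been collected. The helper names `server_product` and `tally_servers` are hypothetical, and stripping the version so that "Apache/2.2.15 (CentOS)" counts as "Apache" is an assumption about how the results should be grouped:

```python
from collections import Counter


def server_product(headers):
    """Return the product name from a Server header, or 'unknown' if absent.

    `headers` is any mapping of header names to values (hypothetical helper).
    """
    value = (headers.get('Server') or '').strip()
    if not value:
        return 'unknown'
    # 'Apache/2.2.15 (CentOS)' -> 'Apache'; 'nginx' stays 'nginx'
    parts = value.split('/')[0].split()
    return parts[0] if parts else 'unknown'


def tally_servers(all_headers):
    """Count how often each server product appears across many header mappings."""
    return Counter(server_product(h) for h in all_headers)


if __name__ == '__main__':
    sample = [
        {'Server': 'Apache/2.2.15 (CentOS)'},
        {'Server': 'nginx'},
        {'Content-Type': 'text/html; charset=utf-8'},  # no Server header
    ]
    print(tally_servers(sample))
```

With requests, `r.headers` from each response could be passed in directly. Using `.get` rather than `['Server']` means a site that omits the header is counted as 'unknown' instead of raising a KeyError.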
However, when I request the headers of a more sophisticated website, such as www.teslamotors.com, I get the following info:
{'Content-Type': 'text/html; charset=utf-8',
'X-Cache-Hits': '9',
'Cache-Control': 'max-age=0, no-cache, no-store',
'X-Content-Type-Options': 'nosniff',
'Connection': 'keep-alive',
'X-Varnish-Server': 'sjc04p1wwwvr11.sjc05.teslamotors.com',
'Content-Language': 'en',
'Pragma': 'no-cache',
'Last-Modified': 'Sat, 13 Feb 2016 13:07:50 GMT',
'X-Server': 'web03a',
'Expires': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Content-Length': '10290',
'Date': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Vary': 'Accept-Encoding',
'ETag': '"1455368870-1"',
'X-Frame-Options': 'SAMEORIGIN',
'Accept-Ranges': 'bytes',
'Content-Encoding': 'gzip'}
As you can see, there isn't any Server key in this dictionary (although there are X-Server and X-Varnish-Server, whose meaning I'm not sure of, but their values are not a server name like Apache).
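A small sketch of how such server-related headers could at least be collected when the Server key is missing. The function name `server_hints` and the choice of which header names to treat as hints are assumptions for illustration; this does not recover the actual server software, it only gathers whatever the response exposes:

```python
def server_hints(headers):
    """Collect headers that may hint at the server stack (hypothetical helper).

    Matches any header whose name contains 'server', plus a couple of
    commonly seen names; the selection is an assumption, not a standard.
    """
    extra_names = ('x-powered-by', 'via')
    hints = {}
    for name, value in headers.items():
        lowered = name.lower()
        if 'server' in lowered or lowered in extra_names:
            hints[name] = value
    return hints


if __name__ == '__main__':
    # Subset of the headers shown above
    tesla = {
        'Content-Type': 'text/html; charset=utf-8',
        'X-Varnish-Server': 'sjc04p1wwwvr11.sjc05.teslamotors.com',
        'X-Server': 'web03a',
        'Vary': 'Accept-Encoding',
    }
    print(server_hints(tesla))
```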
So I'm thinking there must be another request I could send that would yield the desired server information, or perhaps they have their own specific server software (which sounds plausible for Facebook).
I also tried other .com websites, such as https://spotify.com, and it does have a Server key.
So is there a way to find the info about the servers Facebook and Tesla Motors use?