How to get content-type from selenium page_source

Question

I know the content type can be read from the response headers:

import urllib2
response = urllib2.urlopen(url)
content_type = response.info().getheader('Content-Type')  # a hyphen is not valid in a variable name

Now I need to execute JavaScript, so I use Selenium with PhantomJS to fetch the web page.

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get(url)
source = driver.page_source

How can I get the content type without downloading the web page twice? I know I can save the output of response.read() as an HTML file and have the driver render the local file instead of downloading it again, but that is too slow. Any suggestions?

score 3 · Accepted Answer · edited May 23 '17 at 11:45

Selenium does not expose the response headers, but you can send a HEAD request with requests:

import requests

print(requests.head(url).headers["Content-Type"])

You can also use httplib2, urllib2, etc.; there are numerous answers here showing how to send a HEAD request with various libraries.
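As a rough illustration of the HEAD-request approach without a third-party dependency, here is a minimal Python 3 sketch that uses the standard library's urllib.request in place of requests. The helper names parse_content_type and fetch_content_type are made up for this example, and the 10-second timeout is an arbitrary assumption. Note that some servers answer HEAD differently from GET, so the result may not always match what the browser received.

```python
from email.message import Message
from urllib.request import Request, urlopen


def parse_content_type(header_value):
    """Split a raw Content-Type header value into (media_type, charset).

    Hypothetical helper. charset is None when the header does not declare
    one; the media type falls back to 'text/plain' when the value is empty
    or malformed (that is the stdlib parser's default).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request (headers only, no body) and parse Content-Type.

    Hypothetical helper; the timeout value is an assumption.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_content_type(response.headers.get("Content-Type", ""))
```

The page itself is still fetched only once by the driver; fetch_content_type(url) adds a single lightweight request that transfers headers but no body.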

answered Mar 24 '16 at 10:06

Padraic Cunningham

Thx! it really helps. – SimmerChan Mar 24 '16 at 11:47
No worries, a head request should be pretty efficient. – Padraic Cunningham Mar 24 '16 at 11:50