I want to write a Python script that downloads a web page only if the page contains HTML. I know the `Content-Type` header can be used for this. Please suggest a way to do it, as I cannot find a way to get the headers before the file is downloaded.

chinmayaposwalia
- @NiklasB. I have explored the request object and tried the retrieve function, but it creates a file on the file system first and returns the email.mimetype object. I want to download the file only if the content is HTML. – chinmayaposwalia Mar 17 '12 at 13:58
- Have a look at [this question](http://stackoverflow.com/questions/843392/python-get-http-headers-from-urllib-call) – Lev Levitsky Mar 17 '12 at 14:12
1 Answer
2
Use `http.client` to send a `HEAD` request to the URL. This returns only the headers for the resource, so you can look at the `Content-Type` header and check whether it is `text/html`. If it is, send a `GET` request to the URL to get the body.
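A minimal sketch of that HEAD-then-GET flow, assuming Python 3 and only the standard library. The helper names `is_html`, `_request`, and `fetch_if_html` are made up for illustration, and the sketch does not follow redirects or handle servers that reject `HEAD`:

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes text/html."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def _request(method, url, timeout=10):
    """Hypothetical helper: send one request, return (connection, response)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request(method, path)
        return conn, conn.getresponse()
    except Exception:
        conn.close()
        raise


def fetch_if_html(url):
    """Return the body as bytes if the URL serves HTML, otherwise None."""
    # Step 1: HEAD returns the headers only, so nothing is downloaded yet.
    conn, resp = _request("HEAD", url)
    try:
        content_type = resp.getheader("Content-Type")
    finally:
        conn.close()
    if not is_html(content_type):
        return None
    # Step 2: the resource claims to be HTML, so fetch the body with GET.
    conn, resp = _request("GET", url)
    try:
        return resp.read()
    finally:
        conn.close()
```

Calling `fetch_if_html("http://example.com/")` should give the page bytes when the server reports `text/html`, and `None` otherwise. The header value is compared after stripping parameters because servers commonly send something like `text/html; charset=utf-8`.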

Lance Helsten