I want to write a Python script that downloads a web page only if it contains HTML. I know the Content-Type header can be used for this. Please suggest a way to do it, as I cannot find a way to get the headers before the file is downloaded.

chinmayaposwalia
  • @NiklasB. I have explored the request object and tried the retrieve function, but it creates a file on the file system first and returns the email.mimetype object. I want to download the file only if the content is HTML – chinmayaposwalia Mar 17 '12 at 13:58
  • Have a look at [this question](http://stackoverflow.com/questions/843392/python-get-http-headers-from-urllib-call) – Lev Levitsky Mar 17 '12 at 14:12

1 Answer

Use http.client to send a HEAD request to the URL. This returns only the headers for the resource, so you can look at the Content-Type header and see if it is text/html. If it is, send a GET request to the URL to get the body.
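
A minimal sketch of that HEAD-then-GET approach might look like the following. The function names (`is_html`, `open_connection`, `fetch_if_html`) and the 10-second timeout are my own choices for illustration, not part of any library. It assumes a plain http or https URL, does not follow redirects, and does not check the status code, so treat it as a starting point rather than a finished downloader.

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes HTML."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def open_connection(parts, timeout):
    """Hypothetical helper: pick the connection class from the URL scheme."""
    if parts.scheme == "https":
        return http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    return http.client.HTTPConnection(parts.netloc, timeout=timeout)


def fetch_if_html(url, timeout=10):
    """Return the body as bytes if the URL serves HTML, otherwise None."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Step 1: HEAD request, which returns the headers without the body.
    conn = open_connection(parts, timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response has no body; this just finishes it
        content_type = response.getheader("Content-Type")
    finally:
        conn.close()

    if not is_html(content_type):
        return None

    # Step 2: the resource is HTML, so send a GET request for the body.
    conn = open_connection(parts, timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Calling `fetch_if_html("http://example.com/")` would then return the page bytes, or `None` when the server reports some other content type. Note that some servers answer HEAD differently from GET, or reject it, so the header check is only as reliable as the server.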

Lance Helsten