1

Using urllib2 in Python 2.7.4, I can readily download an Excel file:

import urllib2

output_file = 'excel.xls'
url = 'http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

This results in the expected file that I can use as I wish.

However, trying to download just an HTML file gives me an empty file:

output_file = 'webpage.html'
url = 'http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

I had the same results using urllib. There must be something simple I'm missing or don't understand. How do I download an HTML file from a URL? Why doesn't my code work?

Dylan Hettinger

3 Answers

3

If you want to download files or simply save a webpage, you can use urlretrieve (from the urllib library) instead of calling read and write yourself.

import urllib
# urllib.urlretrieve(url, filename): fetch the URL and save it under the given file name
urllib.urlretrieve("http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html", "doc.html")

If you need a timeout, set a default socket timeout at the start of your script, before any download is started:

import socket
socket.setdefaulttimeout(25)  # seconds
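
As a sketch of the same idea with a per-call timeout instead of the global socket default, something like the following should work. The `download` helper and its parameter names are hypothetical, not part of urllib; the import fallback is only there so the snippet also runs on Python 3, where `urlopen` lives in `urllib.request`.

```python
import contextlib

try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def download(url, output_file, timeout=25):
    """Save the body at `url` to `output_file`; return the number of bytes written.

    `download` is an illustrative helper, not a library function.
    """
    # contextlib.closing makes sure the response is closed on both Python versions.
    with contextlib.closing(urlopen(url, timeout=timeout)) as response:
        data = response.read()
    # The with-block closes (and therefore flushes) the file before we return.
    with open(output_file, 'wb') as handle:
        handle.write(data)
    return len(data)
```

Calling `download(url, 'webpage.html')` with the URL from the question returns the byte count, so a result of 0 would mean the response body itself was empty rather than the write going wrong.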
Ricardo
  • If you can expand on this, it would be an answer. Right now it looks like it should be a comment on the question. – Burhan Khalid Dec 19 '13 at 04:47
  • Thanks, I had used this previously with the same problem, but having it confirmed as correct is useful. I think I'm having issues with overwriting existing files rather than the downloading part. – Dylan Hettinger Dec 19 '13 at 17:57
  • I tested it overwriting an existing file and it works fine. You can try downloading the file to a temporary folder. If you are using Ubuntu, you can use /tmp/. – Ricardo Dec 19 '13 at 18:28
1

I'm also running Python 2.7.4, on OS X 10.9, and your code works fine there.

So I think some other problem may be preventing it from working. Can you open "http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls" in your browser?
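
If the URL does open in a browser, one way to narrow things down from Python is to look at what the server actually sends back before writing anything to disk. A minimal sketch, assuming a hypothetical helper named `inspect_response` (it is not a urllib function):

```python
import contextlib

try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def inspect_response(url, timeout=25):
    """Return (status, content_type, byte_count) for `url`.

    `inspect_response` is an illustrative helper, not a library function.
    """
    with contextlib.closing(urlopen(url, timeout=timeout)) as response:
        status = response.getcode()  # HTTP status code; may be None for non-HTTP URLs
        content_type = response.info().get('Content-Type')
        body = response.read()
    return status, content_type, len(body)
```

A status of 200 with a byte count of 0 would point at the server's response, while a non-zero count would suggest the problem lies in how the file is written or reopened afterwards.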

Kane Blueriver
0

This may not directly answer the question, but if you're working with HTTP and have sufficient privileges to install Python packages, I'd really recommend doing this with 'requests'. There's a related answer here - https://stackoverflow.com/a/13137873/45698
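
For illustration, saving a page with requests could look roughly like this. It is only a sketch: `save_page` and its `session` parameter are hypothetical names introduced here, and it assumes the requests package is installed.

```python
import requests


def save_page(url, output_file, timeout=25, session=None):
    """Fetch `url` with requests and write the raw bytes to `output_file`.

    `save_page` is an illustrative helper; `session` is an optional
    requests.Session (or any object with a compatible `get` method).
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raise on 4xx/5xx responses instead of silently saving an error page.
    response.raise_for_status()
    with open(output_file, 'wb') as handle:
        handle.write(response.content)
    return len(response.content)
```

With the URL from the question, `save_page(url, 'webpage.html')` would return the number of bytes saved, which makes an empty response easy to spot.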

Community
Hugo Rodger-Brown