urllib2 download HTML file

Question

Using urllib2 in Python 2.7.4, I can readily download an Excel file:

import urllib2

output_file = 'excel.xls'
url = 'http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

This results in the expected file that I can use as I wish.

However, trying to download just an HTML file gives me an empty file:

output_file = 'webpage.html'
url = 'http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html'
file(output_file, 'wb').write(urllib2.urlopen(url).read())

I had the same results using urllib. There must be something simple I'm missing or don't understand. How do I download an HTML file from a URL? Why doesn't my code work?

Ricardo · Accepted Answer · 2013-12-19T12:25:37.657

3

If you want to download a file or simply save a web page, you can use urlretrieve (from the urllib library) instead of calling read and write yourself.

import urllib
urllib.urlretrieve("http://www.nbmg.unr.edu/geothermal/mapfiles/nvgeowel.html", "doc.html")
# urllib.urlretrieve(url, filename_to_save_as)

If you need a timeout, set it before making the request, for example at the top of your file:

import socket
socket.setdefaulttimeout(25)  # timeout in seconds
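As an alternative to a process-wide socket timeout, the read-and-write approach from the question can pass a timeout to urlopen directly and close the output file explicitly. The following is only a sketch: the helper name download and its parameters are made up for illustration, and the try/except import is an assumption added so the same code also runs on Python 3.

```python
import contextlib

try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def download(url, output_file, timeout=25):
    """Save the body of `url` to `output_file` and return the byte count."""
    # closing() makes sure the response is closed on both Python 2 and 3.
    with contextlib.closing(urlopen(url, timeout=timeout)) as response:
        data = response.read()
    # The with-block flushes and closes the file before the function returns.
    with open(output_file, 'wb') as handle:
        handle.write(data)
    return len(data)
```

Calling download(url, 'webpage.html') with the URL from the question returns the number of bytes written, so a result of 0 would mean the server really sent an empty body.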

edited Dec 19 '13 at 12:25

answered Dec 19 '13 at 00:03

Ricardo

If you can expand on this, it would be an answer. Right now it looks like it should be a comment on the question. – Burhan Khalid Dec 19 '13 at 04:47
Thanks, I had used this previously with the same problem, but having it confirmed as correct is useful. I think I'm having issues with overwriting existing files rather than with the downloading part. – Dylan Hettinger Dec 19 '13 at 17:57
I tested it by overwriting an existing file and it works fine. You can try downloading the file to a temporary folder. If you are using Ubuntu, you can do it in /tmp/. – Ricardo Dec 19 '13 at 18:28

score 1 · Answer 2 · answered Dec 19 '13 at 04:12

I'm also running Python 2.7.4, on OS X 10.9, and the code works fine there.

So I think some other problem may be preventing it from working. Can you open "http://www.nbmg.unr.edu/geothermal/GEOTHERM-30Jun11.xls" in your browser?
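One way to look for such other problems is to inspect what the server actually returns before anything is written to disk. This is a rough diagnostic sketch, not a confirmed fix: describe_response is a hypothetical helper name, and the fallback import is an assumption so the code also runs on Python 3.

```python
import contextlib

try:
    from urllib2 import urlopen          # Python 2, as used in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def describe_response(url, timeout=25):
    """Return basic facts about the response for `url`."""
    with contextlib.closing(urlopen(url, timeout=timeout)) as response:
        headers = response.info()
        body = response.read()
        return {
            'status': response.getcode(),          # HTTP status code, if any
            'final_url': response.geturl(),        # differs after a redirect
            'content_type': headers.get('Content-Type'),
            'declared_length': headers.get('Content-Length'),
            'body_bytes': len(body),               # bytes actually received
        }
```

If body_bytes is greater than zero for the HTML URL, the download itself is fine and the empty file is more likely caused by how or where the file is written.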

answered Dec 19 '13 at 04:12

Kane Blueriver

score 0 · Answer 3 · edited May 23 '17 at 10:27

This may not directly answer the question, but if you're working with HTTP and have sufficient privileges to install Python packages, I'd really recommend doing this with 'requests'. There's a related answer here: https://stackoverflow.com/a/13137873/45698
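For reference, a minimal sketch of the same download using requests might look like the following. The function name save_page and the optional session parameter are hypothetical, added here only so the function can be exercised without network access; requests is a third-party package and must be installed separately.

```python
def save_page(url, output_file, timeout=25, session=None):
    """Fetch `url` with requests and write the raw bytes to `output_file`."""
    if session is None:
        import requests  # third-party package: pip install requests
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of saving an error page.
    response.raise_for_status()
    with open(output_file, 'wb') as handle:
        handle.write(response.content)
    return len(response.content)
```

With requests installed, save_page(url, 'webpage.html') using the URL from the question would write the page and return its size in bytes.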

edited May 23 '17 at 10:27

Community

answered Dec 19 '13 at 00:10

Hugo Rodger-Brown
