Downloading Files with Python Urllib, Urllib2

Question

I am trying to download files from a website using urllib as described in this thread: link text

import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

I am able to download the files (mostly PDFs), but they are corrupted and cannot be opened. I suspect this is because the website requires a login.

How can the above function be modified to handle cookies? I already know the names of the form fields that carry the username & password information. When I print the return values of urlretrieve I get messages like:

a, b = urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print a, b

>> **cache-control:** no-cache, no-store, must-revalidate, s-maxage=300, proxy-revalidate

>> **connection:** close

I am able to manually download the files if I enter their urls in the browser. Thanks

If the website requires a login, you are probably being redirected to a login page, and that page is saved under the file name you passed, extension included. Rename your `mp3.mp3` to something like `mp3.html` and try to open it in a web browser. This is just to check whether it asks for a login. – ccheneson, Jan 22 '11 at 13:23
Look at the requests library. Unless you have to use urllib2, just don't: it does nothing but make everything complicated. http://pypi.python.org/pypi/requests – Jonathan Vanasco, Feb 10 '13 at 03:26

Erik Johansson · Answer 1 · 2011-01-22T15:03:58.230

1

First, urllib2 does support cookies, and cookie handling should be easy. Second, you can check what kind of file you actually downloaded by looking at its first bytes. E.g. AFAIK MP3 files that carry an ID3v2 tag start with the bytes "ID3".

import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
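A rough sketch of how an opener like the one above might be used to log in first and then download, in case the site really does need a session cookie. It is written for Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request. The login URL and the form field names are placeholders (assumptions, not taken from the question), and the first-bytes check only knows a few signatures.

```python
# Python 3 sketch; in Python 2 the same classes live in cookielib and urllib2.
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: replace with the site's real login URL and form field names.
LOGIN_URL = "http://www.example.com/login"
USER_FIELD = "username"
PASS_FIELD = "password"


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, user, password):
    """POST the login form; any session cookie ends up in the opener's jar."""
    form = urllib.parse.urlencode({USER_FIELD: user, PASS_FIELD: password})
    # Passing a data argument turns the request into a POST; it must be bytes.
    opener.open(LOGIN_URL, form.encode("utf-8")).close()


def sniff_type(data):
    """Guess the file type from the first bytes of the downloaded data."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"ID3"):
        return "mp3"  # MP3 with an ID3v2 tag; untagged MP3s start differently
    if data.lstrip()[:1] == b"<":
        return "html"  # most likely a login or error page
    return "unknown"


def download(opener, url, path):
    """Fetch url with the cookie-aware opener and save it, refusing HTML."""
    response = opener.open(url)
    try:
        data = response.read()
    finally:
        response.close()
    kind = sniff_type(data)
    if kind == "html":
        raise ValueError("got an HTML page instead of a file: %s" % url)
    with open(path, "wb") as f:
        f.write(data)
    return kind


# Intended use (needs network access and real credentials):
#   opener, jar = make_opener()
#   login(opener, "me", "secret")
#   download(opener, "http://www.example.com/songs/mp3.mp3", "mp3.mp3")
```

If download raises the ValueError, the server sent a page rather than the file, which points at a failed login or a wrong URL rather than at a corrupted transfer.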

edited Jan 22 '11 at 15:03

answered Jan 22 '11 at 13:57

Erik Johansson


score 0 · Answer 2 · answered Jan 22 '11 at 13:23


It might be that the server you are sending requests to is looking for certain request headers, such as User-Agent. You could try mimicking a browser by sending additional headers.
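One way the header suggestion could look, as a minimal sketch in Python 3 (urllib2.Request takes the same headers argument in Python 2). The header values here are made up for illustration; the ones your own browser sends are a better starting point.

```python
import urllib.request

# Hypothetical header values, chosen only to resemble a browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "*/*",
}


def browser_like_request(url, headers=BROWSER_HEADERS):
    """Build a Request object that carries browser-style headers."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url):
    """Download url with the extra headers and return the body as bytes."""
    response = urllib.request.urlopen(browser_like_request(url))
    try:
        return response.read()
    finally:
        response.close()


# Intended use (needs network access):
#   data = fetch("http://www.example.com/songs/mp3.mp3")
```

When a cookie-aware opener is already in use, the same headers can be set once through the opener's addheaders list instead of on each request.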


rubayeet


Thanks ccheneson & rubayeet! It was my mistake: there were some errors in my file names, which caused the server to redirect to the login page. I am able to download now using mechanize through: file.write(browser.response().read()) :) – SleepingSpider Jan 22 '11 at 13:54
