Using urllib2 in Python. How do I get the name of the file I am downloading?

Question

I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify the filename under which to save it on my hard drive. However, if I download the same file using my browser, a default filename is provided automatically.

Here is a simplified version of my code:

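import urllib2

def downloadmp3(url):
    webFile = urllib2.urlopen(url)
    filename = 'temp.zip'
    # Open in binary mode so the downloaded bytes are written unchanged
    localFile = open(filename, 'wb')
    localFile.write(webFile.read())
    localFile.close()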

The file downloads just fine, but if I paste the string stored in the variable "url" into my browser, the browser suggests a default filename for the download. I want to use that filename for my downloaded file instead of 'temp.zip' or whatever else I assign.

How do I use urllib2 (or some other Python library) to save the file with the filename that the server I am downloading from intends it to have?

If anyone doesn't understand this question, please say so, so that I can try to make it clearer.

possible duplicate of [urllib2 file name](http://stackoverflow.com/questions/163009/urllib2-file-name) — Nick Presta, Apr 04 '11 at 02:07
There are two places to look for a file name: the Content-Disposition header field and the URL. Use cgi.parse_header() to parse the header field. Use urlparse()/urlsplit() and posixpath.basename() to parse the URL. See this answer for examples: http://stackoverflow.com/a/11783319/205212 — ʇsәɹoɈ, Oct 11 '16 at 17:23

score 8 · Answer 1 · edited Sep 12 '11 at 00:25

The filename is usually included by the server through the content-disposition header:

content-disposition: attachment; filename=foo.pdf

You have access to the headers through

result = urllib2.urlopen(...)
result.info()  # contains the headers


>>> import urllib2
>>> result = urllib2.urlopen('http://zopyx.com')
>>> print result
<addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
>>> result.info()
<httplib.HTTPMessage instance at 0x1006fbab8>
>>> result.info().headers
['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']

See

http://docs.python.org/library/urllib2.html

Be aware, however, that this header is optional. If it is absent, you need to generate a reasonable name yourself from the requested URL, e.g. from the last component of the URL path. Use the urlparse() function from Python's urlparse module in this case.
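
A minimal sketch of that two-step lookup, written for Python 3 (an assumption; the code in this answer is Python 2). The function names and the "download.bin" default are made up for illustration, and the standard library's email.message.Message is used here to parse the header value, which is only one of several ways to do it:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    if not name:
        return None
    # Keep only the last component so the server cannot pick the target folder.
    return posixpath.basename(name.replace("\\", "/")) or None


def filename_from_url(url):
    """Return the last path component of a URL, or None if there is none."""
    path = unquote(urlsplit(url).path)
    return posixpath.basename(path) or None


def choose_filename(content_disposition, url, default="download.bin"):
    """Prefer the header, then the URL, then a hypothetical default name."""
    return (filename_from_header(content_disposition)
            or filename_from_url(url)
            or default)
```

With urllib2, the header value passed in would come from result.info().getheader('Content-Disposition'), which returns None when the server did not send the header.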

Régis B. · score 1 · Answer 2 · 2015-05-11T06:28:46.943

My issue with the previous answers is that they use the original URL, which fails in the case of a redirect. Here's how I do it (note the use of result.url instead of url):

import os
import urllib2
import urlparse

result = urllib2.urlopen(url)
# result.url is the final URL, after any redirects have been followed
filename = os.path.basename(urlparse.urlparse(result.url).path)
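
The same idea can be sketched for Python 3 (an assumption; the snippet above is Python 2), where urllib.request.urlopen plays the role of urllib2.urlopen. The helper names and the "download.bin" fallback are hypothetical, and the download is streamed to disk rather than read into memory in one go:

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_final_url(response):
    """Return the last path component of the URL the response came from."""
    # response.url reflects any redirects that urlopen followed.
    path = unquote(urlsplit(response.url).path)
    return posixpath.basename(path)


def download(url, directory=".", fallback="download.bin"):
    """Save url into directory, named after the final (post-redirect) URL."""
    with urlopen(url) as response:
        name = filename_from_final_url(response) or fallback
        target = os.path.join(directory, name)
        with open(target, "wb") as out:
            # Copy in chunks instead of calling read() on the whole body.
            shutil.copyfileobj(response, out)
    return target
```

The fallback name only comes into play when the final URL's path ends in a slash or is empty.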

edited May 11 '15 at 06:28

answered May 11 '15 at 06:20


GabLeRoux · Answer 3 · 2016-06-10T18:12:07.880

I had an issue where the server did not send any Content-Disposition header. If that is also your case, you can extract the filename from the URL like this:

os.path.basename(urlparse.urlparse(file_url).path)

In my case, I used file_stream.headers.subtype, the MIME subtype of the response, as the file extension, and I named the files after my Django model's slug. Here's an example:

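import os
import urllib2
import urlparse
from tempfile import NamedTemporaryFile

from django.core.files import File

# Download into a temporary file that is deleted when it is closed
tmp_file = NamedTemporaryFile(delete=True)
file_stream = urllib2.urlopen(file_url)
tmp_file.write(file_stream.read())
tmp_file.flush()

# Build the name from the model slug and the MIME subtype of the response
new_file_name = "some_prefix_" + my_model.slug + "." + file_stream.headers.subtype
# You may prefer this:
# new_file_name = os.path.basename(urlparse.urlparse(file_url).path)

my_model.file.save(new_file_name, File(tmp_file))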

The last line saves the file using Django's save method, which also handles duplicate file names by adding random characters at the end :)

Awesome.

score 0 · Answer 4 · answered Apr 04 '11 at 02:05

You can do that using urllib.urlretrieve, which returns a (filename, headers) tuple:

http://docs.python.org/library/urllib.html
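
A hedged sketch of how that might look in Python 3, where the function lives at urllib.request.urlretrieve (an assumption; the linked page documents the Python 2 urllib module). urlretrieve returns a (local path, headers) pair rather than the server's filename itself, so the name still has to be read from the headers. The helper name and its parameters are hypothetical:

```python
import os
from urllib.request import urlretrieve


def retrieve_with_server_name(url, directory, fallback_name):
    """Download url into directory, renaming it to the server's filename if any."""
    # Download under a temporary name first; the real name is known only
    # once the response headers are available.
    tmp_path = os.path.join(directory, fallback_name + ".part")
    _, headers = urlretrieve(url, tmp_path)

    # get_filename() reads the filename parameter of Content-Disposition
    # and returns None when the header is missing.
    server_name = headers.get_filename()
    final_name = os.path.basename(server_name or "") or fallback_name

    final_path = os.path.join(directory, final_name)
    os.replace(tmp_path, final_path)
    return final_path
```

If the server sends no Content-Disposition header, the file simply ends up under fallback_name.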

Spyros
