How to download a file from a URL and keep its name and metadata with Python requests

Question

In the browser, if I click a download button, the file is downloaded with its original name and metadata.

Currently I can download a file with Python requests, but I have to give it a name myself, and none of the metadata is preserved in the downloaded file.

The files I want to download have meaningful names, but these names are not part of the URL.

What is the best way to do this with Python?

Only these headers are in the response:

Server
Date
Content-Type
Connection
Vary
X-Powered-By
Pragma
Set-Cookie
Expires
Cache-Control
Link
Content-Encoding

The Content-Disposition header is not in r.headers, but I can see it when I download the file in the browser. Accessing it raises:

Traceback (most recent call last):
  File "download.py", line 53, in <module>
    print r.headers["Content-Disposition"]
  File "/Users/raitis.dembovskis/.virtualenvs/webcrawler/lib/python2.7/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'
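requests already looks headers up case-insensitively (note `key.lower()` in the traceback), so this KeyError means the header really is missing from that particular response. A minimal sketch for checking what the server actually sent, assuming Python 3 with requests installed; `find_header` and `show_headers` are hypothetical helper names. `find_header` behaves like `r.headers.get(...)` but also works on plain dicts, for example headers copied from the browser's developer tools.

```python
def find_header(headers, name):
    """Case-insensitive header lookup that returns None instead of raising KeyError."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def show_headers(url):
    """Fetch url with requests and print every response header it received."""
    import requests  # imported here so find_header() works without requests installed

    # stream=True: read the headers without downloading the body yet
    with requests.get(url, stream=True) as r:
        for key, value in r.headers.items():
            print(f"{key}: {value}")
        print("Content-Disposition:", find_header(r.headers, "Content-Disposition"))
```

Comparing this output with the browser's Network tab shows whether the two requests are really getting the same response.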

Possible duplicate of [How to get pdf filename with Python requests?](http://stackoverflow.com/questions/31804799/how-to-get-pdf-filename-with-python-requests) — prashant mavadiya, Oct 20 '16 at 12:21

score 1 · Answer 1 · answered Oct 20 '16 at 12:20

1

This can be done only if the response headers for that URL include the file name:

import urllib2

result = urllib2.urlopen(url)
result.headers['content-disposition']
# or
result.info()
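In Python 3, urllib2 became urllib.request. A sketch under that assumption (the helper names `filename_from_message` and `server_filename` are hypothetical): the object returned by `info()` is an `http.client.HTTPMessage`, a subclass of `email.message.Message`, so its `get_filename()` method can parse the filename out of Content-Disposition and returns None instead of raising when no filename is present.

```python
import http.client
import urllib.request


def filename_from_message(msg: http.client.HTTPMessage):
    """Return the server-suggested filename, or None if there is none."""
    # get_filename() is inherited from email.message.Message. It reads the
    # 'filename' parameter of Content-Disposition (including RFC 2231
    # filename*=), falls back to the 'name' parameter of Content-Type,
    # and returns None if neither is present.
    return msg.get_filename()


def server_filename(url):
    """Open url and return the filename the server suggests, if any."""
    with urllib.request.urlopen(url) as result:
        return filename_from_message(result.info())
```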

answered Oct 20 '16 at 12:20 by prashant mavadiya

This header is not available. – raitisd Oct 20 '16 at 12:23

score 0 · Answer 2 · edited May 23 '17 at 12:19

0

The file name is set by the server using the Content-Disposition header, for example:

Content-Disposition: attachment; filename="downloaded.pdf"

So read the response headers, extract the filename given in that header, and use it when saving the file.
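A sketch of that approach with requests, assuming Python 3 and that the server actually sends Content-Disposition. The helper names (`choose_filename`, `download`), the fallback to the last segment of the URL path, and the default name are illustrative choices, not part of the answer. The parsing reuses the standard library's `email.message.Message.get_filename()`, and `os.path.basename` discards any directory components a server might put in the name.

```python
import os
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def choose_filename(content_disposition, url, default="download.bin"):
    """Pick a local filename from Content-Disposition, falling back to the URL."""
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # handles filename="..." and filename*=UTF-8''...
    if not name:
        # Fallback (assumption): use the last path segment of the URL.
        name = unquote(posixpath.basename(urlsplit(url).path))
    # Never trust a server-supplied path: keep only the final component.
    name = os.path.basename(name.replace("\\", "/"))
    if name in ("", ".", ".."):
        return default
    return name


def download(url, directory="."):
    """Download url into directory, naming the file as the server suggests."""
    import requests  # imported here so choose_filename() works without requests

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        name = choose_filename(r.headers.get("Content-Disposition"), url)
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```

For example, `choose_filename('attachment; filename="downloaded.pdf"', url)` gives `downloaded.pdf` regardless of what the URL looks like.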

Reference:

How to set name of file downloaded from browser?

answered Oct 20 '16 at 12:17 by Naveen Kumar R B; edited by Community

This header is not available. – raitisd Oct 20 '16 at 12:23
Capture the request with a network sniffing tool (Burp Suite, Wireshark) or the browser's developer tools (F12 -> Network tab) and check the headers. The browser uses this header to name the file, so it should be present. – Naveen Kumar R B Oct 20 '16 at 12:48
This header is present in the browser, but if I check 'response.headers' it's not there. – raitisd Oct 20 '16 at 12:50
Is it a publicly accessible URL? If so, share it and we will try it, or share a similar URL. – Naveen Kumar R B Oct 20 '16 at 12:53
Try sending a browser User-Agent header, such as "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0", instead of the default value in the request headers. – Naveen Kumar R B Oct 20 '16 at 12:57
I didn't change anything, and now `Content-Disposition` is available. Earlier I had also tried setting the user agent, but the header just wasn't there. – raitisd Oct 20 '16 at 13:04
Haha, yes. I should delete this question. – raitisd Oct 20 '16 at 13:12
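The User-Agent suggestion from the comments can be tried as follows. A minimal sketch, assuming Python 3 with requests installed; the Firefox string is the one quoted in the comment, and `browser_headers` and `fetch_disposition` are hypothetical helper names. As the thread shows, the header may be missing for reasons unrelated to the User-Agent, so treat this as a diagnostic step rather than a fix.

```python
FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0"


def browser_headers(extra=None):
    """Request headers that mimic a desktop browser, plus any extra headers."""
    headers = {"User-Agent": FIREFOX_UA}
    if extra:
        headers.update(extra)
    return headers


def fetch_disposition(url):
    """Return Content-Disposition as seen with a browser User-Agent, or None."""
    import requests  # imported here so browser_headers() works without requests

    with requests.get(url, headers=browser_headers(), stream=True) as r:
        # r.headers is case-insensitive, so .get() matches any capitalization.
        return r.headers.get("Content-Disposition")
```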

How to download a file from an url and keep its name and metadata with python requests

2 Answers2