4

In the browser, when I click a download button, the file is downloaded with its original name and metadata.

Currently I can download a file with Python requests, but I have to name it myself, and none of the metadata comes with the downloaded file.

The files I want to download have meaningful names, but those names are not part of the URL.

What is the best way to do this with Python?

Only these headers are in the response:

  • Server
  • Date
  • Content-Type
  • Connection
  • Vary
  • X-Powered-By
  • Pragma
  • Set-Cookie
  • Expires
  • Cache-Control
  • Link
  • Content-Encoding

The Content-Disposition header is not available in r.headers, but I can see it when I download the file in the browser.

Traceback (most recent call last):
  File "download.py", line 53, in <module>
    print r.headers["Content-Disposition"]
  File "/Users/raitis.dembovskis/.virtualenvs/webcrawler/lib/python2.7/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'
raitisd
  • Possible duplicate of [How to get pdf filename with Python requests?](http://stackoverflow.com/questions/31804799/how-to-get-pdf-filename-with-python-requests) – prashant mavadiya Oct 20 '16 at 12:21

2 Answers

1

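This is only possible if the response headers for that URL include the file name:

import urllib2  # Python 2

result = urllib2.urlopen(url)
# Raises KeyError if the server did not send this header
result.headers['content-disposition']

or, to inspect all response headers:

result.info()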
prashant mavadiya
0

The server sets the file name using the Content-Disposition header, for example:

Content-Disposition: attachment; filename="downloaded.pdf"

So read the response headers, extract the filename given in this header, and use it when saving the file.
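
A minimal sketch of that extraction, assuming Python 3 and only the standard library. The function name `filename_from_response`, its `default` parameter, and the fallback to the last URL path segment are my own illustrative choices, not part of requests or of the original question:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(headers, url, default="download"):
    """Pick a file name from Content-Disposition, else from the URL path.

    `headers` is any mapping with a .get() method; `default` is a
    hypothetical fallback name used when nothing else is available.
    """
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the stdlib email parser handle quoting and parameters.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the final component so the server cannot pick a directory.
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name
    # Assumption: fall back to the last path segment of the URL.
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default
```

With requests this could be called as `filename_from_response(r.headers, r.url)`. If the server does not send Content-Disposition in the response that requests receives, as in the question, only the fallback applies.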

Reference:

  1. How to set name of file downloaded from browser?
Naveen Kumar R B