Save a large file using the Python requests library

Question

I know that fetching a URL is as simple as `requests.get`, and I can get at the raw response body and save it to a file, but for large files, is there a way to stream directly to a file? For example, if I'm downloading a movie with it?

I do not think this is quite a duplicate. This is a more generic question and people are more likely to search for this question than others due to the wording, as evidenced by the answer's upvotes. — Matt Williamson, Aug 02 '13 at 15:41
Even closed, the flag as a duplicate will direct readers to an answer. — Richard Sitze, Aug 02 '13 at 16:04

Blender · Accepted Answer · 2017-05-11T05:05:40.033

Oddly enough, requests doesn't have anything simple for this. You'll have to iterate over the response and write those chunks to a file:

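```python
import requests

# stream=True defers downloading the body until iter_content() is consumed
response = requests.get('http://www.example.com/image.jpg', stream=True)

# Raise an exception for 4xx/5xx status codes before creating the file
response.raise_for_status()

with open('output.jpg', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)
```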

I usually just use `urllib.urlretrieve()` (`urllib.request.urlretrieve()` in Python 3). It works, but if you need to use a session or some sort of authentication, the code above works as well.
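A standard-library sketch of the `urlretrieve` route, assuming a plain, unauthenticated download. `fetch_with_urllib` is a hypothetical name, and the `file://` URL is only there so the example runs offline; with a real site you would pass an `http(s)://` URL.

```python
import pathlib
import tempfile
import urllib.request


def fetch_with_urllib(url, destination):
    """Download url to destination using only the standard library.

    urlretrieve copies the body to disk in fixed-size blocks, so the
    whole file is never held in memory. It knows nothing about a
    requests Session, so session cookies or auth are not carried over.
    """
    path, headers = urllib.request.urlretrieve(url, destination)
    return path


# Offline demo: "download" a local file through a file:// URL.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.bin"
source.write_bytes(b"x" * 100_000)
target = workdir / "copy.bin"
result_path = fetch_with_urllib(source.as_uri(), str(target))
```

The Python documentation lists `urlretrieve` under the "legacy interface" ported from Python 2's `urllib`, which is one more reason to prefer the `requests` loop when you already depend on `requests`.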

edited May 11 '17 at 05:05

answered Jan 01 '13 at 22:16

Blender

That's a really good point. Someone should point that out to Kenneth Reitz--or maybe submit a pull request to the project? – jdotjdot Jan 01 '13 at 22:56
@jdotjdot: Something like `requests.get(...).save('file.txt')`? – Blender Jan 02 '13 at 00:09
Just a note--in the latest versions of requests, the prefetch arg has been changed to stream. So stream=True would be used here. – MikeHunter Jan 02 '13 at 00:54
Fantastic. Thank you for explaining the correlation to urllib as well. – Matt Williamson Jan 02 '13 at 01:49
@Blender I'm thinking more like `requests.get(...).save_as('file.jpg')`--want to make it as obvious as possible what's going on, and "save as" is more associated with a possible change in filename. Additionally, maybe a `stream=False` argument or something similar. – jdotjdot Jan 02 '13 at 03:29
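`requests` has no such method, but the proposed ergonomics are easy to get with a small helper. A sketch: `save_as` is a hypothetical name, not part of `requests`; it accepts any object with `raise_for_status()` and `iter_content()`, such as a `requests.Response` fetched with `stream=True`. The 64 KiB chunk size is an arbitrary choice.

```python
def save_as(response, filename, chunk_size=64 * 1024):
    """Hypothetical helper: stream a response body to filename.

    Works with any object exposing raise_for_status() and
    iter_content(chunk_size), e.g. a requests.Response obtained with
    stream=True. Returns the number of bytes written.
    """
    # Fail before creating the file if the server returned an error status.
    response.raise_for_status()
    written = 0
    with open(filename, "wb") as handle:
        for chunk in response.iter_content(chunk_size):
            if chunk:  # skip empty chunks
                handle.write(chunk)
                written += len(chunk)
    return written
```

With a real request this would be used as `with requests.get(url, stream=True) as response: save_as(response, "movie.mp4")`; using the response as a context manager releases the connection when the block exits.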
why the `if not block` check? could this simply be written as `map(handle.write, request.iter_content(1024))`? – Rich Tier Dec 18 '13 at 22:17
@rikAtee: Because that'd create an intermediate list. – Blender Dec 18 '13 at 23:33
sure, `map` results in a list of the returned value of `handle.write`, but the function will be called for each item. As long as I don't assign the returned list to a name then we can ignore it. I'm still interested: why are we checking if block is falsey? – Rich Tier Dec 19 '13 at 00:06
@rikAtee: Creating a list with possibly thousands of `None`s in it is wasteful. If you want it to be "short", use `deque`. As for the check, `request.iter_content` calls `.read(block_size)` on a file-like object, which returns an empty string when the EOF is reached. – Blender Dec 19 '13 at 00:25
@Blender thanks for the tip and info. How would collections.deque help in this respect? As far as I can tell it does not execute a function for an iterable? – Rich Tier Dec 19 '13 at 17:14
@rikAtee: You'd do `deque(iterable, maxlen=0)` to "consume" an iterable, but I'm not sure where I was going with that when I said that it'd be useful. – Blender Dec 19 '13 at 20:11
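To make the points in this exchange concrete: reading a file-like object in blocks until `.read()` returns an empty bytes object, and consuming `map(handle.write, ...)` with `deque(..., maxlen=0)`. Note that in Python 3 `map` is lazy, so a bare `map(handle.write, ...)` writes nothing until something iterates it. A stdlib-only sketch in which in-memory buffers stand in for the response and the output file; `copy_in_blocks` is a hypothetical name.

```python
import io
from collections import deque


def copy_in_blocks(src, dst, block_size=1024):
    """Copy src to dst block by block (stdlib-only illustration).

    iter(callable, sentinel) keeps calling src.read(block_size) until it
    returns b"" (EOF), so no explicit `if not block` check is needed.
    """
    blocks = iter(lambda: src.read(block_size), b"")
    # map() is lazy in Python 3; deque(..., maxlen=0) drives it to
    # completion without storing the values returned by write().
    deque(map(dst.write, blocks), maxlen=0)


source = io.BytesIO(b"0123456789" * 500)
target = io.BytesIO()
copy_in_blocks(source, target, block_size=64)

# A bare map() object does nothing until it is iterated:
untouched = io.BytesIO()
lazy = map(untouched.write, [b"abc"])
```

A plain `for` loop, as in the answer, does the same work and is usually easier to read.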
This code is not checking the status code; it simply writes the server's response body to the file even if there was some problem. I'd recommend inserting an `if not request.ok: return False` after the `get` line – hyperknot Apr 15 '14 at 20:05
@Blender: also, you should put it in a try-except block, since requests raises exceptions for connection errors. – hyperknot Apr 15 '14 at 22:55
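Combining the status check with exception handling, here is a sketch that returns `False` instead of raising. It relies on `requests.exceptions.RequestException`, the base class of the exceptions requests raises for connection errors, timeouts, invalid URLs and the `HTTPError` from `raise_for_status()`. The name `try_download` and the 10-second timeout are assumptions, not part of the answer. Failures detected before the file is opened leave no file behind; a failure mid-download can still leave a partial file.

```python
import requests


def try_download(url, filename, chunk_size=64 * 1024, timeout=10):
    """Hypothetical helper: stream url to filename; return True on success.

    Connection errors, timeouts, invalid URLs and HTTP error statuses
    (via raise_for_status) all derive from RequestException.
    """
    try:
        with requests.get(url, stream=True, timeout=timeout) as response:
            response.raise_for_status()  # checked before the file is opened
            with open(filename, "wb") as handle:
                for chunk in response.iter_content(chunk_size):
                    handle.write(chunk)
    except requests.exceptions.RequestException:
        return False
    return True
```
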
@Blender @rikAtee I dug a bit into the code and am pretty confident that you don't have to check for a false'ish `block`: `requests.Response.iter_content()` [calls `stream()`](https://github.com/kennethreitz/requests/blob/master/requests/models.py#L657) on the `urllib3.Response` that is saved in `raw`. `stream()` is designed to [handle](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L448) [EOF](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L309). – dtk Jun 02 '15 at 22:07
@zsero Improving on the approach it might be more pythonic to [handle errors](http://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes) through [exceptions](http://docs.python-requests.org/en/latest/api/#requests.Response.raise_for_status). – dtk Jun 02 '15 at 22:13
I'd put lines 2-5 outside the `with` block... if the request didn't succeed there's no need to create the file – Anentropic Aug 21 '15 at 21:39
@Blender I get `AttributeError: 'Response' object has no attribute 'save'` – tommy.carstensen Aug 31 '15 at 14:13
@tommy.carstensen: `Response.save` doesn't exist. I was saying that it would be nice if it did. – Blender Aug 31 '15 at 15:00
@zsero You can also use `response.raise_for_status()` if you want to raise an exception when the response has a 4xx or 5xx status code – HEADLESS_0NE Apr 11 '16 at 22:30
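`raise_for_status()` raises `requests.exceptions.HTTPError` only for 4xx and 5xx status codes; other codes such as 204 or 304 pass silently. A quick offline check that builds bare `requests.Response` objects by hand. Setting `status_code` directly is only for illustration (normally requests fills it in), and `raises_for` is a hypothetical name.

```python
import requests


def raises_for(status_code):
    """Return True if raise_for_status() raises for this status code."""
    response = requests.Response()
    response.status_code = status_code  # set by hand for illustration only
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return True
    return False


results = {code: raises_for(code) for code in (200, 204, 304, 404, 500)}
```
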
