I know that fetching a URL is as simple as `requests.get`, and I can get at the raw response body and save it to a file. But for large files, is there a way to stream directly to a file, for example if I'm downloading a movie with it?

Karl Knechtel

Matt Williamson
-
I do not think this is quite a duplicate. This is a more generic question and people are more likely to search for this question than others due to the wording, as evidenced by the answer's upvotes. – Matt Williamson Aug 02 '13 at 15:41
-
Even closed, the flag as a duplicate will direct readers to an answer. – Richard Sitze Aug 02 '13 at 16:04
-
Perhaps, but the approach is the same. – Eric Brown Aug 02 '13 at 16:04
1 Answer
Oddly enough, requests doesn't have anything simple for this. You'll have to iterate over the response and write those chunks to a file:

    import requests

    response = requests.get('http://www.example.com/image.jpg', stream=True)

    # Throw an error for bad status codes
    response.raise_for_status()

    with open('output.jpg', 'wb') as handle:
        # Write the body in 1024-byte chunks instead of loading it all into memory
        for block in response.iter_content(1024):
            handle.write(block)

I usually just use `urllib.urlretrieve()` (`urllib.request.urlretrieve()` in Python 3). It works, but if you need to use a session or some sort of authentication, the code above handles that as well.
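
For Python 3, where `urlretrieve` lives in `urllib.request`, the standard-library route might be sketched roughly as follows. The helper name `fetch_to_file` is made up for illustration, and the Python documentation lists `urlretrieve` under its legacy interface, so treat this as a convenience for simple cases rather than a recommendation.

```python
import urllib.request


def fetch_to_file(url, destination):
    """Download `url` to the local path `destination` and return that path."""
    # urlretrieve copies the body to disk block by block and returns a
    # (local_filename, headers) tuple; only the filename is used here.
    local_path, _headers = urllib.request.urlretrieve(url, destination)
    return local_path
```

It would be called as `fetch_to_file('http://www.example.com/image.jpg', 'output.jpg')`. There is no session or authentication hook here, which is the situation where the requests loop is the better fit.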

Blender
-
That's a really good point. Someone should point that out to Kenneth Reitz--or maybe submit a pull request to the project? – jdotjdot Jan 01 '13 at 22:56
-
Just a note--in the latest versions of requests, the prefetch arg has been changed to stream. So stream=True would be used here. – MikeHunter Jan 02 '13 at 00:54
-
Fantastic. Thank you for explaining the correlation to urllib as well. – Matt Williamson Jan 02 '13 at 01:49
-
@Blender I'm thinking more like `requests.get(...).save_as('file.jpg')`--want to make it as obvious as possible what's going on, and "save as" is more associated with a possible change in filename. Additionally, maybe a `stream=False` argument or something similar. – jdotjdot Jan 02 '13 at 03:29
-
Why the `if not block` check? Could this simply be written as `map(handle.write, request.iter_content(1024))`? – Rich Tier Dec 18 '13 at 22:17
-
Sure, `map` results in a list of the values returned by `handle.write`, but the function will be called for each item. As long as I don't assign the returned list to a name, we can ignore it. I'm still interested: why are we checking if block is falsy? – Rich Tier Dec 19 '13 at 00:06
-
@rikAtee: Creating a list with possibly thousands of `None`s in it is wasteful. If you want it to be "short", use `deque`. As for the check, `request.iter_content` calls `.read(block_size)` on a file-like object, which returns an empty string when the EOF is reached. – Blender Dec 19 '13 at 00:25
-
@Blender thanks for the tip and info. How would `collections.deque` help in this respect? As far as I can tell it does not execute a function for an iterable. – Rich Tier Dec 19 '13 at 17:14
-
@rikAtee: You'd do `deque(iterable, maxlen=0)` to "consume" an iterable, but I'm not sure where I was going with that when I said that it'd be useful. – Blender Dec 19 '13 at 20:11
-
This code does not check the status code; if there was a problem, it simply writes the server's error response to the file. I'd recommend inserting an `if not request.ok: return False` after the `get` line – hyperknot Apr 15 '14 at 20:05
-
@Blender: you should also put it in a try-except block, since requests raises exceptions on connection errors. – hyperknot Apr 15 '14 at 22:55
-
@Blender @rikAtee I dug a bit into the code and am pretty confident that you don't have to check for a false'ish `block`: `requests.Response.iter_content()` [calls `stream()`](https://github.com/kennethreitz/requests/blob/master/requests/models.py#L657) on the `urllib3.Response` that is saved in `raw`. `stream()` is designed to [handle](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L448) [EOF](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L309). – dtk Jun 02 '15 at 22:07
-
@zsero Improving on the approach it might be more pythonic to [handle errors](http://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes) through [exceptions](http://docs.python-requests.org/en/latest/api/#requests.Response.raise_for_status). – dtk Jun 02 '15 at 22:13
-
I'd put lines 2-5 outside the `with` block... if the request didn't succeed there's no need to create the file – Anentropic Aug 21 '15 at 21:39
-
@Blender I get `AttributeError: 'Response' object has no attribute 'save'` – tommy.carstensen Aug 31 '15 at 14:13
-
@tommy.carstensen: `Response.save` doesn't exist. I was saying that it would be nice if it did. – Blender Aug 31 '15 at 15:00
-
@zsero You can also use `response.raise_for_status()` if you want to raise an exception when the response has a 4xx or 5xx status code – HEADLESS_0NE Apr 11 '16 at 22:30
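
Pulling the suggestions from these comments together (check the status before the file is created, and catch connection errors), one possible sketch is shown below. The function name `download`, its `session` parameter, and the choice to return `True`/`False` instead of letting the exception propagate are assumptions made for illustration; none of them are part of requests itself.

```python
import requests


def download(url, path, session=None, chunk_size=1024):
    """Stream `url` to `path`. Return True on success, False on a request error."""
    # Hypothetical convenience: fall back to the module-level API when no
    # session (e.g. one carrying authentication) is supplied.
    http = requests if session is None else session
    try:
        # stream=True defers fetching the body until iter_content() is consumed.
        with http.get(url, stream=True) as response:
            # Raise before opening the file so an error page is never saved.
            response.raise_for_status()
            with open(path, 'wb') as handle:
                for block in response.iter_content(chunk_size):
                    handle.write(block)
    except requests.RequestException:
        # Base class for connection errors, timeouts, and the HTTPError
        # raised by raise_for_status().
        return False
    return True
```

Note that if the connection drops partway through the body, a partial file can still be left at `path`; callers who care may want to delete it when the function returns `False`.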