I know that fetching a URL is as simple as requests.get, and that I can get at the raw response body and save it to a file. For large files, though, is there a way to stream directly to a file, for example when downloading a movie?

Karl Knechtel
Matt Williamson

1 Answer

Oddly enough, requests doesn't have anything simple for this. You'll have to iterate over the response and write those chunks to a file:

import requests

# stream=True defers downloading the body until it is iterated over
response = requests.get('http://www.example.com/image.jpg', stream=True)

# Throw an error for bad status codes
response.raise_for_status()

with open('output.jpg', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)

I usually just use urllib.urlretrieve() (urllib.request.urlretrieve() in Python 3). It works, but if you need to use a session or some sort of authentication, the requests code above handles that as well.
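
For comparison, here is a minimal standard-library sketch of the same chunked copy, assuming Python 3. The function name `download_with_urllib` and the 64 KiB chunk size are illustrative choices, not part of the answer; `urllib.request.urlopen` and `shutil.copyfileobj` are the actual stdlib calls.

```python
import shutil
import urllib.request


def download_with_urllib(url, path, chunk_size=64 * 1024):
    """Copy the body at `url` to `path` in fixed-size chunks.

    Hypothetical helper for illustration; the chunk size is an assumption.
    """
    # urlopen returns a file-like response object. copyfileobj reads from it
    # chunk_size bytes at a time, so the whole body is never held in memory.
    with urllib.request.urlopen(url) as response, open(path, "wb") as handle:
        shutil.copyfileobj(response, handle, chunk_size)
    return path
```

Like `urlretrieve`, this has no notion of a requests session, so the requests loop above is still the one to use when session state or authentication matters.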

Blender
  • That's a really good point. Someone should point that out to Kenneth Reitz--or maybe submit a pull request to the project? – jdotjdot Jan 01 '13 at 22:56
  • @jdotjdot: Something like `requests.get(...).save('file.txt')`? – Blender Jan 02 '13 at 00:09
  • Just a note--in the latest versions of requests, the prefetch arg has been changed to stream. So stream=True would be used here. – MikeHunter Jan 02 '13 at 00:54
  • Fantastic. Thank you for explaining the correlation to urllib as well. – Matt Williamson Jan 02 '13 at 01:49
  • @Blender I'm thinking more like `requests.get(...).save_as('file.jpg')`--want to make it as obvious as possible what's going on, and "save as" is more associated with a possible change in filename. Additionally, maybe a `stream=False` argument or something similar. – jdotjdot Jan 02 '13 at 03:29
  • why the `if not block` check? could this simply be written as `map(handle.write, request.iter_content(1024))`? – Rich Tier Dec 18 '13 at 22:17
  • @rikAtee: Because that'd create an intermediate list. – Blender Dec 18 '13 at 23:33
  • sure, `map` results in a list of the returned value of `handle.write`, but the function will be called for each item. As long as I don't assign the returned list to a name then we can ignore it. I'm still interested: why are we checking if block is falsey? – Rich Tier Dec 19 '13 at 00:06
  • @rikAtee: Creating a list with possibly thousands of `None`s in it is wasteful. If you want it to be "short", use `deque`. As for the check, `request.iter_content` calls `.read(block_size)` on a file-like object, which returns an empty string when the EOF is reached. – Blender Dec 19 '13 at 00:25
  • @Blender thanks for the tip and info. how would collections.deque help in this respect? As far as I can tell it does not execute a function for an iterable? – Rich Tier Dec 19 '13 at 17:14
  • @rikAtee: You'd do `deque(iterable, maxlen=0)` to "consume" an iterable, but I'm not sure where I was going with that when I said that it'd be useful. – Blender Dec 19 '13 at 20:11
  • This code is not checking for status code, but simply writes the server's response message to the file as a string, in case there was some problem. I'd recommend inserting an `if not request.ok: return False` after the `get` line – hyperknot Apr 15 '14 at 20:05
  • @Blender: also, you should also put it in a try-except block, as connection errors are throwing exceptions in requests. – hyperknot Apr 15 '14 at 22:55
  • @Blender @rikAtee I dug a bit into the code and am pretty confident that you don't have to check for a false'ish `block`: `requests.Response.iter_content()` [calls `stream()`](https://github.com/kennethreitz/requests/blob/master/requests/models.py#L657) on the `urllib3.Response` that is saved in `raw`. `stream()` is designed to [handle](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L448) [EOF](https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L309). – dtk Jun 02 '15 at 22:07
  • @zsero Improving on the approach it might be more pythonic to [handle errors](http://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes) through [exceptions](http://docs.python-requests.org/en/latest/api/#requests.Response.raise_for_status). – dtk Jun 02 '15 at 22:13
  • I'd put lines 2-5 outside the `with` block... if the request didn't succeed there's no need to create the file – Anentropic Aug 21 '15 at 21:39
  • @Blender I get `AttributeError: 'Response' object has no attribute 'save'` – tommy.carstensen Aug 31 '15 at 14:13
  • @tommy.carstensen: `Response.save` doesn't exist. I was saying that it would be nice if it did. – Blender Aug 31 '15 at 15:00
  • @zsero You can also use `response.raise_for_status()` if you want to raise an exception when the response code is not within range of 200-206 – HEADLESS_0NE Apr 11 '16 at 22:30
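
Pulling the comment feedback together, here is one way it could look as a small helper: check the status before creating the file, and drop the falsy-block check. This is a sketch, not part of requests. The name `save_response` and the byte-count return value are hypothetical. The helper only assumes its argument has `raise_for_status()` and `iter_content()`, as a streamed `requests.Response` does.

```python
def save_response(response, path, chunk_size=1024):
    """Write a streamed response body to `path` and return the bytes written.

    Hypothetical helper: `response` is assumed to behave like a
    requests.Response obtained with stream=True.
    """
    # Check the status first, so no file is created for a failed request.
    response.raise_for_status()

    written = 0
    with open(path, "wb") as handle:
        # No empty-block check: the loop ends when the iterator is exhausted.
        for block in response.iter_content(chunk_size):
            handle.write(block)
            written += len(block)
    return written
```

With requests installed, a call might look like `save_response(requests.get(url, stream=True), 'output.jpg')`. Wrapping that call in `try`/`except requests.exceptions.RequestException` would catch both connection errors and the `HTTPError` raised by `raise_for_status()`.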