
Why should I use iter_content? I'm especially confused about the purpose of chunk_size: I have tried different values, and in every case the file seems to download and save successfully.

import requests

g = requests.get(url, stream=True)

with open('c:/users/andriken/desktop/tiger.jpg', 'wb') as sav:
    for chunk in g.iter_content(chunk_size=1000000):
        print(chunk)
        sav.write(chunk)

Help me understand what iter_content does and what happens when I use 1000000 bytes as the chunk_size. What is the purpose exactly, and what are the results?

Andriken sonwane

3 Answers


This is to prevent loading the entire response into memory at once (it also allows you to implement some concurrency while you stream the response, so that you can do work while waiting for the request to finish).

The purpose of a streaming request is usually media. For example, if you try to download a 500 MB .mp4 file using requests, you want to stream the response (and write the stream out in chunks of chunk_size) instead of waiting for all 500 MB to be loaded into Python at once.

If you want to implement any UI feedback (such as download progress like "downloaded <chunk_size> bytes..."), you will need to stream and chunk. If the response contains a Content-Length header, you can also calculate the % completion on every chunk you save.
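A minimal sketch of that idea (the URL, file path, and chunk size are placeholders; the Content-Length header is only present when the server sends it):

```python
import requests

def download(url, path, chunk_size=1024 * 1024):
    # Stream the response so that only ~chunk_size bytes are held in
    # memory at a time, and report progress when Content-Length is known.
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("Content-Length", 0))
        done = 0
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                done += len(chunk)
                if total:
                    print("downloaded %d/%d bytes (%d%%)"
                          % (done, total, done * 100 // total))
```
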

cowbert
  • You mean, like using it to stream actual video in a player, using the chunks of data as they become available while writing? – Andriken sonwane Sep 13 '17 at 19:53
  • Well yeah, in the example you gave, `sav` is an open file handle; it's possible to make it represent an open pipe to the player or something. – cowbert Sep 13 '17 at 19:56
  • Yes, exactly, I understand the concept now. Anyway, can you tell me what iter_lines does? – Andriken sonwane Sep 13 '17 at 20:00
  • 3
  • Normally, if you don't use `stream=True`, the [Response](http://docs.python-requests.org/en/master/api/#requests.Response) object returned by `.get()` holds the actual, entire response content in `Response.content`. But if you are streaming, the behavior of `.content` obviously needs to change (it can't hold all of the data); instead, you use the `iter_content` method to fetch subsequent chunks of data from the server and return them. This follows the typical Python [iterator](https://wiki.python.org/moin/Iterator) pattern. – cowbert Sep 13 '17 at 20:07
  • Yeah, I already knew when to use stream=True; I was just confused, but your answer and example helped me understand. Thanks, God bless you! – Andriken sonwane Sep 13 '17 at 20:14

From the documentation: chunk_size is the amount of data the app will read into memory at a time when stream=True.

For example, if the size of the response is 1000 bytes and chunk_size is set to 100, the response is split into ten chunks.
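That splitting can be sketched in plain Python (this just mimics what iter_content does with an in-memory body; it is an illustration, not requests' actual implementation):

```python
def split_into_chunks(body, chunk_size):
    # Yield successive chunk_size-byte slices of body, the way
    # iter_content yields successive pieces of the response.
    for i in range(0, len(body), chunk_size):
        yield body[i:i + chunk_size]

response_body = b"x" * 1000
chunks = list(split_into_chunks(response_body, 100))
print(len(chunks))     # 10 chunks
print(len(chunks[0]))  # each 100 bytes
```
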

amarynets

In situations where data is delivered without a Content-Length header (for example, using HTTP/1.1 chunked transfer encoding (CTE) mode, or HTTP/2 and HTTP/3 data frames) and minimal latency is required, it can be useful to deal with each HTTP chunk as it arrives, as opposed to waiting until the buffer hits a specific size.

This can be achieved by setting chunk_size = None.
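A short sketch of that usage (the URL is a placeholder and `handle` is a hypothetical caller-supplied callback):

```python
import requests

def stream_as_it_arrives(url, handle):
    # With chunk_size=None, iter_content yields each piece of data as
    # soon as it arrives from the server, rather than buffering until
    # a fixed number of bytes has accumulated.
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=None):
            handle(chunk)
```
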

Pierz