
My server makes external requests and I'd like to limit the damage a failing request can do. I'm looking to cancel a request in these situations:

  • the total time of the request is over a certain limit (even if data is still arriving)
  • the total received size exceeds some limit (I need to cancel prior to accepting more data)
  • the transfer speed drops below some level (though I can live without this one if a total time limit can be provided)

Note that I am not looking for the timeout parameter in requests, as that is only an inactivity timeout. I can't find anything for a total timeout, or a way to limit the total size. One example shows a maxsize parameter on HTTPAdapter, but that is not documented.
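For reference, this is the kind of timeout I mean; as far as I can tell it only bounds the connection attempt and each wait for data, not the whole transfer (the URL is just a placeholder):

import requests

# 'timeout' limits the connection attempt and each wait for data,
# not the total duration of the download
r = requests.get('http://example.com/big', timeout=10)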

How can I achieve these requirements using requests?

edA-qa mort-ora-y

2 Answers


You could try setting stream=True, then aborting a request when your time or size limits are exceeded while you read the data in chunks.

As of requests release 2.3.0 the timeout applies to streaming requests too, so all you need to do is allow for a timeout for the initial connection and each iteration step:

import time
import requests

r = requests.get(..., stream=True, timeout=initial_timeout)
r.raise_for_status()

# Content-Length is optional, so only check it when the server sent it
length = r.headers.get('Content-Length')
if length is not None and int(length) > your_maximum:
    raise ValueError('response too large')

size = 0
start = time.time()

for chunk in r.iter_content(1024):
    # bail out once the total wall-clock time exceeds the limit
    if time.time() - start > receive_timeout:
        raise ValueError('timeout reached')

    size += len(chunk)
    if size > your_maximum:
        raise ValueError('response too large')

    # do something with chunk

Adjust the timeouts as needed.

For requests releases older than 2.3.0 (the release that included this change) you could not time out the r.iter_content() yield; a server that stopped responding in the middle of a chunk would still tie up the connection. You'd have to wrap the above code in an additional timeout function to cut off long-running responses early.
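One possible way to do that (a sketch, not the only option; the URL and limits are placeholders) is to run the download in a worker thread and give up on it after a wall-clock deadline; note that the worker thread itself is not killed, only abandoned:

import concurrent.futures
import requests

def download(url, limit, timeout):
    # the same chunked read as above, capped at `limit` bytes
    r = requests.get(url, stream=True, timeout=timeout)
    r.raise_for_status()
    body, size = [], 0
    for chunk in r.iter_content(1024):
        size += len(chunk)
        if size > limit:
            r.close()
            raise ValueError('response too large')
        body.append(chunk)
    return b''.join(body)

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = pool.submit(download, 'http://example.com/big', 1024 * 1024, 5)
try:
    data = future.result(timeout=30)  # overall wall-clock limit in seconds
except concurrent.futures.TimeoutError:
    data = None  # give up; the worker thread is abandoned, not killed
finally:
    pool.shutdown(wait=False)  # don't block waiting for a stuck download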

Martijn Pieters
  • A small suggestion would be to increment the received content as each chunk arrives, as you did in [your other answer](http://stackoverflow.com/questions/23514256/http-request-with-timeout-maximum-size-and-connection-pooling). +1 – zx81 Jul 23 '15 at 02:32
  • @zx81: that is what the *do something with chunk* comment is about; you don't *have* to collect all content into one big string, you could also process it iteratively. – Martijn Pieters Jul 23 '15 at 07:47
  • @MartijnPieters Yes, I saw that. It was just a suggestion to make the code more immediately useful to the average passerby. No worries though, they can read the comments. :) Best wishes – zx81 Jul 23 '15 at 08:29
  • It should be noted that unless you are (a) writing the data to disk or (b) processing the streamed data in memory (as it streams), it's likely more performant to set the chunk size to the maximum chunk size you allow. Reading in small chunk sizes will be significantly slower, and the end result is the data stored in memory anyways. – chander May 12 '21 at 17:58
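A quick sketch of that last suggestion (the URL and cap here are placeholders, not from the comment): passing the size cap itself as the chunk size means the body arrives in as few reads as possible:

import requests

limit = 1024 * 1024  # example size cap in bytes
r = requests.get('http://example.com/big', stream=True, timeout=5)
r.raise_for_status()

body, size = [], 0
for chunk in r.iter_content(chunk_size=limit):  # one large read per iteration
    size += len(chunk)
    if size > limit:
        r.close()
        raise ValueError('response too large')
    body.append(chunk)
data = b''.join(body)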

This works for me:

import requests

response = requests.get(your_url, stream=True, timeout=10)
response_content = []  # collects the partial or full page source
total_size = 0

for chunk in response.iter_content(1024):
    response_content.append(chunk)
    total_size += len(chunk)
    if total_size > 10000:  # pick your own size limit in bytes
        response.close()    # stop downloading once the limit is reached
        break

iampritamraj