I am trying to receive only part of the response to a request. The server does not honor the Range header.
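You can confirm that a server ignores `Range` by checking the status code. `206 Partial Content` means the range was applied. A plain `200 OK` means the header was ignored and the full body is being sent. The minimal sketch below uses the standard-library `urllib.request`, so it needs no extra packages. With requests, the equivalent check is `r.status_code == 206`. `fetch_range` is a hypothetical helper name. In the `200` case it reads only the wanted number of bytes and then closes the response.

```python
import urllib.request


def fetch_range(url, start=0, end=511, headers=None, timeout=10):
    """Ask for bytes start..end; return (server_honored_range, data)."""
    req_headers = dict(headers or {})
    req_headers["Range"] = f"bytes={start}-{end}"
    req = urllib.request.Request(url, headers=req_headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status == 206:  # 206 Partial Content: the range was applied
            return True, resp.read()
        # 200 OK: the Range header was ignored, so read only the bytes we want;
        # leaving the `with` block closes the response.
        return False, resp.read(end - start + 1)
```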

I have attempted the following:

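```python
import requests

def get_data(change_id):
    url = "https://my-api.com?id={}".format(change_id)
    # headers is defined elsewhere
    r = requests.get(url, stream=True, headers=headers)

    # Return the change id found in the first chunk (up to 512 bytes).
    # Note that the response is never explicitly closed here.
    for data in r.iter_content(chunk_size=512):
        return extract_change_id(data)
```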

As far as I can tell, this still downloads the full response.
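The attempt above never closes the streamed response after the first chunk. The minimal sketch below does that. It assumes the id can be matched with a single bytes regex (the `next_change_id` pattern is hypothetical) and that `response` comes from `requests.get(..., stream=True)`. The function accumulates chunks until the id appears and then closes the response. The requests documentation notes that a streamed connection cannot be released until the body is consumed or `Response.close()` is called. Closing early drops the connection, but data already in flight or sitting in the OS receive buffer is still transferred, so it is worth measuring whether this saves enough time against the real server.

```python
import re

# Hypothetical pattern: the real change-id format is not shown in the question.
# It is a bytes pattern because iter_content() yields bytes.
CHANGE_ID_PATTERN = re.compile(rb'"next_change_id"\s*:\s*"([^"]+)"')


def read_change_id(response, pattern=CHANGE_ID_PATTERN,
                   chunk_size=512, max_bytes=64 * 1024):
    """Scan the start of a streamed response for the change id, then close it.

    `response` is assumed to be a requests.Response obtained with stream=True.
    Calling close() before the body is fully consumed closes the underlying
    connection, so the rest of the body is not read.
    """
    buffer = b""
    try:
        for chunk in response.iter_content(chunk_size=chunk_size):
            buffer += chunk
            # Search the accumulated bytes, not just the latest chunk, so an id
            # split across a chunk boundary is still found.
            match = pattern.search(buffer)
            if match:
                return match.group(1).decode()
            if len(buffer) >= max_bytes:
                break  # give up: the id is expected near the start of the body
        return None
    finally:
        response.close()


# Usage (assuming `requests` is installed and `headers` is defined):
#   r = requests.get(url, stream=True, headers=headers, timeout=10)
#   next_id = read_change_id(r)
```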

Each response contains a new change id, which is needed to make the next request, so the responses form a continuous chain (a "river"). The change id always appears at the start of the body, so the idea is to read just the first few bytes, extract the change id, and then hand it off to a new thread for processing. Each response is upwards of 5 MB, and the responses need to be handled concurrently to keep up with the river. Simply reading the whole response and then parsing it as JSON is too slow.
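One reading of the pipeline described above is: read just enough of each body to get the next id, start the next request right away, and finish the current body in a background thread. A sketch of that with `concurrent.futures` rests on several assumptions. `fetch_chunks(change_id)` is a hypothetical callable that returns an iterator of byte chunks, for example `requests.get(url, stream=True).iter_content(8192)`. The `next_change_id` pattern is hypothetical. `process_body` stands in for the real JSON processing. In this version the rest of each body is still downloaded, but a worker thread does that work, so the main loop only waits for the bytes that contain the id.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pattern for the id at the start of each body (bytes, not str).
CHANGE_ID_PATTERN = re.compile(rb'"next_change_id"\s*:\s*"([^"]+)"')


def split_head(chunks, pattern=CHANGE_ID_PATTERN, max_bytes=64 * 1024):
    """Read chunks until the change id appears.

    Returns (change_id, head_bytes, iterator_over_remaining_chunks).
    """
    it = iter(chunks)
    head = b""
    for chunk in it:
        head += chunk
        match = pattern.search(head)
        if match:
            return match.group(1).decode(), head, it
        if len(head) >= max_bytes:
            break
    raise ValueError("change id not found near the start of the body")


def _finish_body(process_body, head, rest):
    # Runs in a worker thread: drain the rest of the stream, then process it.
    return process_body(head + b"".join(rest))


def follow_river(fetch_chunks, first_id, process_body, max_requests, workers=4):
    """Follow the chain of change ids, processing each body concurrently."""
    change_id = first_id
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_requests):
            next_id, head, rest = split_head(fetch_chunks(change_id))
            futures.append(pool.submit(_finish_body, process_body, head, rest))
            change_id = next_id  # start the next request without waiting
    return change_id, [f.result() for f in futures]
```

The bounded pool limits how many bodies are drained at once. A real river would loop indefinitely instead of stopping at `max_requests`, and it would handle results as they complete rather than collecting them all in a list.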

The extract function is a simple two-step regex search:

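```python
# full_change_id_regex and change_id_regex are compiled elsewhere. iter_content()
# yields bytes, so they must be bytes patterns (e.g. re.compile(rb"...")).
def extract_change_id(data):
    full_match = full_change_id_regex.search(data)
    if full_match is None:
        return None  # no change id in this chunk
    change_match = change_id_regex.search(full_match.group(0))
    return change_match.group(0) if change_match else None
```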
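The chunk comes from `iter_content()`, so it is `bytes`. Applying a `str` pattern to it raises `TypeError`, and a broad `except Exception` silently turns that error into `None`. One possible simplification replaces the two searches with a single bytes regex that uses a capture group. This assumes the id can be captured by one pattern. The `next_change_id` pattern below is hypothetical, because the real regexes are not shown.

```python
import re

# Hypothetical pattern: one bytes regex whose capture group isolates the id,
# replacing the "full match, then inner match" two-step search.
CHANGE_ID_RE = re.compile(rb'"next_change_id"\s*:\s*"([^"]+)"')


def extract_change_id(data: bytes):
    """Return the change id as str, or None if `data` has no complete id yet."""
    match = CHANGE_ID_RE.search(data)
    return match.group(1).decode() if match else None
```

The id might straddle the 512-byte chunk boundary. If so, keep appending chunks to a buffer and search the buffer until this function returns a value.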
Titan Chase
  • As far as I know, there is no way to achieve what you are trying to do. Once you send a request, unless there is a network error on either the server's side or yours, the whole response will be downloaded whether you like it or not. So if the server does not honor such headers, you are out of luck – Charchit Agarwal Jul 04 '22 at 05:25
  • If you are determined, however, and you don't mind going extremely low level, like working with raw sockets, you can set a custom TCP receive window, which might help. This is basically the network buffer for each connection. It will work similarly to the Range header if you do not read from the socket as soon as the server sends data: the buffer will fill up and the server will wait for it to drain. – Charchit Agarwal Jul 04 '22 at 05:30
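The raw-socket approach from the comment above might look like the minimal sketch below. It is an illustration built on assumptions, not a tested solution. It sets `SO_RCVBUF` before `connect()`, because the receive buffer size influences the TCP window the kernel advertises. It then sends a plain HTTP/1.1 GET, reads only the first `nbytes`, and closes the socket. There are several caveats. The kernel may round or clamp the requested buffer size (Linux doubles it). The returned bytes include the status line and headers. A chunked or gzip-encoded body would need decoding before the change id can be matched. Any headers the real API needs are not sent here. `read_head_with_small_window` and its parameters are hypothetical names.

```python
import socket
import ssl


def read_head_with_small_window(host, path, port=443, use_tls=True,
                                rcvbuf=4096, nbytes=2048, timeout=10):
    """Return roughly the first `nbytes` of the raw HTTP response (headers included)."""
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock = raw
    try:
        # Set before connect() so the small buffer applies from the handshake on.
        raw.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        raw.settimeout(timeout)
        raw.connect((host, port))
        if use_tls:
            sock = ssl.create_default_context().wrap_socket(raw, server_hostname=host)
        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n").encode()
        sock.sendall(request)
        data = b""
        while len(data) < nbytes:
            chunk = sock.recv(nbytes - len(data))
            if not chunk:
                break  # server closed the connection
            data += chunk
        return data
    finally:
        # Closing here abandons the rest of the body instead of reading it.
        sock.close()
```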