
I am using requests to download a large (~50MiB) file on a small embedded device running Linux.

The file is to be written to an attached MMC.

Unfortunately the MMC write speed is lower than the network speed, and I see memory consumption rise; in a few cases I even got a kernel "unable to handle page..." error.

The device has only 128 MiB of RAM.

The code I'm using is:

            with requests.get(URL,  stream=True) as r:
                if r.status_code != 200:
                    log.error(f'??? download returned {r.status_code}')
                    return -(offset + r.status_code)
                siz = 0
                with open(sfn, 'wb') as fo:
                    for chunk in r.iter_content(chunk_size=4096):
                        fo.write(chunk)
                        siz += len(chunk)
                return siz

How can I temporarily stop the server while I write to the MMC?

ZioByte
  • Does this answer your question? [Download large file in python with requests](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) – python_user Feb 14 '21 at 16:45
  • @python_user: unless I'm missing something the code there is equivalent to what I'm already using. My problem is the "write to disk" part is *slower* than "get from net". In this condition, apparently, requests (or something below) keeps allocating memory to buffer incoming data not yet processed. I need a way to slow down source (e.g.: delay sending frame ACK) – ZioByte Feb 14 '21 at 17:04
  • Also related: https://stackoverflow.com/questions/17691231/how-to-limit-download-rate-of-http-requests-in-requests-python-library – fuenfundachtzig Feb 14 '21 at 18:21

3 Answers

            with requests.get(URL, stream=True) as r:
                if r.status_code != 200:
                    log.error(f'??? download returned {r.status_code}')
                    return -(offset + r.status_code)
                siz = 0
                with open(sfn, 'wb') as fo:
                    for chunk in r.iter_content(chunk_size=4096):
                        fo.write(chunk)
                        siz += len(chunk)
                return siz

You can rewrite it as a coroutine

import requests

# log, offset, URL and sfn/fname are assumed to come from the surrounding
# code shown in the question

def producer(URL, temp_data, n):
    # download chunks and hand them over one at a time
    with requests.get(URL, stream=True) as r:
        if r.status_code != 200:
            log.error(f'??? download returned {r.status_code}')
            return -(offset + r.status_code)
        for chunk in r.iter_content(chunk_size=n):
            temp_data.append(chunk)
            yield  # wait for the consumer to finish writing this chunk


def consumer(temp_data, fname):
    # write whatever is buffered to disk, then wait for more data
    with open(fname, 'wb') as fo:
        while True:
            while temp_data:
                fo.write(temp_data.pop(0))  # pop instead of removing while iterating
                # You can add a sleep here
            yield  # waiting for more data


def coordinator(URL, fname, n=4096):
    temp_data = list()
    c = consumer(temp_data, fname)
    p = producer(URL, temp_data, n)
    while True:
        try:
            # getting data
            next(p)
        except StopIteration:
            break
        finally:
            # writing data
            next(c)

These are all the functions you need. To call them:

URL = "URL"
fname = 'filename'
coordinator(URL, fname)
SaGaR
  • Thanks. I will surely try this, but I fail to understand how it is going to slow down the process. My write operations are already too slow as is, how can it help to slow them even more? Apparently requests will *not* wait for me to poll r.iter_content() to fill it. I need to insert a "sleep call" in requests code, somewhere (if I get you right). – ZioByte Feb 14 '21 at 17:17
  • Yes, you can add a sleep call before calling next(obj) in the loop. Wait a few moments, I am rewriting it. – SaGaR Feb 14 '21 at 17:29
  • @ZioByte It's done now, you can use it. It may also solve your memory problem as it writes the chunks one by one. – SaGaR Feb 14 '21 at 17:55
  • @e3n But if you sleep() won't the OS just buffer received packets somewhere in memory anyway, keeping the memory problem? – Anonymous1847 Feb 14 '21 at 18:33
  • @Anonymous1847 I don't have much experience with file handling, but here we are yielding the `chunk` and consuming it manually, so I don't think there should be any memory-related problem. And the sleep is optional; I don't recommend it, but maybe @ZioByte's device is slow so he may need it. – SaGaR Feb 14 '21 at 18:41

If the web server supports the HTTP Range header, you can request a download of only part of the large file and then step through the entire file part by part.

Take a look at this question, where James Mills gives the following example code:

from requests import get

url = "http://download.thinkbroadband.com/5MB.zip"
headers = {"Range": "bytes=0-100"}  # first 100 bytes

r = get(url, headers=headers)
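
To step through the entire file this way you could loop over consecutive ranges and write each part to disk before requesting the next one. A minimal sketch, assuming the server answers ranged requests with 206 Partial Content (the 1 MiB part size and the download_in_ranges helper name are just placeholders):

import requests

def download_in_ranges(url, fname, part_size=1 << 20):
    # fetch the file in fixed-size ranges; each part is written and flushed
    # before the next range is requested, so only one part is held in RAM
    offset = 0
    with open(fname, 'wb') as fo:
        while True:
            headers = {"Range": f"bytes={offset}-{offset + part_size - 1}"}
            r = requests.get(url, headers=headers)
            if r.status_code != 206:
                # 200 would mean the server ignored the Range header,
                # 416 that the requested range starts past the end of the file
                break
            fo.write(r.content)
            fo.flush()
            offset += len(r.content)
            if len(r.content) < part_size:
                break  # short part: end of file reached
    return offset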

As your problem is memory, you will want to stop the server from sending you the whole file at once, as this will certainly be buffered by some code on your device. Unless you can make requests drop part of the data it receives, this will always be a problem. Additional buffers downstream of requests will not help.

fuenfundachtzig
  • Nice try (I upvoted), but my server supports neither Range nor Restart :( – ZioByte Feb 15 '21 at 11:01
  • 1
    That's unfortunate. If you just want to download the file to disk, maybe consider not using `requests` but an external tool like `curl` that you could call via `os.system` (or `subprocess` etc.) `curl` supports bandwidth limits, cf. https://unix.stackexchange.com/questions/39218 – fuenfundachtzig Feb 15 '21 at 11:44
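
A rough sketch of that curl fallback, assuming URL holds the same value as in the question (the 500k rate limit and the output path are only placeholders to tune):

import subprocess

# let curl do the download: it writes straight to disk and throttles the
# transfer itself, so nothing piles up in RAM
subprocess.run(
    ["curl", "--limit-rate", "500k", "-o", "/mnt/mmc/outfile", URL],
    check=True,
)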

You can try decreasing the size of the TCP receive buffer with this bash command:

echo 'net.core.rmem_max=1000000' >> /etc/sysctl.conf

(1 MB, you can tune this; note that a value appended to /etc/sysctl.conf only takes effect after `sysctl -p` or a reboot)

This prevents a huge buffer from building up at this stage of the pipeline.

Then write code to only read from the TCP stack and write to the MMC at specified intervals to prevent buffers from building up elsewhere in the system, such as the MMC write buffer -- for example @e3n's answer.

Hopefully this should cause packets to be dropped and then re-sent by the server once the buffer opens up again.
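
For the application side, a minimal sketch of that "read at intervals" idea, reusing URL and sfn from the question (the 64 KiB chunk size and the 50 ms pause are placeholders to tune against the actual MMC write speed):

import os
import time
import requests

with requests.get(URL, stream=True) as r:
    r.raise_for_status()
    with open(sfn, 'wb') as fo:
        for chunk in r.iter_content(chunk_size=65536):
            fo.write(chunk)
            fo.flush()
            os.fsync(fo.fileno())  # make sure the chunk has really hit the MMC
            # pause so the kernel receive buffer fills and the TCP window
            # closes, pushing back on the server
            time.sleep(0.05)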

Anonymous1847
  • Uhmm... interesting. My current value (no `/etc/sysctl.conf` at all) is already very low: `net.core.rmem_max = 180224`. I suppose something is doing the buffering somewhere above basic TCP and below my application level. – ZioByte Feb 15 '21 at 11:14
  • @ZioByte It could be the python library is buffering something. Perhaps there's an option to reduce buffer size? Or you could try a different library. – Anonymous1847 Feb 15 '21 at 19:23