I have the following setup, which supports both Python 2 and 3, for downloading a ~8 MiB binary payload:

import six

if six.PY3:
    from urllib.request import Request, urlopen
else:
    from urllib2 import Request, urlopen

def request(url, method='GET'):
    r = Request(url)
    r.get_method = lambda: method

    response = urlopen(r)

    return response

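def download_payload():
    # URL is defined elsewhere. Open in binary mode ('wb'): on Python 3,
    # read() returns bytes, which cannot be written to a text-mode file.
    with open('output.bin', 'wb') as f:
        f.write(request(URL).read())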

I have the following constraints:

  • It must work on Python 2 and 3
  • It must have few to no dependencies, as it'll run as an Ansible module on various distributions: Ubuntu, RHEL, Fedora, Debian, etc.

I'd like to minimize the memory usage here, but I can't find any documentation on how urllib works internally. Does it always read the whole response into memory, or can I do manual buffering on my end to keep the memory usage fixed at my buffer size?

I was thinking of doing something like this:

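def download_payload():
    # Binary mode ('wb') is required on Python 3, where read() returns bytes.
    with open('output.bin', 'wb') as f:
        r = request(URL)
        # Read at most 8192 bytes at a time; an empty result means end of stream.
        hunk = r.read(8192)
        while len(hunk) > 0:
            f.write(hunk)
            hunk = r.read(8192)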

My question is whether urllib lets me read the response in chunks like this, so that I can manage the memory usage myself. Does it guarantee this behavior? I can't find any mention of memory usage or buffering in the docs.
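
For reference, here is a minimal, self-contained sketch of the chunked approach that also avoids the six dependency. It assumes the object returned by urlopen behaves like a file whose read(n) returns at most n bytes and an empty bytes object at end of stream. The names copy_in_chunks and CHUNK_SIZE, and the url and path parameters, are hypothetical and only for illustration. Whether urllib buffers more than this internally is exactly the part I'm unsure about.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

CHUNK_SIZE = 8192  # hypothetical buffer size, in bytes


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst, holding at most chunk_size bytes at a time.

    src and dst are assumed to be file-like objects opened in binary mode.
    Returns the total number of bytes written.
    """
    total = 0
    while True:
        hunk = src.read(chunk_size)
        if not hunk:  # empty bytes means end of stream
            break
        dst.write(hunk)
        total += len(hunk)
    return total


def download_payload(url, path='output.bin'):
    """Stream the response body for url to path (both caller-supplied)."""
    response = urlopen(Request(url))
    try:
        with open(path, 'wb') as f:
            return copy_in_chunks(response, f)
    finally:
        # The Python 2 response object is not a context manager,
        # so close it explicitly.
        response.close()
```

The standard library's shutil.copyfileobj(fsrc, fdst, length) performs the same kind of loop and could replace copy_in_chunks if the byte count is not needed.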

Naftuli Kay
  • urllib will allow you to stream directly to a file; see https://stackoverflow.com/a/1517728/1730895. I also want to add: 8MiB isn't a huge amount of memory, unless you're on an embedded system, of course. – Kingsley Dec 10 '18 at 21:31
  • Perfect! Yeah, I know 8MiB isn't a lot, but the size of the binary is outside of my control, so I'd like to keep memory usage constant if possible. – Naftuli Kay Dec 10 '18 at 21:39
  • Possible duplicate of [Stream large binary files with urllib2 to file](https://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file) – Naftuli Kay Dec 10 '18 at 22:04

0 Answers