I have the following general setup for Python 2 and 3 support for downloading an ~8 MiB binary payload:
```python
import six

if six.PY3:
    from urllib.request import Request, urlopen
else:
    from urllib2 import Request, urlopen


def request(url, method='GET'):
    r = Request(url)
    r.get_method = lambda: method
    response = urlopen(r)
    return response


def download_payload():
    with open('output.bin', 'wb') as f:
        f.write(request(URL).read())
```
I have the following constraints:
- It must work on Python 2 and 3
- It must have few to no dependencies, as it'll run as an Ansible module on various distributions (Ubuntu, RHEL, Fedora, Debian, etc.)
I'd like to minimize the memory usage here, but I'm not seeing any documentation on how `urllib` works internally: does it always read the whole response into memory, or can I do manual buffering on my end to keep the memory usage fixed at my buffer size?
I was thinking of doing something like this:
```python
def download_payload():
    with open('output.bin', 'wb') as f:
        r = request(URL)
        hunk = r.read(8192)
        while len(hunk) > 0:
            f.write(hunk)
            hunk = r.read(8192)
```
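If chunked reads like this work the way I hope, I assume the same copy could also be written with `shutil.copyfileobj`, which does the same fixed-size read/write loop in the stdlib (just a sketch, assuming the response object can be treated as a readable file):

```python
import shutil

def download_payload():
    r = request(URL)
    with open('output.bin', 'wb') as f:
        # copyfileobj loops over read(length)/write() itself, so at most
        # one 8 KiB chunk should be held in memory at a time
        shutil.copyfileobj(r, f, 8192)
```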
The question I'm running into is whether `urllib` allows me to buffer reads like this so I can manage the memory manually. Are there any guarantees that it behaves this way? I can't find any mention of memory usage or buffering in the docs.
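Failing clear documentation, one rough way I could check this empirically is to compare the process's peak RSS before and after a download with the stdlib `resource` module (Unix-only; note that `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS), though that's only a sanity check, not a guarantee:

```python
import resource

def peak_rss():
    # Peak resident set size of the current process so far
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
download_payload()
after = peak_rss()
print('peak RSS grew by: %d' % (after - before))
```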