How to write a large binary file from the internet in Python 3 without reading the entire file into memory?

Question

This answer provides a very helpful way to download a file from the internet using Python 3.

Essentially it says to use:

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

If the URL points to a huge file, isn't the response automatically stored in memory? That is, even though copyfileobj buffers the file, doesn't just making the request return the entire large file as response?

"doesn't just making the request return the entire large file as `response`" By the way, you could ask a similar question "doesn't calling `with open("bigfile.txt", "r") as response` read the entire large file into memory as `response`?". The answer to that question is also "no", and for reasons that in some respects are similar to the case for a response object from `urllib` and in other respects different. It's worth looking in more detail into how different types of I/O work. — Steve Jessop, Jul 25 '15 at 21:15

JuniorCompressor · Accepted Answer · 2015-07-25T21:15:05.877


No. urlopen returns a file-like object that reads from a socket. Quoting:

Open a network object denoted by a URL for reading. If the URL does not have a scheme identifier, or if it has file: as its scheme identifier, this opens a local file (without universal newlines); otherwise it opens a socket to a server somewhere on the network. If the connection cannot be made the IOError exception is raised. If all went well, a file-like object is returned. This supports the following methods: read(), readline(), readlines(), fileno(), close(), info(), getcode() and geturl().

The object returned by urlopen does not support seek, and copyfileobj does not use seek either; it only calls read and write. So the data is consumed as a stream, and we can deduce that there is no need to store all the content in memory.
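To make the streaming behaviour concrete, here is a minimal sketch of the kind of loop copyfileobj performs, using only read and write. The names copy_in_chunks, download, and CHUNK_SIZE are hypothetical and chosen for illustration, and the 64 KiB chunk size is an assumption rather than a recommendation:

```python
import urllib.request

CHUNK_SIZE = 64 * 1024  # assumed buffer size; any moderate value works


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst using only read() and write(), never seek().

    Returns the number of bytes copied. The loop holds at most
    chunk_size bytes at any one time.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download(url, file_name):
    """Stream url to file_name (hypothetical helper, not from the question)."""
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        return copy_in_chunks(response, out_file)
```

In practice shutil.copyfileobj(response, out_file) does the same job, and it accepts an optional third argument, length, if you want to choose the buffer size yourself.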

edited Jul 25 '15 at 21:15

answered Jul 25 '15 at 21:09


So `seek` can only be used on objects in memory, and if it is not available then the object is not in memory? As @SteveJessop mentioned in his comment to my question, I could seek around `with open("bigfile.txt", "r")` without reading it all into memory, no? – Startec Jul 25 '15 at 21:28
Seeking in real files doesn't read them all into memory. What I meant is that without seek, a file-like object is used for streaming. And when you stream content there is no good reason to store it all in memory; you just buffer it. – JuniorCompressor Jul 25 '15 at 21:35
