0

I have the following code:

f = open(file).readlines() # 2GB file
for item in f:
    print item

# some other stuff

g = open(file2).readlines() # 3GB file
for item in g:
    print item

When is the memory from g freed up? What about for f? If it is not freed up, how would I do that?

David542
  • As a side note, if the only reason you need `g` is to iterate over it, why read the whole thing into memory in the first place? A file is already an iterable of lines, just like the list you get back from `readlines`—but with the huge advantage that it's _lazy_. As you loop over it, it reads in one line at a time (well, one smallish buffer at a time), so it will end up needing maybe 8KB instead of 12GB, for an improvement of more than 6 orders of magnitude. – abarnert Sep 18 '14 at 23:07
  • By "memory", do you mean physical memory, virtual memory, or what? – David Schwartz Sep 18 '14 at 23:09
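The lazy iteration abarnert's first comment describes can be sketched as follows (Python 3 syntax; the filename and file contents are placeholders standing in for the OP's multi-gigabyte file):

```python
# A tiny stand-in for the OP's multi-gigabyte file.
with open("demo.txt", "w") as out:
    out.write("line 1\nline 2\nline 3\n")

# Iterate over the file object itself instead of calling readlines():
# the file yields one line at a time, so only a small read buffer sits
# in memory at any moment, no matter how large the file is.
lines_seen = 0
with open("demo.txt") as f:
    for line in f:
        lines_seen += 1

print(lines_seen)  # 3
```

The loop body sees exactly the same lines it would see looping over the `readlines()` list, just without ever materializing that list.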

2 Answers

3

When is the memory from g freed up?

It depends on what you mean by "freed up".

From Python's point of view, it's not. As long as you have some reference to that list that you could use at any time (which you do, in the variable g), it can't be freed up. Otherwise, what would happen if you later tried to use g? But Python's point of view isn't based on the actual RAM chips sitting in your computer. Your OS lets each program pretend that it has a huge flat chunk of more memory than it could ever need.* Of course that isn't actually true; see below for more on this, but let's stick with Python's view for now.

If you give up all references to that list—e.g., by returning from the current scope, or assigning something else to g, or doing del g (all assuming that's the only reference), then all the memory used for the list can be freed. (In CPython, it will usually happen immediately; in other implementations, usually just "soon".)
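One way to watch this happen (CPython-specific behavior; the `Payload` wrapper class is purely illustrative, since plain lists don't support weak references):

```python
import weakref

class Payload:           # plain lists can't be weak-referenced,
    pass                 # so a tiny wrapper object stands in for one

g = Payload()
alive = weakref.ref(g)   # lets us observe when the object is destroyed
assert alive() is not None

del g                    # drop the only reference (g = None works too)
# In CPython the reference count hits zero immediately and the object
# is freed on the spot; other implementations only promise "soon".
print(alive() is None)   # True on CPython
```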

But "freed" doesn't mean "returned to the operating system". In general, it will be kept around in a freelist (actually, a few different levels of freelists), on the assumption that if you wanted 3GB now, you'll likely want 3GB again, so Python might as well keep the storage around, because that's faster than re-allocating. (So, if you'd released f before creating g, g could have taken 2GB of its storage from the freelist and only allocated another 1GB.)


But "not returned to the operating system" doesn't mean "wired to physical memory". And here's where we get to the difference between Python's view and the hardware view. If you've got only, say, 8GB of physical RAM, and 6 programs that each have 12GB of data at the same time, where does the extra 64GB fit? It gets stored on disk, and reloaded from disk the next time you try to use it. Normally it'll do a pretty good job of this.** So, if you never touch that 3GB again, and some other program on your system needs some memory, your OS will probably page it out of RAM, and never page it back in.


On a related note, you also never close your file objects. That means the file handles are still open until the garbage collector notices that nobody's ever going to use them again. Again, this will usually be immediately in CPython, usually not in other implementations. But don't rely on that. close your files—or, better yet, use a with statement.
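A minimal sketch of the `with` idiom (the filename is a placeholder):

```python
# The with statement closes the file deterministically when the block
# exits, even if the body raises, instead of waiting for the garbage
# collector to notice the file object is unreachable.
with open("data.txt", "w") as f:      # create a small stand-in file
    f.write("hello\n")

with open("data.txt") as f:
    first_line = f.readline()

print(f.closed)  # True: closed as soon as the with block ended
```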


* Insert 640K joke here.

** But it's very easy to either maliciously or accidentally prevent it from doing a good job. For example, create a list of 30 billion ints, and keep randomly changing random values, and your OS will spend so much time swapping pages in and out that it can't do anything else…

abarnert
  • @DavidSchwartz: Nonsense. It could be _paged to disk_, as the answer explains just a few paragraphs down, but it couldn't be _freed_, because the OS has no way of knowing how to regenerate that information. It's in virtual memory until Python frees it. – abarnert Sep 18 '14 at 23:10
  • @DavidSchwartz: Why would you assume "memory" means physical RAM? The view of memory that any program has is a big flat memory space; that's the whole point of virtual memory. And if the question is about physical memory, the answer would be obvious: _anything_ gets freed whenever the OS decides someone else needs that RAM more; there's nothing interesting to say about that, or Python-specific. – abarnert Sep 18 '14 at 23:13
  • @DavidSchwartz: Also, what OS would ever eject something active from memory just to free pages? The only reason it would ever do so is to reuse the pages for another mapping. – abarnert Sep 18 '14 at 23:14
  • @DavidSchwartz: Generally people who ask questions like this have no idea what they're asking. The only thing you can do is try to explain how all of the different things they _could_ be asking differ, and what the answer is for each one. Maybe I could make this clearer with different wording; if you didn't get it, there's a good chance the OP won't… let me try an edit. – abarnert Sep 18 '14 at 23:19
  • As a side note, Python has other important characteristics regarding [memory allocation](http://stackoverflow.com/questions/15455048/releasing-memory-in-python) that few people are aware of, but that should be taken into account especially when you have to work with big quantities of different objects. – rhlobo Sep 19 '14 at 16:30
  • @rhlobo: I wrote one of the two detailed answers there, and they both got upvotes… but it really isn't useful for almost anyone; the takeaway is that (at least in 64-bit non-embedded platforms) just pretend that there's infinite memory, make sure not to keep references around you don't need, and don't even look at the memory usage unless there's an actual problem, and even then you're probably better off finding a way to use less memory or launching a child process, and even when you can't do that the answer is almost always to explicitly `mmap` a file… – abarnert Sep 19 '14 at 17:37
  • thanks for the tips. That other answer was very useful to me when I was working on a large-scale system that had to continuously process a high volume task queue handling huge amounts of data. [This article about python internal memory management](http://deeplearning.net/software/theano/tutorial/python-memory-management.html#internal-memory-management) describes the problem I had: To speed-up memory allocation (and reuse), python uses a number of lists for small objects that never shrinks. – rhlobo Sep 20 '14 at 15:26
1

Assuming that is the whole program, in both those cases, the memory will be reserved until the program terminates.

You can speed things up a little by adding:

 f = None

after the first loop. This will allow (but not require) the garbage collector to clean up that first 2GB.

Much better, though, to adopt a method of processing the files that does not require you to read the whole megillah into memory.
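A sketch of one such approach (Python 3; the `count_lines` helper, the filenames, and the tiny stand-in contents are all illustrative):

```python
def count_lines(path):
    # Stream the file line by line: memory use stays roughly constant
    # regardless of file size, so multi-GB inputs are no problem.
    n = 0
    with open(path) as f:
        for _ in f:
            n += 1
    return n

# tiny stand-in files for the OP's 2GB and 3GB inputs
for name, text in [("f.txt", "a\nb\n"), ("g.txt", "x\ny\nz\n")]:
    with open(name, "w") as out:
        out.write(text)

print(count_lines("f.txt"), count_lines("g.txt"))  # 2 3
```

Because each file is fully processed inside its own `with` block, nothing holds a reference to the file's contents afterward, so the `f = None` workaround above becomes unnecessary.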

Michael Lorton