
I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS X. I installed heapy to hunt down the problem.

The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
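
Here's a stripped-down sketch of the pattern (the entry-building and analysis code are placeholders, not my real code):

```python
import shelve

def make_entry(i):
    # Stand-in for my real entry-building code.
    return {'id': i, 'data': [i] * 100}

# Build phase: generate entries one at a time and store them.
db = shelve.open('entries.db')  # writeback defaults to False
for i in xrange(1000000):
    db[str(i)] = make_entry(i)  # shelve keys must be strings
db.close()

# Read phase: reopen and process entries one at a time.
db = shelve.open('entries.db')
for key in db:
    entry = db[key]
    # ... analysis happens here, one entry at a time ...
db.close()
```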

Heapy shows that memory usage stays roughly constant during the program's execution. Yet Activity Monitor shows the process's memory increasing rapidly: within 15 minutes it has consumed all of my system memory (16 GB), and I start seeing page outs. Any idea why heapy isn't tracking this properly?
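
For reference, I'm sampling the heap from inside the main loop, roughly like this:

```python
from guppy import hpy

hp = hpy()
# ...inside the build loop, every few thousand iterations:
print hp.heap()  # the reported total size stays roughly flat
```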

steveJ_1987

1 Answer


Take a look at this fine article. You are most likely not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working-set operation actually is, load the large dataset in a new process, calculate the output there, and then return that output to the original process.
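
A minimal sketch of that pattern with `multiprocessing` (the dataset-loading and reduction code are placeholders):

```python
from multiprocessing import Process, Queue

def load_large_dataset():
    # Stand-in for whatever builds or reads the big working set.
    return [range(1000) for _ in xrange(1000)]

def heavy_work(queue):
    big = load_large_dataset()
    # Reduce the big structure to a small result and send only that back.
    queue.put(sum(len(row) for row in big))
    # When this process exits, the OS reclaims all of its memory,
    # fragmented or not.

if __name__ == '__main__':
    q = Queue()
    p = Process(target=heavy_work, args=(q,))
    p.start()
    print q.get()  # only the small result crosses back to the parent
    p.join()
```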

This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.

marr75
  • The first step of the program is building the database itself. I generate a new entry, store it to the shelf, and then move on to the next. I'm not sure what the "output" of the large set would be here. Should I be freeing the object after it gets pushed onto the shelf? – steveJ_1987 May 17 '13 at 19:32
  • Hard to say without more knowledge of the program. How big is the file you're serializing to? How tightly scoped are the objects you're creating? Why must you use the shelve module? Are there more references to the object after you shelve it? Two notes: 1) I don't really need the specific answers to any of those questions, but you should consider them; 2) there's no good way to "free" an object. It will or won't get garbage collected during the next collection based on whether the reference count says it's unreachable. – marr75 May 17 '13 at 19:43
  • The file will be roughly 4 GB in size after the db is created. After building the database, I need to perform analysis on the objects one at a time. I decided to use the shelve module to save memory. I checked, and gc.garbage returns an empty list throughout the build-up of the program. The `del` command won't free the memory? – steveJ_1987 May 17 '13 at 19:48
  • Nope. `del` just makes the symbol unreachable in the current scope and decrements the refcount. – marr75 May 17 '13 at 19:54
  • You might try closing/syncing the shelf after every item you add. I'm not saying this is a good solution, but it can rule out the shelf as a source of the memory problem. That said, 4 GB is a big shelf; probably time to look at a real database, maybe sqlite? – marr75 May 17 '13 at 20:05
  • I've tried closing/syncing, but that makes the program unbelievably slow. PyPy seemed more promising from what I read, but I can't get shelve to work properly with it: when I import shelve, I get an ImportError for gdbm. I tried installing it with pip, but it found no matching package. Any ideas? – steveJ_1987 May 17 '13 at 23:14
  • I rewrote the initial code to fork the actual analysis of the database, so that each process (using subprocess) only saw part of the database, and this worked fine; a rough sketch of that pattern follows the comments below. I think fragmentation was indeed the problem. Thanks for your help. – steveJ_1987 May 18 '13 at 00:56
  • Link needs updating here – David Parks Apr 08 '17 at 19:26
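
For reference, the forked-analysis pattern described in the comment above looks roughly like this (the script names and modulo-based chunking are illustrative, not the actual code):

```python
# parent.py -- launch one short-lived worker per slice of the database
import subprocess

NUM_CHUNKS = 8
workers = [subprocess.Popen(['python', 'analyze_chunk.py', str(i), str(NUM_CHUNKS)])
           for i in range(NUM_CHUNKS)]
for w in workers:
    w.wait()
```

```python
# analyze_chunk.py -- each worker only ever touches its own slice
import sys
import shelve

chunk, total = int(sys.argv[1]), int(sys.argv[2])
db = shelve.open('entries.db', flag='r')  # open read-only
for i, key in enumerate(db):
    if i % total != chunk:
        continue
    entry = db[key]
    # ... per-entry analysis here ...
db.close()
# The worker exits here, returning all of its memory to the OS.
```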