3

I've got a function which parses a sentence by building up a chart. But Python holds on to whatever memory was allocated during that function call. That is, I do

best = translate(sentence, grammar)

and somehow my memory goes up and stays up. Here is the function:

from string import join
from heapq import nsmallest, heappush
from collections import defaultdict

MAX_TRANSLATIONS=4 # or choose something else

def translate(f, g):
    words = f.split()
    chart = {}
    for col in range(len(words)):
        for row in reversed(range(0,col+1)):
            # get rules for this subspan                                        
            rules = g[join(words[row:col+1], ' ')]
            # ensure there's at least one rule on the diagonal                  
            if not rules and row==col:
                rules=[(0.0, join(words[row:col+1]))]
            # pick up rules below & to the left                                 
            for k in range(row,col):
                if (row,k) and (k+1,col) in chart:
                    for (w1, e1) in chart[row, k]:
                        for (w2, e2) in chart[k+1,col]:
                            heappush(rules, (w1+w2, e1+' '+e2))
            # add all rules to chart                                            
            chart[row,col] = nsmallest(MAX_TRANSLATIONS, rules)
    (w, best) = chart[0, len(words)-1][0]
    return best

g = defaultdict(list)
g['cela'] = [(8.28, 'this'), (11.21, 'it'), (11.57, 'that'), (15.26, 'this ,')]
g['est'] = [(2.69, 'is'), (10.21, 'is ,'), (11.15, 'has'), (11.28, ', is')]
g['difficile'] = [(2.01, 'difficult'), (10.08, 'hard'), (10.19, 'difficult ,'), (10.57, 'a difficult')]

sentence = "cela est difficile"
best = translate(sentence, g)

I'm using Python 2.7 on OS X.

outis
  • 75,655
  • 22
  • 151
  • 221
Dan
  • 763
  • 3
  • 8
  • 13
  • 4
    Which version of Python are you using? The [sample code](http://sscce.org/) isn't complete; there is no free `join` function in standard Python, nor is there an `nsmallest`, nor is any sample data given. This last is particularly important, as `g` could be storing additional data with each run of `translate`, depending on the type of `g`. – outis Mar 24 '12 at 22:56
  • Isn't building the `chart` consuming lot of memory? – Mariusz Jamro Mar 24 '12 at 23:09
  • Secator: yes, building the chart for long sentences takes up a lot of memory since each cell holds partial parses. But after the chart is built, I pick up the data in one cell and return from the function, so anything built up should be garbage collected. – Dan Mar 24 '12 at 23:13
  • outis: cf. edits. also, join and nsmallest are from string and heapq, respectively. – Dan Mar 24 '12 at 23:26
  • 1
    See also: [Python memory profiler](http://stackoverflow.com/q/110259/90527), [How do I profile memory usage in Python?](http://stackoverflow.com/q/552744/90527) – outis Mar 24 '12 at 23:27
  • I did. I've used heapy and indeed what's happening is that a bunch of strings and floats and tuples are being kept in memory after the function call. – Dan Mar 24 '12 at 23:32
  • 1
    Still needs complete sample data in order to be reproducible. – outis Mar 24 '12 at 23:40
  • outis: What do you mean by "complete"? If you build the grammar as pointed out and run the function on the sentence "cela est difficile" you will see the behaviour. Perhaps I'm not understanding something. – Dan Mar 24 '12 at 23:45
  • 1
    Did you read the SSCCE link? By "complete", we mean "something we can copy and paste which runs and shows the problem". I attempted to run what you posted, and chose a value for MAX_TRANSLATIONS, set a sentence, built a grammar dictionary, and heapified the values. The result? "KeyError: 'cela est'". – DSM Mar 24 '12 at 23:51
  • I skimmed it, as I am already familiar with providing runnable code here. Very sorry for the mistake. The grammar should be a defaultdict such that unseen keys return empty lists (i.e. no translations). I've edited the post. – Dan Mar 25 '12 at 00:12
  • What happens if you do not return `best` but, say, `0`, just for testing? You seem to allocate a huge number of small objects; and IIRC under some circumstances their memory can't be freed if some of them (`best` in this case) are still referenced. – Reinstate Monica Mar 25 '12 at 00:17
  • How are you measuring memory usage? IIRC Even if Python frees up space in its own heap, it never returns it to the OS. – ephemient Mar 25 '12 at 00:41
  • WolframH: same thing. =/ ephemient: Activity Monitor in OS X, and through heapy (a Python memory profiler). outis/DSM: were you able to run the sample? – Dan Mar 25 '12 at 01:47
  • The sample still isn't complete. Hold for update. – outis Mar 25 '12 at 01:58
  • outis: what isn't working for you? I tried the sample myself and it is working. – Dan Mar 25 '12 at 02:18
  • 1
    @Dan: it wasn't copy, paste & run, as per SSCCE and [Writing the Perfect Question](http://tinyurl.com/so-hints). – outis Mar 25 '12 at 02:28
  • outis: you're right, very sorry. thanks for the help. – Dan Mar 25 '12 at 02:54
  • @Dan: note that if you want users to be [notified](http://meta.stackexchange.com/q/43019/133817) when you make a comment, prefix their username with "@". – outis Mar 25 '12 at 18:59

2 Answers2

1

Within the function, you set rules to an element of grammar; rules then refers to that element, which is a list. You then add items to rules with heappush, which (as lists are mutable) means grammar holds on to the pushed values via that list. If you don't want this to happen, use copy when assigning rules or deepcopy on the grammar at the start of translate. Note that even if you copy the list to rules, the grammar will record an empty list every time you retrieve an element for a missing key.

outis
  • 75,655
  • 22
  • 151
  • 221
  • There we go. :) Oh me, not reminding myself of the peculiarities of a new language before I start playing around with it... – Dan Mar 25 '12 at 02:22
0

Try running gc.collect after you run the function.

forivall
  • 9,504
  • 2
  • 33
  • 58
  • Hi jordoex. I tried that, both at the end of the function and after running it. No effect. – Dan Mar 25 '12 at 00:14