
I've been running Python scripts that make several calls to some functions, say F1(x) and F2(x), that look a bit like this:

x = LoadData()            # x: a sequence of N numpy arrays
z = [None] * N            # z preallocated to hold the N results

for j in range(N):
    y = F1(x[j])          # large intermediate result
    z[j] = F2(y)

    del y                 # performance is much worse without this line

SaveData(z)

Performance is a lot faster if I keep the `del y` line, and I don't understand why. If I omit `del y`, I quickly run out of RAM and spill into virtual memory, and everything slows to a crawl. But if I use `del y`, I am repeatedly freeing and re-allocating the memory for y. What I would like is for y to sit in static memory and have that memory reused on every F1(x) call. But from what I can tell, that isn't what's happening.

Also, not sure if it's relevant, but my data consists of numpy arrays.

marshall.ward
  • This question probably assumes that y is first declared inside the loop; I should have mentioned this. Feel free to comment on this aspect of the question though! – marshall.ward Jul 22 '10 at 05:11
  • If you don't like the explicit `del y` then extract the loop body into a function. That way any variables scoped entirely inside the loop will be cleaned up automatically for each iteration (see the sketch after these comments). – Duncan Jul 22 '10 at 08:12
  • @Duncan I have done the same but it still has the same behavior. Am I doing something wrong or is it normal behavior? – HKay Nov 23 '21 at 06:24
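
A minimal sketch of Duncan's suggestion, with made-up stand-ins for F1 and F2 (the real functions are whatever the script uses): moving the loop body into a function makes y a local variable, so its reference is dropped as soon as each call returns.

import numpy as np

def F1(v):                 # stand-in: returns a large intermediate array
    return np.repeat(v, 100_000)

def F2(v):                 # stand-in: reduces the intermediate to a scalar
    return v.mean()

def step(xj):
    y = F1(xj)             # y exists only inside step()
    return F2(y)           # y is freed automatically when step() returns

x = np.random.rand(100, 10)
z = np.array([step(row) for row in x])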

3 Answers


Without the `del y` you might need twice as much memory: on each pass through the loop, y is still bound to the previous result of F1 while the next one is being calculated.

Only once F1 returns is y rebound to the new value, and only then can the old F1 result be released.

Your symptoms suggest that the object returned by F1 occupies quite a lot of memory, so holding two of them at once is what tips you into swapping.

Unrolling the loop for the first couple of iterations makes this explicit:

y = F1(x[0])   # F1(x[0]) is calculated, then y is bound to it
z[0] = F2(y)
y = F1(x[1])   # y is still bound to F1(x[0]) while F1(x[1]) is computed
               # the memory for F1(x[0]) is finally freed when y is rebound
z[1] = F2(y)

Using `del y` is a good solution if this is what is happening in your case.
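
For concreteness, a runnable sketch of this pattern with hypothetical stand-ins for F1 and F2; the del ensures at most one large intermediate is alive at a time:

import numpy as np

def F1(v):                # stand-in: produces a large intermediate array
    return np.tile(v, 100_000)

def F2(v):                # stand-in: reduces the intermediate to a scalar
    return v.sum()

x = np.random.rand(100, 10)
z = np.empty(len(x))

for j in range(len(x)):
    y = F1(x[j])          # without the del below, the previous y would
    z[j] = F2(y)          # still be alive while this new one is built
    del y                 # drop the reference so only one copy is ever live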

John La Rooy
  • hear hear -- this totally explains why you have better performance with `del y` – Igor Serebryany Jul 22 '10 at 06:56
  • Thanks, this must be exactly what is happening: Without the del, my memory usage doubles, spills a bit into virtual memory (since my RAM happens to lie between one and two instances of y) and the script slogs on at low performance. I'll stick with the del solution for now; thank you again for explaining when/how the instances of y are created. – marshall.ward Jul 22 '10 at 07:19

What you actually want is something that's awkward to do in Python: you want to allocate a region of memory for y once and hand a reference to that region to F1(), so that F1() can build each new value in place. That avoids having F1() do its own allocation for every new value of y; as written, F1() returns a fresh object each time, and your variable y is just a reference to it, not the value itself.

There's already an SO question about passing by reference in Python: How do I pass a variable by reference?
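
With numpy this buffer-reuse idea is practical, assuming F1 can be rewritten in terms of ufuncs that accept an out= argument; F1_into and the computation inside it are hypothetical:

import numpy as np

def F1_into(v, out):
    # hypothetical in-place rewrite of F1: fills a caller-owned buffer
    # instead of allocating a fresh array on every call
    np.multiply(v, 2.0, out=out)
    return out

x = np.random.rand(100, 10)
y = np.empty_like(x[0])   # allocated once, reused on every pass
z = np.empty(len(x))

for j in range(len(x)):
    F1_into(x[j], y)      # no per-iteration allocation for y
    z[j] = y.sum()        # stand-in for F2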

Igor Serebryany

For very large values of N, use xrange instead of range to save memory (xrange is Python 2 only; in Python 3, range is already lazy). You can also nest the calls, though I don't know if that will help you:

x = LoadData()

for j in xrange(N):
    z[j] = F2(F1(x[j]))

SaveData(z)

Maybe F1 and F2 are making unnecessary copies of objects; the best approach would be to work in place, something like:

x = LoadData()
for item in x:
    item.F1()   # hypothetical in-place versions of F1 and F2, written as
    item.F2()   # methods that mutate item directly instead of returning copies
SaveData(x)
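
If the items are numpy arrays, the in-place idea can be expressed with augmented assignment and ufunc out= arguments, which mutate the existing buffers rather than allocating copies; the operations here are made up for illustration:

import numpy as np

x = np.random.rand(100, 10)   # stand-in for LoadData()
for row in x:                 # each row is a view into x, not a copy
    row *= 2.0                # hypothetical F1 step, done in place
    np.sqrt(row, out=row)     # hypothetical F2 step, also in place
# x itself now holds the results; hand it to SaveData(x)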

Sorry if my answer is not helpful.

razpeitia
  • I worried that my example would permit nesting; it isn't as practical an option in the actual script. I think gnibbler's feedback correctly describes the situation, but thanks for your feedback. I was not familiar with the xrange function until you pointed it out. – marshall.ward Jul 22 '10 at 07:23