
Here is my code.py.

import numpy as np
import gc

def main():
    var_1, var_2, var_3 = np.random.normal(0, 1, (1, 3))[0]
    var_4, var_5, var_6 = np.random.normal(0, 1, (1, 3))[0]
    var_7, var_8, var_9 = np.random.normal(0, 1, (1, 3))[0]
    var_10, var_11, var_12 = np.random.normal(0, 1, (1, 3))[0]

    List = [var_1, var_2, var_3, var_4, var_5, var_6, var_7, var_8, var_9, var_10, var_11, var_12]
    with open('record.csv','a') as f: 
        for i in List:
            f.write('{},'.format(str(i)))
        f.write('\n')

    del var_1, var_2, var_3, var_4, var_5, var_6, var_7, var_8, var_9, var_10, var_11, var_12
    del f, List
    gc.collect()

# This code is just for demonstration. In the real program,
# `data` is needed by main(), so deleting it is not an option.
data = np.random.normal(0, 1, (1000, 3))

total = 100*100*100
for k in range(total):
    print(k+1, total)
    main()

In theory, the code above should use only a fixed amount of memory, since I've deleted all the variables and collected the garbage. However, when I run it with python code.py in one terminal and watch the memory usage with htop in another, the usage keeps climbing from 1.79G/7.76G to 1.80G/7.76G, then 1.81G/7.76G, and so on until the for-loop finishes.

How can I modify the code so that it keeps running without continuously consuming more memory?

guorui

2 Answers


GC isn't aware of your intention: as long as the OS keeps granting memory to the Python process, there is no pressure on it to return any. You need to limit the Python process somehow so that it knows how much memory is actually available.

It doesn't look like Python provides a way to limit its own heap size, so use OS features to limit the process's memory footprint instead (e.g. ulimit on Linux and macOS). See this question for more details.
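From inside the script, the resource module gives roughly the same effect as ulimit. A minimal sketch, assuming a Unix-like system (the 500 MB cap below is just an illustrative number, not something from the question):

import resource

# Cap the virtual address space of this process (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
limit_bytes = 500 * 1024 * 1024  # illustrative 500 MB cap
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

# Once the cap is hit, further allocations raise MemoryError
# instead of letting the process keep growing.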

Sergey Romanovsky

As the commenters stated, del and gc.collect() are unnecessary because of the way Python scopes your variables. I've also seen a print statement like that eat a lot of system resources on some machines. I'm not keeping your function structure, but here are two solutions. If you don't know in advance how big your output array will be, do something like this:

import numpy as np

total = 100*100*100
out = []  # grow a plain Python list, then write it out once at the end
for i in range(total):
    if i % 1e5 == 0:  # print progress only occasionally
        print(i, total)
    var_1, var_2, var_3 = np.random.normal(0, 1, (1, 3))[0]
    var_4, var_5, var_6 = np.random.normal(0, 1, (1, 3))[0]
    var_7, var_8, var_9 = np.random.normal(0, 1, (1, 3))[0]
    var_10, var_11, var_12 = np.random.normal(0, 1, (1, 3))[0]

    out.append([var_1, var_2, var_3, var_4, var_5, var_6, var_7, var_8, var_9, var_10, var_11, var_12])

np.savetxt('SOcheck.dat', out, delimiter=',')

On my Windows 10 machine running Python 3.6.3 under IPython, that code never goes above about 400M of memory. You get significant savings (memory usage drops to about 270M) if you know how big your output array will be and reserve the memory first, like this:

import numpy as np

total = 100*100*100
out = np.empty((total, 12), dtype=np.ndarray)  # preallocate one row per iteration
for i in range(total):
    if i % 1e5 == 0:  # print progress only occasionally
        print(i, total)
    var_1, var_2, var_3 = np.random.normal(0, 1, (1, 3))[0]
    var_4, var_5, var_6 = np.random.normal(0, 1, (1, 3))[0]
    var_7, var_8, var_9 = np.random.normal(0, 1, (1, 3))[0]
    var_10, var_11, var_12 = np.random.normal(0, 1, (1, 3))[0]

    out[i] = [var_1, var_2, var_3, var_4, var_5, var_6, var_7, var_8, var_9, var_10, var_11, var_12]

np.savetxt('SOcheck.dat', out, delimiter=',')
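If the twelve values per row really are just independent standard-normal draws (which is all the demonstration code does), a fully vectorized sketch avoids the Python-level loop entirely:

import numpy as np

total = 100*100*100
# One call allocates the whole (total, 12) float64 array up front
# (roughly 96 MB) and fills it with standard-normal draws.
out = np.random.normal(0, 1, (total, 12))
np.savetxt('SOcheck.dat', out, delimiter=',')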
shortorian