
I have the following code, which is supposed to do some operation over a vector of data and store the result. My problem is that when I run it, at first each iteration (each outer loop) takes about 12 seconds, but over time the iterations get longer, and by the end each one takes about 2 minutes to complete. What's wrong with my code? Is this related to the amount of memory and the size of the array I am keeping? If yes, how can I solve it? If no, what's the problem?

allclassifiers is a 2D array with allclassifiers.shape = (1020, 1629), which means my loop should run 1629 times and each time operate on a vector of size 1020; each vector is an array of 0s and 1s.

At the end, points_comb is around 400 MB, with points_comb.shape = (26538039, 2). I should mention that I am running the code on a system with 4 GB of RAM.
valid_list is a vector of 0s and 1s of size 1020, valid_list.shape = (1020,).
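
For a runnable example, stand-in inputs with the shapes described above can be generated as follows (boolean_function is assumed here to be the list of codes 1-10 that combine_crisp below handles, and points_comb is assumed to start as an empty (0, 2) array):

import numpy as np
from datetime import datetime

allclassifiers = np.random.randint(0, 2, size=(1020, 1629))  # 0/1 classifier outputs
valid_list = np.random.randint(0, 2, size=1020)              # 0/1 labels
boolean_function = range(1, 11)                              # assumed: the codes 1..10 handled by combine_crisp
points_comb = np.empty((0, 2))                               # assumed: starts empty, grows to roughly (26.5e6, 2)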

startTime = datetime.now()

for i in range(allclassifiers.shape[1]):
    for j in range(allclassifiers.shape[1]):
        rs_t = combine_crisp(valid_list, allclassifiers[:,i], allclassifiers[:,j], boolean_function)
        fpr_tmp, tpr_tmp = resp2pts(valid_list, rs_t)
        points_comb = np.vstack((points_comb,np.hstack((fpr_tmp, tpr_tmp))))
    endTime = datetime.now()
    executionTime = endTime - startTime
    print "Combination for classifier: " + str(i)+ ' ' + str(executionTime.seconds / 60) + " minutes and " + str((executionTime.seconds) - ((executionTime.seconds / 60)*60)) +" seconds"


def combine_crisp(lab, r_1, r_2, fun):

    rs = np.empty([len(lab), len(fun)])
    k = 0
    for b in fun:
        if b == 1:            #----------------> 'A AND B'
            r12 = np.logical_and(r_1, r_2)
        elif b == 2:            #----------------> 'NOT A AND B'
            r12 = np.logical_and(np.logical_not(r_1), r_2)
        elif b == 3:            #----------------> 'A AND NOT B'
            r12 = np.logical_and(r_1, np.logical_not (r_2))
        elif b == 4:            #----------------> 'A NAND B'
            r12 = np.logical_not( (np.logical_and(r_1, r_2)))
        elif b == 5:            #----------------> 'A OR B'
            r12 = np.logical_or(r_1, r_2)
        elif b == 6:            #----------------> 'NOT A OR B'; 'A IMP B'
            r12 = np.logical_or(np.logical_not(r_1), r_2)
        elif b == 7:            #----------------> 'A OR NOT B' ;'B IMP A'
            r12 = np.logical_or(r_1, np.logical_not (r_2))
        elif b == 8:            #----------------> 'A NOR B'
            r12 = np.logical_not( (np.logical_or(r_1, r_2)))
        elif b == 9:            #----------------> 'A XOR B'
            r12 = np.logical_xor(r_1, r_2)
        elif b == 10:            #----------------> 'A EQV B'
            r12 = np.logical_not (np.logical_xor(r_1, r_2))
        else:
            raise ValueError('Unknown Boolean function: %s' % b)

        rs[:, k] = r12
        k = k + 1


    return rs

def resp2pts(lab, resp):
    lab = lab > 0
    resp = resp > 0
    P = sum(lab)
    N = sum(~lab)
    if resp.ndim == 1:
        num_pts = 1
        tp = sum(lab[resp])
        fp = sum(~lab[resp])
    else:
        num_pts = resp.shape[1]
        tp = np.empty([num_pts,1])
        fp = np.empty([num_pts,1])
        for i in np.arange(num_pts):
            tp[i] = np.sum( lab[resp[:,i]]) 
            fp[i] = np.sum( ~lab[resp[:,i]])
    tp = np.true_divide(tp,P)
    fp = np.true_divide(fp,N)
    return fp, tp
Am1rr3zA
  • Do you see the process taking up an increasing amount of RAM? – mdurant Jul 24 '14 at 14:11
  • Actually, I used `watch -n 5 free -m` to check memory, but it showed from the beginning that I have only 275 MB of free RAM. – Am1rr3zA Jul 24 '14 at 14:19
  • Can you post the code in a way we can run it? I managed to simulate `allclassifiers`, but what is `valid_list`? There are too many undefined symbols in your code example; if you want people to spend time trying to help, please provide a running example. – Raydel Miranda Jul 24 '14 at 14:43
  • `valid_list` is a vector of 0s and 1s of size 1020, `valid_list.shape = (1020,)`. It was hard to give you the actual input, but you can generate random vectors of 1s and 0s. – Am1rr3zA Jul 24 '14 at 14:47
  • And what is `np`? I suppose it has to do with `import numpy as np` right? – Raydel Miranda Jul 24 '14 at 14:51
  • yes it's import numpy as np – Am1rr3zA Jul 24 '14 at 14:56
  • The `np.vstack` is what kills performance, I guarantee it. It allocates an increasingly large array *every* iteration. – moarningsun Jul 24 '14 at 17:04
  • @moarningsun so what's my alternative? – Am1rr3zA Jul 24 '14 at 17:11

2 Answers


I recommend you do the first thing one should do when facing a problem like this.

Use a profiler.

Using a profiler you can find out (or at least get closer to) where the problem is, which functions are the cause, and how the application is using memory.

So, you can start with The Python Profilers, and there is a very interesting question here: Which Python memory profiler is recommended?

In my opinion, a very good starting point is this guide: A guide to analyzing Python performance.

One of the things you can learn is how to find out the amount of memory used per line of code!

(Screenshot: example line-by-line memory usage output.)
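
For example, a minimal sketch using the standard cProfile module and the third-party memory_profiler package might look like this (run_combinations is a hypothetical wrapper around the loop from the question):

import cProfile

def run_combinations():
    # hypothetical wrapper: put the nested loops from the question in here
    pass

# Report time spent per function, sorted by cumulative time
cProfile.run('run_combinations()', sort='cumulative')

# For line-by-line memory usage, decorate the function with
# memory_profiler's @profile and run the script with:
#   python -m memory_profiler your_script.py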

Good luck!

Raydel Miranda

I haven't actually tested it, but I'm pretty sure the slowdown is caused by the repeated use of np.vstack. You can't really append to a NumPy array, so if you want to increase its size you have to allocate a new array first and then copy the old and new data into it. This process takes time:

A = np.random.rand(int(1e7))
%timeit A.copy()
10 loops, best of 3: 119 ms per loop

Your points_comb increases in size, so it takes increasingly longer to make a copy of it.
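
To see the effect in isolation, here is a toy benchmark (standalone, not the data from the question) that appends 10-row chunks to a growing array; the total time grows roughly quadratically with the number of chunks:

import time
import numpy as np

def grow_by_vstack(n_chunks):
    out = np.empty((0, 2))
    for _ in range(n_chunks):
        # allocates a new, larger array and copies all existing rows on every iteration
        out = np.vstack((out, np.ones((10, 2))))
    return out

for n in (5000, 10000, 20000):
    t0 = time.time()
    grow_by_vstack(n)
    print("%6d chunks: %.2f s" % (n, time.time() - t0))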

The solution is to append to arrays sparingly or use a Python list (if you don't know the size of the result beforehand).

So instead of:

result = np.empty(shape=(0, N))

for i in some_iterable:
    for j in range(X):
        temp = somefunction(i,j)
        result = np.vstack((result, temp))

You could, for example, do:

result = list()

for i in some_iterable:
    for j in range(X):
        temp = somefunction(i,j)
        result.append(temp)

result_np = np.vstack(result)
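
Applied to the loop in the question, that could look roughly like this:

points_list = []

for i in range(allclassifiers.shape[1]):
    for j in range(allclassifiers.shape[1]):
        rs_t = combine_crisp(valid_list, allclassifiers[:, i], allclassifiers[:, j], boolean_function)
        fpr_tmp, tpr_tmp = resp2pts(valid_list, rs_t)
        points_list.append(np.hstack((fpr_tmp, tpr_tmp)))  # cheap append, no array copying

points_comb = np.vstack(points_list)  # single allocation and copy at the end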

The list will presumably take a lot of memory though. If you do know the size of the result before the loop you could preallocate the array and copy the content to it as it becomes available:

result = np.empty(shape=(X**2, N))

for i in range(X):
    for j in range(X):
        temp = somefunction(i,j)
        result[i*X + j, :] = temp   # unique row for each (i, j) pair

Now you're still copying, but only small chunks, so this is reasonably fast. The best thing, though, would be to somehow vectorize your work (I'm not saying that it's even possible). Then you could say goodbye to the loops and do something like:

result = some_vectorized_function(X)
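
As one concrete possibility, the per-column loop inside resp2pts can be replaced by a dot product, since counting the labels selected by each 0/1 response column is just a matrix multiplication. A sketch, assuming the shapes described in the question (note it returns flat arrays instead of (num_pts, 1) columns):

def resp2pts_vectorized(lab, resp):
    # lab: (n,) array of 0/1 labels; resp: (n, num_pts) array of 0/1 responses
    lab = (lab > 0).astype(float)
    resp = (resp > 0).astype(float)
    P = lab.sum()                  # number of positive labels
    N = lab.size - P               # number of negative labels
    tp = lab.dot(resp)             # true positives per column
    fp = (1.0 - lab).dot(resp)     # false positives per column
    return fp / N, tp / P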
  • I will test it, thank you for your suggestion. Yet I am not sure how I can vectorize my code; as you said, I am also not sure whether it's even possible. – Am1rr3zA Jul 26 '14 at 00:45