
I am writing code for baseline correction of multiple signals. The structure of the code is as follows:

# for each file in a directory:
    # read file and populate x vector
    baseline = baseline_als(x, 1000, 0.00001)
    plt.plot(x - baseline)
    plt.savefig("newbaseline.png")  # note: the same file name is reused for every signal
    plt.close()

The baseline_als function is shown below.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=20):
    # asymmetric least squares baseline (Eilers & Boelens)
    L = len(y)
    # second-order difference matrix; np.eye(L) is dense, so this costs O(L^2) memory
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w * y)
        # asymmetric weighting: p for points above the fit, 1-p for points below
        w = p * (y > z) + (1 - p) * (y < z)
    return z

Now, when I put around 100 files in a directory, the code works fine, although it takes time since the complexity is quite high. But when I have around 10,000 files in my directory and run this script, the system freezes after a few minutes. I don't mind a delay in execution, but is there any way to make the script finish execution?

Riken Shah
  • Have you run any sort of system monitor when the code "freezes"? – cdarke Jul 01 '16 at 07:03
  • I am unsure how I can run a system monitor, since the mouse and keyboard become unresponsive and I have to reboot. – Riken Shah Jul 01 '16 at 07:33
  • You don't say which operating system you use. Start the monitor before you start your program. If you have to reboot then something else might be happening. Have you shown your whole code? – cdarke Jul 01 '16 at 07:44
  • I am using Ubuntu 14.04. Yes, the whole code except the file-reading part. OK, I will try with the system monitor started before executing now. – Riken Shah Jul 01 '16 at 07:45
  • With a single core? No! Without threading? No! Is your processor alive? – dsgdfg Jul 01 '16 at 08:00

2 Answers


By adding time.sleep(0.02) inside the loop, I was able to keep the CPU from reaching 100% and freezing the system. It takes a long time, but it completes execution nonetheless.

Note that you need to import time before using this.
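A minimal sketch of where the sleep could go, assuming the loop structure from the question (read_signal is a hypothetical stand-in for the omitted file-reading code, and "signals" is a placeholder directory name):

import os
import time
import matplotlib.pyplot as plt

for fname in os.listdir("signals"):  # placeholder directory
    x = read_signal(fname)           # hypothetical helper: read file, populate x
    baseline = baseline_als(x, 1000, 0.00001)
    plt.plot(x - baseline)
    plt.savefig("newbaseline.png")
    plt.close()
    time.sleep(0.02)                 # briefly yield the CPU between files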

Riken Shah

The script consumes too much RAM when you run it over too large a number of files; see Why does a simple python script crash my system.

The process in which your program runs stores the arrays and variables for the calculations in process memory, i.e. RAM, and there they accumulate.

A possible workaround is to run the baseline_als() function in a child process. When the child process returns, its memory is freed automatically; see Releasing memory in Python.

Execute the function in a child process:

from multiprocessing import Process, Queue

def my_function(q, x):
    # runs in the child process and sends its result back through the queue
    q.put(x + 100)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=my_function, args=(queue, 1))
    p.start()
    p.join()  # this blocks until the process terminates
    result = queue.get()
    print(result)

Copied from: Is it possible to run function in a subprocess without threading or writing a separate file/script

This way you prevent RAM from being consumed by the unreferenced old variables that your process (program) produces.
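Applied to the loop from the question, that could look roughly like the sketch below, under the same assumptions as before (read_signal is a hypothetical placeholder for the omitted file-reading code). Note that the result is fetched from the queue before join(), since joining first can block if the child is still flushing a large array through the pipe:

import os
from multiprocessing import Process, Queue
import matplotlib.pyplot as plt

def correct_baseline(q, x):
    # runs in the child process; all temporaries are freed when the child exits
    q.put(baseline_als(x, 1000, 0.00001))

if __name__ == '__main__':
    for fname in os.listdir("signals"):  # placeholder directory name
        x = read_signal(fname)           # hypothetical file-reading helper
        queue = Queue()
        p = Process(target=correct_baseline, args=(queue, x))
        p.start()
        baseline = queue.get()           # fetch before join to avoid blocking on a full pipe
        p.join()
        plt.plot(x - baseline)
        plt.savefig("newbaseline.png")
        plt.close()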

Another possibility is to invoke the garbage collector with gc.collect(); however, this is not recommended (it does not work in some cases).
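If you want to try it anyway, a minimal sketch (the array here merely stands in for one file's signal data):

import gc
import numpy as np

x = np.ones(10 ** 7)  # placeholder for one file's signal data
del x                 # drop the last reference to the array ...
gc.collect()          # ... then ask the collector to reclaim unreachable objects now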

More useful links:

memory usage, how to free memory

Python large variable RAM usage

I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

ralf htp