I wrote a function using Python's multiprocessing package to try to speed up my code:
from arch.univariate import ARX, GARCH
from multiprocessing import Process
import multiprocessing
import time

def batch_learning(X, lag_array=None):
    """
    X is a time series array
    lag_array contains all possible lag numbers
    """
    # init a queue used for triggering different processes
    queue = multiprocessing.JoinableQueue()
    data = multiprocessing.Queue()

    # a worker called ARX_fit triggered by queue.get()
    def ARX_fit(queue):
        while True:
            q = queue.get()
            q.volatility = GARCH()
            print "Starting to fit lags %s" % str(q.lags.size / 2)
            try:
                q_res = q.fit(update_freq=500)
            except:
                print "Error:...."
            print "finished lags %s" % str(q.lags.size / 2)
            queue.task_done()

    # init four processes
    for i in range(4):
        process_i = Process(target=ARX_fit, name="Process_%s" % str(i), args=(queue,))
        process_i.start()

    # put ARX model objects into queue continuously
    for num in lag_array:
        queue.put(ARX(X, lags=num))

    # sync processes here
    queue.join()
    return
After calling the function:
batch_learning(a, lag_array=range(1,10))
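(Here a is my actual time series. For a self-contained test, any 1-D float array should do; a made-up stand-in would be:)

import numpy as np
a = np.random.standard_normal(1000)  # purely illustrative stand-in for the real series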
However, it got stuck partway through the run, and I got the printout below:
Starting to fit lags 1
Starting to fit lags 3
Starting to fit lags 2
Starting to fit lags 4
finished lags 1
finished lags 2
Starting to fit lags 5
finished lags 3
Starting to fit lags 6
Starting to fit lags 7
finished lags 4
Starting to fit lags 8
finished lags 6
finished lags 5
Starting to fit lags 9
It then runs forever, with no further printouts, on OS X El Capitan. Using PyCharm's debug mode, and thanks to Tim Peters's suggestions, I found that the worker processes had actually quit unexpectedly (so queue.task_done() is never called and queue.join() never returns). Under the debugger I can pinpoint that it is the svd function inside numpy.linalg.pinv(), as used by the arch library, that causes the problem. So my question is: why? The same fits work in a single-process for-loop, but fail with 2 or more processes. I don't know how to fix this. Is it a numpy bug? Can anyone help me a bit here?
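For reference, the single-process version that works fine is essentially this (serial_learning is just an illustrative name, not code from my project):

from arch.univariate import ARX, GARCH

def serial_learning(X, lag_array=None):
    # same models as in batch_learning, fitted one after another
    for num in lag_array:
        model = ARX(X, lags=num)
        model.volatility = GARCH()
        res = model.fit(update_freq=500)
        print "finished lags %s" % str(num)

And to try to isolate the problem from arch, this is the kind of minimal test I have in mind: calling numpy.linalg.pinv() from several worker processes. The worker name and matrix shapes here are made up purely for illustration:

import numpy as np
from multiprocessing import Process

def pinv_worker(i):
    # pinv computes the pseudo-inverse via an svd under the hood
    m = np.random.randn(200, 50)
    p = np.linalg.pinv(m)
    print "worker %d finished, pinv shape %s" % (i, str(p.shape))

if __name__ == '__main__':
    procs = [Process(target=pinv_worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()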