
I wrote a function using Python's multiprocessing package to try to speed up my code.

from arch.univariate import ARX, GARCH
from multiprocessing import Process
import multiprocessing
import time

def batch_learning(X, lag_array=None):
    """
    X is a time series array
    lag_array contains all possible lag numbers
    """
    # init a queue used for triggering different processes
    queue = multiprocessing.JoinableQueue()
    data = multiprocessing.Queue()

    # a worker called ARX_fit triggered by queue.get()
    def ARX_fit(queue):
        while True:
            q = queue.get()
            q.volatility = GARCH()
            print "Starting to fit lags %s" %str(q.lags.size/2)
            try:
                q_res=q.fit(update_freq=500)
            except:
                print "Error:...."
            print "finished lags %s" %str(q.lags.size/2)
            queue.task_done()
    # init four processes
    for i in range(4):
        process_i = Process(target=ARX_fit, name="Process_%s" % str(i), args=(queue,))
        process_i.start()
    # put ARX model objects into queue continuously
    for num in lag_array:
        queue.put(ARX(X, lags=num))

    # sync processes here
    queue.join()   

    return

After calling the function:

batch_learning(a, lag_array=range(1,10))

However, it got stuck partway through, and I got the printout below:

Starting to fit lags 1
Starting to fit lags 3
Starting to fit lags 2
Starting to fit lags 4
finished lags 1
finished lags 2
Starting to fit lags 5
finished lags 3
Starting to fit lags 6
Starting to fit lags 7
finished lags 4
Starting to fit lags 8
finished lags 6
finished lags 5
Starting to fit lags 9

It runs forever without any further printouts on my Mac OS El Capitan. Then, using PyCharm's debug mode and thanks to Tim Peters' suggestions, I found out that the processes actually quit unexpectedly. Under debug mode, I can pinpoint that it is the svd function inside numpy.linalg.pinv(), used by the arch library, that causes this problem. Then my question is: why? It works in a single-process for loop, but it cannot work with 2 processes or more. I don't know how to fix this problem. Is it a numpy bug? Can anyone help me a bit here?

Gauss Lee
  • The strange thing is that if I remove the `try: q_res=q.fit(update_freq=500) except: print "Error:...."`, it works properly. I guess there is something wrong with the fit function? – Gauss Lee Aug 08 '16 at 20:31
  • What platform/operating system are you using? The OS X Accelerate framework has issues with multiprocessing that manifest in a similar way. – aganders3 Aug 09 '16 at 21:50
  • @aganders3: I am using Mac OS El Capitan. Do you know how to solve this issue? – Gauss Lee Aug 09 '16 at 21:51
  • As far as I know, there is no resolution but there are some workarounds. This is a major frustration for our lab. See this question for further explanation: http://stackoverflow.com/questions/9879371/segfault-using-numpys-lapack-lite-with-multiprocessing-on-osx-not-linux – aganders3 Aug 10 '16 at 21:05
  • Suggested this in a comment on my "answer": try using Python 3 (3.4 or later) with the `multiprocessing` `spawn` start method. Or any version of Python on Windows. Those take `fork()` out of the equation. Python itself endures a universe of pain to make `threading.Thread` threads play nice with `fork()`, but can't do anything to make other software's threads sane. – Tim Peters Aug 10 '16 at 22:17
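
A minimal sketch of that suggestion, assuming Python 3.4 or later (`multiprocessing.set_start_method` and the `spawn` start method do not exist in Python 2). Note that with `spawn` the worker must be a module-level function, because a function nested inside `batch_learning` cannot be pickled:

import multiprocessing as mp

def worker(queue):
    # must live at module level so it can be pickled under the "spawn" start method
    while True:
        item = queue.get()
        # ... fit the model for this item here ...
        queue.task_done()

if __name__ == "__main__":
    # "spawn" starts each worker as a fresh interpreter instead of fork()ing,
    # so threads created in the parent (e.g. by Accelerate/BLAS) are never inherited
    mp.set_start_method("spawn")

    queue = mp.JoinableQueue()
    for i in range(4):
        mp.Process(target=worker, name="Process_%s" % i, args=(queue,), daemon=True).start()

    for num in range(1, 10):
        queue.put(num)
    queue.join()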

2 Answers


I have to answer this question myself and provide my solution. I have already solved this issue, thanks to the help from @Tim Peters and @aganders3.

Multiprocessing usually hangs when you use numpy/scipy on Mac OS because of Apple's Accelerate framework, which numpy is built against instead of OpenBLAS. To solve this kind of problem, do the following:

  1. uninstall numpy and scipy (the scipy version needs to match the numpy version)
  2. follow the procedure on this link to rebuild numpy with OpenBLAS
  3. reinstall scipy and test your code to see if it works (a quick check of the new build is sketched below)
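
One quick check that the rebuilt numpy is actually linked against OpenBLAS rather than Accelerate is to print its build configuration (the exact output depends on the numpy version and how it was built):

import numpy as np

# prints the BLAS/LAPACK libraries numpy was compiled against;
# after the rebuild you want to see openblas here, not Accelerate/vecLib
np.__config__.show()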

A heads-up for testing multiprocessing code on Mac OS: when you run your code, it is better to set an environment variable:

OPENBLAS_NUM_THREADS=1 python import_test.py

The reason is that OpenBLAS by default creates 2 threads per core, so even though you set up 4 processes there are 8 threads running (2 per core). This adds some thread-switching overhead. I tested the OPENBLAS_NUM_THREADS=1 configuration, which limits each process to one thread per core, and it is indeed faster than the default settings.
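
If you would rather not type the variable on every run, the same limit can be set from inside the script, as long as it happens before numpy (and hence OpenBLAS) is imported; `OMP_NUM_THREADS` is included here only as a common fallback for OpenMP-based BLAS builds:

import os

# must be set before the first numpy import, otherwise the BLAS thread pool
# has already been created with its default size
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"  # fallback for OpenMP-based BLAS builds

import numpy as np  # numpy/OpenBLAS now use a single BLAS thread per process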

Gauss Lee

There's not much to go on here, and the code indentation is wrong so it's hard to guess what you're really doing. To the extent I can guess, what you're seeing could happen if the OS killed a process in a way that didn't raise a Python exception.

One thing to try: first make a list, ps, of your four process_i objects. Then before queue.join() add:

while ps:
    new_ps = []
    for p in ps:
        if p.is_alive():
            new_ps.append(p)
        else:
            print("*********", p.name, "exited with", p.exitcode)
    ps = new_ps
    time.sleep(1)

So about once per second, this just runs through the list of worker processes to see whether any have (unexpectedly!) died. If one (or more) has, it displays the process name (which you supplied already) and the process exit code (as given by your OS). If that triggers, it would be a big clue.
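
For reference, a minimal way to build that ps list inside batch_learning, mirroring the process-creation loop from the question (only the ps bookkeeping is new):

ps = []
for i in range(4):
    process_i = Process(target=ARX_fit, name="Process_%s" % str(i), args=(queue,))
    process_i.start()
    ps.append(process_i)  # keep a handle so the watchdog loop can call is_alive()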

If none die, then we have to wonder whether

q_res=q.fit(update_freq=500)

"simply" takes a very long time for some q states.

Tim Peters
  • Sorry for the indentation problem. I fixed it now; it should be readable. What I am trying to do is just create a multiprocessing function to train an arch model many times with different parameters. The `q.fit()` function is the one I need many processes to handle in parallel. It got stuck after finishing 6 tasks and I do not know why. You can see this from the printouts: it got stuck when lags=7 and it never continues to finish the tasks when lags >= 7. – Gauss Lee Aug 09 '16 at 07:35
  • Hi, Tim. Thanks for your replies. I added your snippet to my code and the processes got unexpectedly killed. The output is `('*********', 'Process_0', 'exited with', 1) ('*********', 'Process_1', 'exited with', -11) ('*********', 'Process_2', 'exited with', -11) ('*********', 'Process_3', 'exited with', -11)`. I am not very advanced at multiprocessing programming. I wonder what the reason is for the processes being killed. – Gauss Lee Aug 09 '16 at 07:44
  • Some extra comments on this ticket. I tried the code in PyCharm and it tells me `python quits unexpectedly with _umath_linalg.so crash (segfault)`. Then I started to debug where the error is located, and found the problem is quite deep. It is caused by `q`, which is an ARX object from the arch library built on numpy. Going deeper, the numpy function `svd(a, 0)` inside np.linalg.pinv is identified: it seems that when it reaches lag num = 8, svd crashes, causing the process to quit. I tracked the `a` inside `svd(a, 0)`; it seems an inappropriate `a` is causing the crash. – Gauss Lee Aug 09 '16 at 15:56
  • However, I still cannot figure out why a single for loop can run it with no problem but multiprocessing causes a crash. I suspect that np.linalg.pinv is not thread safe. I have seen some multiprocessing crashes caused by related np.linalg functions. Is this another bug? HELP PLEASE! – Gauss Lee Aug 09 '16 at 16:01
  • Yes, "-11" on Linux-y systems means the process was killed by a segfault. The exit status "1" has no _generally_ applicable meaning - depends on the precise software you're running. I have no experience with `arch`. – Tim Peters Aug 09 '16 at 16:01
  • Yes. Thanks for your reply. I am still struggling here. :-) – Gauss Lee Aug 09 '16 at 16:04
  • Added a `numpy` tag, because it sounds like you really need an expert on that. Suggest adding what you learned to the body of the question, and changing the title to reflect that (i.e., you're no longer wondering why it's "stuck", you're wondering why it's segfaulting). – Tim Peters Aug 09 '16 at 16:12
  • Is it possible to try this on Windows? I'm betting it would work then (due to `multiprocessing` creating an entirely new process on Windows, but using `fork()` on Linux-y systems - mixing `fork()` with threads can be disastrous, and while you don't show any threads of your own here, some of the things `numpy` may invoke have threads of their own). – Tim Peters Aug 09 '16 at 18:24
  • Or if you can't try Windows, how about getting a recent version of Python 3 (3.4 or later) and trying the `spawn` start method? That's essentially how things always work on Windows. https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods – Tim Peters Aug 10 '16 at 00:14