
I am using a Windows machine and I have code written for Python 2.7 that solves a statistical model. Since the model depends on the value of a parameter, I created a parallelized version that solves one model for each value of the parameter.

Consider, for instance, a first file called zz_main_function that contains the following code (it is included for reproducibility, but it is not directly related to the question):

import numpy as np
import cvxpy

def lm_lasso(x, y, lambda1=None):
    # Solve a lasso-penalized linear model once for every value in lambda1
    n = x.shape[0]
    m = x.shape[1]
    # cvxpy < 1.0 syntax; cvxpy >= 1.0 uses cvxpy.Parameter(nonneg=True)
    lambda_param = cvxpy.Parameter(sign="positive")
    # Define the objective function
    beta_var = cvxpy.Variable(m)
    lasso_penalization = lambda_param * cvxpy.norm(beta_var, 1)
    lm_penalization = (1.0 / n) * cvxpy.sum_squares(y - x * beta_var)
    objective = cvxpy.Minimize(lm_penalization + lasso_penalization)
    problem = cvxpy.Problem(objective)
    beta_sol_list = []
    for l in lambda1:
        # Re-solve the same problem, changing only the penalty parameter
        lambda_param.value = l
        problem.solve(solver=cvxpy.ECOS)
        # Coefficients as a flat 1-D array of length m
        beta_sol = np.asarray(beta_var.value).flatten()
        beta_sol_list.append(beta_sol)
    return beta_sol_list

And a second file called zz_parallel_function that contains the following code:

import multiprocessing as mp
import numpy as np
import functools
import zz_main_function as mf

def lm_lasso_parallel(x, y, lambda1):
    # One chunk of lambda values per CPU core
    chunks = np.array_split(lambda1, mp.cpu_count())
    pool = mp.Pool(processes=mp.cpu_count())
    # Each worker runs mf.lm_lasso(x, y, chunk); results holds one list per chunk
    results = pool.map(functools.partial(mf.lm_lasso, x, y), chunks)
    pool.close()
    pool.join()
    return results
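Because pool.map returns one list per chunk, results is a nested list. When there are more CPUs than lambda values, np.array_split also produces empty chunks, and these come back as empty lists. Below is a minimal sketch of the same split/map/flatten pattern that returns one flat list in the original order. The names split_map_flatten and square_each are hypothetical. The built-in map is the default so the sketch can be tried serially, and pool.map can be passed in its place.

```python
import numpy as np


def split_map_flatten(values, n_chunks, solve_chunk, map_fn=map):
    """Split `values` into `n_chunks` pieces, solve each piece, then flatten.

    `solve_chunk` takes a 1-D array and returns a list with one result per
    element (as lm_lasso does for lambda1). `map_fn` is a hypothetical hook:
    the built-in map runs serially, pool.map would run the chunks in parallel.
    """
    chunks = np.array_split(np.asarray(values), n_chunks)
    per_chunk = list(map_fn(solve_chunk, chunks))
    # Concatenate the per-chunk lists, keeping the original order;
    # empty chunks contribute nothing.
    return [result for chunk_results in per_chunk for result in chunk_results]


def square_each(chunk):
    # Hypothetical stand-in for lm_lasso: one result per input value
    return [float(v) ** 2 for v in chunk]


print(split_map_flatten([0, 1, 2, 3, 4], 8, square_each))
```

With pool.map, solve_chunk has to be a module-level function so the workers can find it.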

I split the functions into two files because, this way, everything seemed to work without the usual if __name__ == '__main__': guard that multiprocessing requires.
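This relates to why keeping lm_lasso in its own importable module helps. Pool.map has to pickle the function it sends to the workers. pickle stores a module-level function by reference (module name plus function name), so the workers must be able to import the module that defines it. The sketch below uses stdlib functions as stand-ins for mf.lm_lasso to check whether a callable survives pickling. The helper name can_pickle is hypothetical.

```python
import functools
import json
import pickle


def can_pickle(obj):
    """Return True if `obj` can be pickled, which Pool.map needs for its function."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # The exact exception type (usually pickle.PicklingError) varies by
        # Python version and object, so catch broadly here.
        return False


# A module-level function is pickled by reference (module + name) ...
print(can_pickle(json.dumps))
# ... and so is a functools.partial wrapping one, like partial(mf.lm_lasso, x, y)
print(can_pickle(functools.partial(json.dumps, indent=2)))
# A lambda has no importable name, so pickling it fails
print(can_pickle(lambda v: v))
```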

This code was written a few months ago and worked perfectly, both from the Python console and when running a Python file like this one:

import zz_parallel_function as pf
from sklearn.datasets import load_boston

boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]

r_parallel = pf.lm_lasso_parallel(x, y, lambda1)

Recently I had to format my computer. After reinstalling Python 2.7 and trying to run the code described above, I ran into the following errors:

  1. If I run the same code shown above directly in the Python console:

[screenshot of the error message]

  2. If I run it as a standalone file:

[screenshot of the error message]

So my questions are:

  1. Why did this code work before but not now? The only thing that has (possibly) changed is the version of some of the installed modules, but I don't think that is very relevant.

  2. Any ideas on how to get it working again?
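Since module versions are the main suspect for question 1, one way to compare the old and new setups is to record the version of each relevant package. The sketch below does this. The package list is an assumption (the packages used in the code above). Modules that are not installed, or that have no __version__ attribute, are reported as None.

```python
import importlib


def module_version(name):
    """Return `name.__version__`, or None if the module is missing or unversioned."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Packages used by the code in this question (assumed list)
for name in ["numpy", "cvxpy", "sklearn"]:
    print("%s: %s" % (name, module_version(name)))
```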

EDIT 1

After adding an if __name__ == '__main__': guard and running the code as a standalone file, it executes without problems. However, when I execute it in a Python console, it raises the same error as before.
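For reference, here is a minimal sketch of the guarded pattern in a single self-contained script. The worker must live at module level so the children can import it by name. The Pool must be created only under the guard, because on Windows each child process re-imports the main module. The names square and run_parallel are hypothetical. In the question's layout, the worker would be mf.lm_lasso and the guarded block would call pf.lm_lasso_parallel.

```python
import multiprocessing as mp


def square(value):
    # Module-level worker: child processes locate it by importing this module
    return value * value


def run_parallel(values, processes=2):
    pool = mp.Pool(processes=processes)
    try:
        return pool.map(square, values)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Only the parent process runs this block; children that re-import the
    # module skip it, so they do not try to start pools of their own.
    print(run_parallel([0, 1, 2, 3]))
```

This has to be saved to a file and run as a script. Pasting it into an interactive console does not give the children a module to re-import.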

Based on the comments, this may be because the code needs to be frozen: code typed into the Python console is not frozen, which would cause the issue. I then tried running the following example from the multiprocessing documentation for Windows:

from multiprocessing import Process, freeze_support

def foo():
    print 'hello'

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()

This code supposedly freezes the program, but when I run it in the Python console, I get the same error as before.

[screenshot of the error message]

  • `if __name__ == '__main__'` guard was always required when using multiprocessing in windows iirc, because of the way processes are created in `windows` – han solo Sep 23 '19 at 13:45
  • Alvaro, could you do `if __name__ == '__main__':..r_parallel = pf.lm_lasso_parallel(x, y, lambda1)` and similarly in the other file and see? – han solo Sep 23 '19 at 13:47
  • @hansolo. If I add the `if __name__ == '__main__'` guard and run it in the Python console, I get the same error as the one posted in the question. If I run it as a standalone file, it seems to work fine. Does this mean that there is no way to run parallelized code directly from the Python console? – Álvaro Méndez Civieta Sep 23 '19 at 13:53
  • I went to the multiprocessing guide https://docs.python.org/2/library/multiprocessing.html and tried running the first example there in my python console. Same results. – Álvaro Méndez Civieta Sep 23 '19 at 14:16
  • Alvaro, the code in the console is not frozen. The code should be frozen for running in Windows. Let me check if there is any way to run it from the console, although I think the chances are slim – han solo Sep 23 '19 at 14:34
  • Alvaro, run the snippet mentioned here [multiprocessing in windows](https://docs.python.org/2/library/multiprocessing.html#windows) not the first one in the page – han solo Sep 23 '19 at 14:36
  • And see the [issue](https://bugs.python.org/issue17674) – han solo Sep 23 '19 at 14:44
  • Possible duplicate of https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool – han solo Sep 23 '19 at 14:44
  • @hansolo I have added the steps I have taken as an edit to the question. Freezing the code as done in the snippet you mentioned did not work. I checked the possible duplicate you mentioned, and when I run the code given as the solution I get the same error message as always: `No module named ` – Álvaro Méndez Civieta Sep 23 '19 at 15:14
  • Could you update the question with all the code you are using right now? – han solo Sep 23 '19 at 15:36
  • Alvaro, are you still running the `code` in the Python interpreter directly? Like copy-pasting? That won't work, afaik – han solo Sep 23 '19 at 15:42
  • @hansolo, I will update the question in a minute. To answer your last comment: yes, I am using a Python interpreter, specifically the Python console in PyCharm. As far as I understood, the idea of freezing the code was to make it possible to execute it in the Python console. Am I right? – Álvaro Méndez Civieta Sep 23 '19 at 15:45
  • No, in console you cannot freeze the code. You have to save it in a file and then use the guard – han solo Sep 23 '19 at 15:46
  • Oh OK, thank you very much. I guess there is no way of executing the code directly in the console, then. It still worries me that this code worked just fine in a Python console only 3 months ago. In any case, thank you very much, I really appreciate your help here. – Álvaro Méndez Civieta Sep 23 '19 at 15:50
  • Alvaro, from the docs, they have clearly mentioned that `Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.` – han solo Sep 23 '19 at 15:51
  • No problem. Don't mention it – han solo Sep 23 '19 at 15:52

1 Answer

You cannot spawn new child processes with multiprocessing directly from the interactive Python interpreter.

From the docs:

Note: Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the Pool examples will not work in the interactive interpreter.
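A quick way to see the problem this note describes: in the standard interactive interpreter, __main__ is not backed by a file (it has no __file__ attribute). A spawned child therefore has nothing to re-import for anything defined there. The sketch below checks this. The helper name main_module_has_file is hypothetical.

```python
import sys


def main_module_has_file():
    """Return True if __main__ comes from a file that child processes could re-import."""
    main = sys.modules["__main__"]
    return getattr(main, "__file__", None) is not None


print(main_module_has_file())
```

Run from a saved script, this should print True. Typed into the plain `python` REPL, it prints False.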

And the programming guidelines say:

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Calling freeze_support() has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then freeze_support() has no effect.

Also, one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:

from multiprocessing import Process, freeze_support

def f():
    print 'hello world!'

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()

If the freeze_support() line is omitted, then trying to run the frozen executable (e.g. one created with PyInstaller or py2exe) will raise RuntimeError.

han solo