I am using a Windows machine and I have code written for Python 2.7 that solves a statistical model. Since the model depends on the value of a parameter, I created a parallelized version that solves one model for each value of the parameter.
Consider, for instance, a first file called main_function (imported below as zz_main_function) that includes the following code (it is here for the sake of reproducibility but is not related to the question itself):
import numpy as np
import cvxpy

def lm_lasso(x, y, lambda1=None):
    n = x.shape[0]
    m = x.shape[1]
    lambda_param = cvxpy.Parameter(sign="positive")
    # Define the objective function
    beta_var = cvxpy.Variable(m)
    lasso_penalization = lambda_param * cvxpy.norm(beta_var, 1)
    lm_penalization = (1.0 / n) * cvxpy.sum_squares(y - x * beta_var)
    objective = cvxpy.Minimize(lm_penalization + lasso_penalization)
    problem = cvxpy.Problem(objective)
    beta_sol_list = []
    for l in lambda1:
        lambda_param.value = l
        problem.solve(solver=cvxpy.ECOS)
        beta_sol = np.asarray(np.row_stack([b.value for b in beta_var])).flatten()
        beta_sol_list.append(beta_sol)
    return beta_sol_list
And a second file called parallel_function (imported below as zz_parallel_function) that includes the following code:
import multiprocessing as mp
import numpy as np
import functools
import zz_main_function as mf

def lm_lasso_parallel(x, y, lambda1):
    chunks = np.array_split(lambda1, mp.cpu_count())
    pool = mp.Pool(processes=mp.cpu_count())
    results = pool.map(functools.partial(mf.lm_lasso, x, y), chunks)
    pool.close()
    pool.join()
    return results
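To make the shape of `results` concrete: `np.array_split` cuts `lambda1` into one chunk per core, and `pool.map` returns one list of solutions per chunk, in chunk order. The sketch below is only an illustration of that behaviour. It uses a pure-Python stand-in for `np.array_split` on a 1-D list, and the helper names `split_chunks` and `flatten` are hypothetical (they are not part of my code). It assumes 4 cores for the example.

```python
from itertools import chain


def split_chunks(values, n_chunks):
    """Pure-Python stand-in for np.array_split on a 1-D list.

    The first len(values) % n_chunks chunks get one extra element,
    so some chunks are empty when n_chunks > len(values).
    """
    base, extra = divmod(len(values), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def flatten(per_chunk_results):
    """Concatenate per-chunk result lists into one list, keeping order."""
    return list(chain.from_iterable(per_chunk_results))


lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
chunks = split_chunks(lambda1, 4)  # assumption: mp.cpu_count() == 4
```

With more cores than values in `lambda1`, some chunks are empty and the corresponding workers simply return an empty list.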
The reason I split the functions into two files is that this way everything seemed to work without adding the usual if __name__ == '__main__': guard required when dealing with multiprocessing.
This code was written some months ago and worked perfectly, either from the Python console or by running a Python file like:
import zz_parallel_function as pf
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
r_parallel = pf.lm_lasso_parallel(x, y, lambda1)
Recently I had to format my computer, and when I reinstalled Python 2.7 and tried to run the code described above, I ran into the following errors:
- If I try to run it directly from the Python console:
import zz_parallel_function as pf
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
lambda1 = [0, 1e-3, 1e-2, 1e-1, 1, 1e2, 1e3]
r_parallel = pf.lm_lasso_parallel(x, y, lambda1)
- If I run it as an independent file:
So my questions are:
- Why did this code work before and not now? The only thing that (possibly) changed is the version of some of the installed modules, but I don't think that is relevant.
- Any guess on how to get it working again?
EDIT 1
By adding if __name__ == '__main__': to the code and running it as an independent file, it executes with no problem. However, when I try to execute it in a Python console, it gives the same error as before.
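For reference, here is a minimal sketch of the guarded layout that runs as a standalone file. It is not my actual script: `solve_chunk` is a hypothetical stand-in for `mf.lm_lasso` (it just multiplies each value so the example needs no cvxpy), and `run_parallel` and its `processes=2` default are assumptions made for the illustration.

```python
import functools
import multiprocessing as mp


def solve_chunk(scale, chunk):
    """Hypothetical stand-in for mf.lm_lasso: one result per value in chunk."""
    return [scale * value for value in chunk]


def run_parallel(scale, chunks, processes=2):
    """Map solve_chunk over the chunks with a worker pool."""
    pool = mp.Pool(processes=processes)
    try:
        # functools.partial fixes the leading argument, as in
        # functools.partial(mf.lm_lasso, x, y) in my code.
        return pool.map(functools.partial(solve_chunk, scale), chunks)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # On Windows, worker processes import this file again, so the pool
    # is only created when the file is run as the main script.
    print(run_parallel(10, [[1, 2], [3]]))
```

The worker is defined at module level so that the child processes can find it by name when they import the file.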
Based on the comments received, this was possibly due to the need to freeze the code. The code in the Python console is not frozen, and this would be the cause of the issue. I then considered running the following example from the multiprocessing documentation for Windows:
from multiprocessing import Process, freeze_support

def foo():
    print 'hello'

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()
This code supposedly freezes the code, but when I run it in the Python console, I get the same error as before.