freeze_support bug in using scikit-learn in the Anaconda python distro?

Question

I just want to confirm that this is not a problem with my code but something that needs to be fixed in the relevant Python package. (By the way, does this look like something I can patch manually before the vendor ships an update?) I was using scikit-learn-0.15b1, which made these calls. Thanks!

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

UPDATE: Here is my edited script, but it still produces the exact same error after it spawns the processes for GridSearchCV. Actually, it crashes quite some time after the command reports how many folds and fits it will do, but beyond that I don't know exactly when. Should I put freeze_support somewhere else?

import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp

if __name__=='__main__':
    mp.freeze_support()

print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]

print("Data loaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)  # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # apply same transformation to test data

print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)

print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)

print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)

score 1 · Answer 1 · answered Jun 29 '15 at 14:41

You can find the relevant information in the Python `multiprocessing` documentation, section 16.6.2.3.

So, a working example would be:

from multiprocessing import Process, freeze_support

def f():
    print('hello world!')  # function-call form works on both Python 2 and 3

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()

score 0 · Accepted Answer · answered Jun 23 '14 at 10:48

This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module:

from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()

answered by fwu

Right, but that is in a package, right? My script has no main module. Which main module should it be, and can I simply add this line to that .py file in my library? – László Jun 23 '14 at 10:51
Yes, you can simply add this to your .py file. – fwu Jun 23 '14 at 10:54
But it is not the forking.py where I need to add this, actually they are the ones raising the error? My script should have these two lines? Thanks for the clarification! ` def freeze_support(): ''' Run code for process object if this in not the main process ''' if is_forking(sys.argv): main() sys.exit() def get_command_line(): ''' Returns prefix of command line used for spawning a child process ''' if getattr(process.current_process(), '_inheriting', False): raise RuntimeError(''' …` – László Jun 23 '14 at 11:09
This needs to be in your script, at top-level. – Fred Foo Jun 23 '14 at 12:42
@larsmans Thanks, so my script itself should call this obscure function from multiprocessing, even if it is not defining anything, only calling lines to interpret? OK, I'll comment again if the script still crashes. – László Jun 23 '14 at 14:05
Yes. See [this question](http://stackoverflow.com/q/13922597/166749) for more info about `freeze_support`. – Fred Foo Jun 23 '14 at 15:13
@larsmans Thanks, but the code still just crashed the same way, even with the addition. I'll edit the question with the full script, can you let me know why it is not working? – László Jun 23 '14 at 17:12
Maybe you need to move all the code out of the top-level module into a `main()` function and call `main()` in the `if __name__ == '__main__'` block. – asmeurer Jun 23 '14 at 17:54
@asmeurer Thanks. You mean that `freeze_support()` was not called when the script was run? Could we verify that? Do people agree that no other piece of code (called modules) need this edit, only my own, the wrapper? Thanks again. – László Jun 23 '14 at 18:32
I see, thanks, everyone, I did wrap everything into a main function, I am not sure why I was scared of that. Thanks. Also see another answer on my more general question on this: http://stackoverflow.com/a/24374798/938408 – László Jun 23 '14 at 20:48
A RuntimeError in this particular situation is not related to freeze_support(). See my [explanation](http://stackoverflow.com/a/39459576/1438906) for a similar question. – wombatonfire Sep 13 '16 at 17:35
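
For readers landing here, a minimal sketch of the fix the comments converge on. On Windows, each worker process re-imports the main script, so any statement left at module level (in the question, everything from `pd.read_stata(...)` through `gs.fit(...)`) runs again in every child and tries to start workers of its own. Calling `freeze_support()` inside the guard does not help if the rest of the script stays outside it. The names `square` and `main` below are hypothetical, and a plain `multiprocessing.Pool` stands in for `GridSearchCV(..., n_jobs=8)`; this is an illustration of the pattern, not the asker's actual script.

```python
import multiprocessing as mp


def square(x):
    # Worker function: defined at module level so child processes can
    # find it when they re-import this file.
    return x * x


def main():
    # Everything that loads data or starts worker processes lives here,
    # so it is NOT executed when a child process re-imports the module.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    print(results)
    return results


if __name__ == '__main__':
    # Only needed when freezing to a Windows executable; a no-op otherwise.
    mp.freeze_support()
    main()
```

In the question's script, the same restructuring would mean moving the data loading, scaling, and the `gs.fit(X_train, y_train)` call into `main()`, leaving only imports and function definitions at module level.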
