I'm running a grid search over random forests and trying to use n_jobs other than 1, but the kernel freezes and there is no CPU usage. With n_jobs=1 it works fine. I can't even stop the command with Ctrl-C and have to restart the kernel. I'm running on Windows 7. I saw that there is a similar problem on OS X, but that solution is not relevant for Windows 7.

from sklearn.ensemble import RandomForestClassifier
rf_tfdidf = Pipeline([('vect',tfidf),
                  ('clf', RandomForestClassifier(n_estimators=50, 
class_weight='balanced_subsample'))])

param_grid = [{'vect__ngram_range':[(1,1)],
          'vect__stop_words': [stop],
          'vect__tokenizer':[tokenizer]
          }]
if __name__ == '__main__':
gs_rf_tfidf = GridSearchCV(rf_tfdidf, param_grid, scoring='accuracy', cv=5, 
                                                           verbose=10, 
                                                           n_jobs=2)
gs_rf_tfidf.fit(X_train_part, y_train_part)

thanks.

  • After if __name__ == '__main__': the next lines need to have the appropriate indent. – seralouk Jul 05 '17 at 06:05
  • As sera said, it is the indentation: https://github.com/scikit-learn/scikit-learn/issues/2889 - btw surprised your code runs as is without indentation error – mkaran Jul 05 '17 at 08:55
  • If there is no indentation issue and it is just how you pasted your code here, maybe take a look at this: https://github.com/scikit-learn/scikit-learn/issues/5115#issuecomment-187683383 (there are also other issues for this kind of problem)? – mkaran Jul 05 '17 at 09:03
  • It was a copy-paste gone wrong. The issue you linked is not relevant for Windows. Thanks. – Shachar Stern Jul 06 '17 at 07:02
  • The answer that I posted solved a similar problem that I had on Windows 8. Please try it. – seralouk Jul 06 '17 at 07:55
  • How can I change JOBLIB_START_METHOD? – Shachar Stern Jul 06 '17 at 10:47

1 Answer

The indentation after if __name__ == '__main__': is not correct. If that is only a copy-paste mistake and not the case in your actual script, then you can try something like:

if __name__ == '__main__':
    # your code indented !

So if __name__ == '__main__': comes first in your script (after the imports), and the rest of the code follows with the appropriate indent.

New Code

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# tfidf, stop, tokenizer, X_train_part and y_train_part are defined as in the question.
if __name__ == '__main__':

    rf_tfdidf = Pipeline([('vect', tfidf),
                          ('clf', RandomForestClassifier(n_estimators=50, class_weight='balanced_subsample'))])

    param_grid = [{'vect__ngram_range': [(1, 1)], 'vect__stop_words': [stop], 'vect__tokenizer': [tokenizer]}]

    # n_jobs=-1 uses all available cores
    gs_rf_tfidf = GridSearchCV(rf_tfdidf, param_grid, scoring='accuracy', cv=5, verbose=10, n_jobs=-1)

    gs_rf_tfidf.fit(X_train_part, y_train_part)

This works fine for me (Windows 8.1).
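As for why the guard matters: as far as I understand it, Windows has no fork, so each worker process starts a fresh interpreter and re-imports your script. In that re-import the script's __name__ is no longer '__main__' (CPython's multiprocessing uses '__mp_main__'), so guarded code is skipped in the workers instead of launching the grid search again. The sketch below only simulates this with the standard library and starts no processes; SCRIPT and run_as are hypothetical stand-ins, not part of scikit-learn or joblib.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for the user's script: the guarded line represents
# the GridSearchCV(...).fit(...) call.
SCRIPT = '''
executed = ["module-level code"]
if __name__ == "__main__":
    executed.append("guarded code")
'''


def run_as(run_name):
    """Execute SCRIPT under the given module name and report what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        # run_path returns the globals of the executed script
        return runpy.run_path(path, run_name=run_name)["executed"]


# How `python script.py` runs the file: the guarded block executes.
as_script = run_as("__main__")
# Assumed name for the re-imported script in a spawned worker: the guarded block is skipped.
as_worker = run_as("__mp_main__")

print(as_script)
print(as_worker)
```

Anything left at module level runs in every worker, which is why the search itself has to sit under the guard.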

EDIT

The following works fine using PyCharm. I have not used Spyder, but it should also work there:

Code

class Test(object):
    def __init__(self):
        # code here
        pass

if __name__ == '__main__':
    Test()
seralouk
  • Works great! Just wondering: I managed to run the script from the console, but how can I run this in my IDE (Spyder) and debug if needed? – Shachar Stern Jul 06 '17 at 07:04
  • Hello. In general you should use a text editor like Vim [link](http://www.vim.org/) or Atom [link](https://atom.io/). You can write your scripts in .py files and then run them in the console. For real-time debugging you can use PyCharm [link](https://www.jetbrains.com/pycharm/). Finally, I am glad I could help. You can mark the answer as accepted so that others can try the same solution. – seralouk Jul 06 '17 at 07:54
  • @ShacharStern Spyder is very easy to use. In the menu bar, you will find the Debug option. You can put breakpoints up to where you want the code to run. – Vivek Kumar Jul 06 '17 at 10:53
  • Vivek, I don't think you understood me. I know how to work with Spyder. I can only run this code from the command line because it's in "if __name__ == '__main__':". I'm looking for a way to run it in Spyder. – Shachar Stern Jul 06 '17 at 11:00
  • @VivekKumar Using a class you could run it in Spyder. See here [link](https://stackoverflow.com/a/31123157/5025009) – seralouk Jul 06 '17 at 12:08