Scikit-Learn GridSearchCV: Avoid function to copy data for each process in parallel

Question

I use sklearn.grid_search.GridSearchCV in parallel with several CPUs/cores. Calling the fit method creates several copies of my data (one for each process), which causes my processes to crash because they run out of memory.

Is there a way to prevent the function from copying the data for each process? Can I use shared memory for all cores?
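For concreteness, here is a minimal sketch of the setup described, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection (the older sklearn.grid_search module was deprecated and later removed). The dataset, estimator and parameter grid are placeholders. With n_jobs=2 the candidate fits run in two worker processes. pre_dispatch is an existing GridSearchCV parameter that caps how many jobs are dispatched at once, which the scikit-learn docs suggest for avoiding memory blow-ups. It limits how many copies are in flight but does not by itself make the data shared.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the (much larger) training data in the question.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Placeholder grid: 3 values of C x 2 values of gamma = 6 candidates.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.01]}

search = GridSearchCV(
    SVC(),
    param_grid,
    cv=3,
    n_jobs=2,        # run candidate fits in 2 worker processes
    pre_dispatch=2,  # dispatch at most 2 jobs at a time to bound peak memory
)
search.fit(X, y)
print(search.best_params_)
```

joblib, which scikit-learn uses for n_jobs, can also pass large NumPy arrays to workers as memory-mapped files instead of pickled copies (see its max_nbytes and mmap_mode options); how much that helps depends on the joblib version and on whether the estimator copies the data internally.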

Maybe this answer http://stackoverflow.com/a/24411581/288875 gives you some hints. – Andre Holzner, Oct 02 '14 at 17:00

score 1 · Answer 1 · answered Jul 11 '17 at 18:32


When work is parallelized across processes in Python, each worker process by default gets its own copy of the data. I would recommend using shared memory from Python's multiprocessing module so that all processes read the same data instead. You can see an example in https://github.com/alvarouc/polyssifier/blob/master/polyssifier/polyssifier.py#L87
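The linked polyssifier code is not reproduced here. As an illustration of the general idea only, this is a minimal sketch using the standard library's multiprocessing.shared_memory (Python 3.8+) together with NumPy; it may differ from what polyssifier actually does, and it shows the pattern for a hand-written parallel loop rather than a way to plug shared memory into GridSearchCV. The helper names parallel_column_sums and _column_sum are hypothetical. The array is copied once into a shared block, and workers receive only the block's name, shape and dtype, then read the data in place.

```python
import numpy as np
from multiprocessing import Pool, shared_memory


def _column_sum(args):
    # Hypothetical worker: attach to the existing shared block by name
    # and read it in place, so the array itself is never pickled or copied.
    shm_name, shape, dtype, col = args
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    result = float(data[:, col].sum())
    del data  # drop the view before closing, otherwise close() raises BufferError
    shm.close()
    return result


def parallel_column_sums(X, n_workers=2):
    # Copy X once into a shared block that all workers can see.
    shm = shared_memory.SharedMemory(create=True, size=X.nbytes)
    try:
        shared = np.ndarray(X.shape, dtype=X.dtype, buffer=shm.buf)
        shared[:] = X
        del shared  # the parent no longer needs its own view
        # Only the block's name, shape, dtype and a column index go to workers.
        tasks = [(shm.name, X.shape, X.dtype.str, c) for c in range(X.shape[1])]
        with Pool(n_workers) as pool:
            return pool.map(_column_sum, tasks)
    finally:
        shm.close()
        shm.unlink()  # free the shared block once all workers are done


if __name__ == "__main__":
    X = np.arange(12, dtype=np.float64).reshape(4, 3)
    print(parallel_column_sums(X))  # [18.0, 22.0, 26.0]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with spawn or forkserver, where the main module is re-imported in each child.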


Alvaro Ulloa


Thank you for your answer! And thank you for sharing with the community! – Ohumeronen Jul 12 '17 at 20:06
