Using python multiprocessing for sklearn NN

Question

I am using the dev version of the Python sklearn package, which includes an NN implementation. My task is to train 4 NNs on different input data and then average the predictions.

X_median = preprocessing.scale(data_median)
X_min = preprocessing.scale(data_min)
X_max = preprocessing.scale(data_max)
X_mean = preprocessing.scale(data_mean)

I create the neural networks like this:

NN1 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN2 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN3 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN4 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)

(standard sklearn function)

and I want to train them on the datasets described above. Without a pool, my code looks like this:

NN1.fit(X_mean,train_y)
NN2.fit(X_median,train_y)
NN3.fit(X_min,train_y)
NN4.fit(X_max,train_y)

Of course, since all 4 training runs are independent, I want to run them in parallel, and I assume I should use a pool for this. However, I do not completely understand how the computation is performed. I assumed I would write something like this:

pool = Pool()
pool.apply_async(NN1.fit, args = (X_mean, train_y))

However, this does not produce any results. I can even pass only one argument, like this, and the program will still finish without any errors:

pool.apply_async(NN1.fit, args = (X_mean,))

What is the correct way to perform such computations? Can someone recommend a good resource for understanding Python multiprocessing?

For `apply_async` you need to provide a callback to be executed when the computation is done. I think you want `apply()`, which waits for the computation before returning. — mirosval, Jun 29 '16 at 11:46
Indeed, it seems code like this does the job: `def Myfunc(MyNN,X,train_y): MyNN.fit(X,train_y) return MyNN` and then `NN_mean = pool.apply(Myfunc, (NN_mean,X_mean, train_y))` — Shir, Jun 30 '16 at 10:36
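The "no results, no errors" behaviour from the question can be reproduced without sklearn. The following is a minimal sketch, not the original code: `TinyModel`, `fit_model` and `demo` are hypothetical names, and `TinyModel` only stands in for an estimator such as `MLPClassifier`. It illustrates two points. `apply_async` returns an `AsyncResult` right away, and an exception raised in the worker stays hidden until `.get()` is called. The worker also fits a pickled copy of the model, so the object in the parent process stays unfitted unless the returned copy is used.

```python
from multiprocessing import Pool


class TinyModel:
    """Hypothetical stand-in for an estimator such as MLPClassifier."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        # "Training" just stores the mean of X; y is accepted but unused.
        self.mean_ = sum(X) / len(X)
        return self


def fit_model(model, X, y):
    # Module-level wrapper so the pool can pickle a reference to it.
    return model.fit(X, y)


def demo():
    model = TinyModel()
    X, y = [1.0, 2.0, 3.0], [0, 1, 0]
    with Pool(processes=2) as pool:
        good = pool.apply_async(fit_model, args=(model, X, y))
        bad = pool.apply_async(fit_model, args=(model, X))  # y is missing
        fitted = good.get()  # blocks until the worker has finished
        try:
            bad.get()  # the worker's exception is re-raised here
            error = None
        except TypeError as exc:
            error = exc
    return model, fitted, error


if __name__ == "__main__":
    model, fitted, error = demo()
    print("parent copy:", model.mean_)
    print("returned copy:", fitted.mean_)
    print("hidden error:", type(error).__name__)
```

In this sketch the fitted model is the one returned by `.get()`, which matches the `NN_mean = pool.apply(...)` reassignment in the comment above.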

score 1 · Answer 1 · edited May 23 '17 at 11:58

Finally, I made it work.

I based my solution on this answer. First, create two helper functions:

1)

def Myfunc(MyNN,X,train_y):
    MyNN.fit(X,train_y)
    return MyNN

This one just makes the desired function global (module-level) so that it can be passed to the pool methods.

2)

def test_star(a_b):
    return Myfunc(*a_b)

This is the key part: a helper function that takes one argument and unpacks it into the number of arguments Myfunc needs.
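The reason a global function is needed is that the pool sends the callable to the worker processes with `pickle`, and plain functions are pickled by reference to their module-level name. A small check, assuming nothing beyond the standard library (`fit_wrapper` and `can_pickle` are hypothetical names, not part of the answer):

```python
import pickle


def fit_wrapper(model, X, y):
    # Hypothetical module-level helper, analogous to Myfunc above.
    model.fit(X, y)
    return model


def can_pickle(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("module-level function:", can_pickle(fit_wrapper))
    print("lambda:", can_pickle(lambda model: model))
```

Run as a script, the module-level function should pickle while the lambda should not. On Python 2, bound methods such as `NN1.fit` were also not picklable by default, which may be why a wrapper was needed here; the post does not say which Python version was used.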

Then just create

mylist = [(NN_mean,X_mean, train_y), (NN_median,X_median, train_y)]

and execute

NN_mean, NN_median = pool.map(test_star, mylist)

From my point of view this solution is super ugly, but it works. I hope someone can come up with a more elegant one and post it :).
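One possibly tidier option, assuming Python 3.3 or later, is `Pool.starmap`, which unpacks each argument tuple itself, so a helper like test_star is not needed. The sketch below is only an illustration: `TinyModel`, `fit_model` and `train_all` are hypothetical names, and `TinyModel` stands in for `MLPClassifier` to keep the example free of dependencies.

```python
from multiprocessing import Pool


class TinyModel:
    """Hypothetical stand-in for an estimator such as MLPClassifier."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        # "Training" just stores the mean of X; y is accepted but unused.
        self.mean_ = sum(X) / len(X)
        return self


def fit_model(model, X, y):
    # Fit in the worker and send the fitted copy back to the parent.
    model.fit(X, y)
    return model


def train_all(jobs, processes=None):
    # Each job is a (model, X, y) tuple; starmap unpacks it into
    # fit_model(model, X, y) and returns results in input order.
    with Pool(processes=processes) as pool:
        return pool.starmap(fit_model, jobs)


if __name__ == "__main__":
    y = [0, 1]
    jobs = [(TinyModel(), [1.0, 3.0], y), (TinyModel(), [10.0, 20.0], y)]
    first, second = train_all(jobs, processes=2)
    print(first.mean_, second.mean_)
```

With real estimators the job list would look like the `mylist` above, for example `[(NN_mean, X_mean, train_y), (NN_median, X_median, train_y)]`, and the returned models replace the unfitted ones in the parent process.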

Yeah, `pool` objects can only handle data in a certain way. You can gain more flexibility with the `pathos` module, but I usually just go with a wrapper rather than using another external module. — Jeff, Jun 30 '16 at 13:26
