I'm trying to parallelize (in some simple way) my machine learning code, which uses the Shogun Machine Learning toolbox. There are many possible training configurations, so sequential processing is not a suitable approach. I have a learning machine object named mkl_object whose parameters are updated according to a grid-parameter path list (paths) generated by a path generator I programmed, called via gridObj.generateRandomGridPaths(). I'd like a multiprocessing setup in which mkl_object learns one model per path, each on a separate core. For example, given a list of three paths

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

it should learn three models, one per core. See my code and its erroneous output below:
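For clarity, here is how each element of a path maps to a learning parameter inside mkPool (index 1, e.g. '2-gr', is a feature tag that mkPool does not use):

# path[0] -> (kernel family, [range bounds for the kernel parameters])
# path[2] -> pKers (number of kernels)
# path[3] -> hyper (hyperparameter distribution, e.g. 'weibull')
# path[4] -> weightRegNorm
# path[5] -> mklC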
from multiprocessing import Pool
#from functools import partial  # I already tried with partial and parmap
#import parmap as par
# My machine learning and random grid search modules:
from mklObj import *
from gridObj import *

# The input training and test data subsets are Shogun feature objects:
[feats_train,
 feats_test,
 labelsTr,
 labelsTs] = load_multiclassToy('../shogun-data/toy/',   # directory
                                'train_multiclass.dat', # sample dataset file name
                                'label_multiclass.dat') # multi-class labels file name

mkl_object = mklObj()  # Learning machine global instantiation

# Function for mapping:
def mkPool(path):  # path: a list of learning parameters
    global feats_train  # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs
    global mkl_object
    if path[0][0] == 'gaussian':  # compare strings by value, not identity
        a = 2 * path[0][1][0] ** 2
        b = 2 * path[0][1][1] ** 2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each list element (paths[i]) as a learning parameter:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                          targetsTr=labelsTr,
                          featsTs=feats_test,
                          targetsTs=labelsTs,
                          kernelFamily=path[0][0],
                          randomRange=[a, b],
                          randomParams=[(a + b) / 2, 1.0],
                          hyper=path[3],
                          pKers=path[2])
    # Return the test error:
    return mkl_object.testerr

if __name__ == '__main__':
    p = Pool(3)
    # Loading the experimentation grid of parameters:
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already tested passing 'paths' and '[paths]'; the error is the same.
    print p.map(mkPool, [a, b, c])
See the error output below:
/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
Traceback (most recent call last):
The entered hyperparameter distribution is not allowed: weibull
File "../mklCall.py", line 76, in <module>
The entered hyperparameter distribution is not allowed: linear
print p.map(mkPool, [a, b, c])
The entered hyperparameter distribution is not allowed: triangular
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
TypeError: 'NoneType' object is not iterable
Process finished with exit code 1
The above custom exception shouldn't occur, because 'weibull' (and the other values shown) is a valid input string. It therefore seems that something of unknown origin goes wrong during execution. The error repeats len(paths) times.
If I run the training for a single path, without using Pool.map(), there are no errors.
I also ran the code sequentially over several paths and there were no errors:
acc = []
for path in paths:
    print 'A path: ', path
    acc.append(mkPool(path))
    print 'Accuracy: ', acc[-1]
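For reference, the bare Pool.map pattern I followed from the documentation looks like this minimal toy sketch (square is a hypothetical stand-in for mkPool; no Shogun or globals involved):

from multiprocessing import Pool

def square(x):  # hypothetical stand-in for mkPool
    return x * x

if __name__ == '__main__':
    p = Pool(3)
    print p.map(square, [1, 2, 3])  # expected output: [1, 4, 9]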
I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html . Suggestions, examples, or possible solutions would be much appreciated.
Thank you in advance.