
I have multiple functions that train different classifiers, and each function returns some related output parameters. Each one takes a fairly long time to run, so I want to take advantage of multiprocessing.

For example:

test_mthd = 'complete'
row_num = 288
prob_scores_ANN = test_ANN(test_dataset,test_labels, test_mthd, row_num, 
                       input_hidden_weights, hidden_output_weights, 
                       input_hidden_bias, hidden_output_bias)
predictions_KNN= eval_KNN(trainingSet,testSet, test_mthd, row_num)

Now,

from multiprocessing import Process
if __name__=='__main__':
    p1 = Process(target=building_tree_CART(trainingSet, depth_cond=8, min_cond=1))
    p1.start()
    p2 = Process(target= train_ANN(training_data,training_labels))  
    p2.start()
    p1.join()
    p2.join()

Inspiration for this is from: LINK

I think it was a typo: I changed training to target, but p1 runs first and only then does p2 start. Also, how do we return values from each function?

Thanks, Gopi

Gopi
  • Unluckily "an error" is the worst error of all when it comes to diagnosing it. Please add a detailed error description and the full traceback. – Klaus D. Dec 02 '17 at 13:16

1 Answer

Check documentation of multiprocessing module. link

To retrieve values when using mp.Process you have to use an mp.Queue. I find this way of multiprocessing a bit too low-level; you could explore mp.Pool instead.

However, for your example:

from multiprocessing import Queue, Process


def building_tree_CART(p1queue, trainingSet, depth_cond, min_cond):
    # do stuff
    # put() takes a single object; wrap several values in a tuple,
    # because extra positional arguments are read as block and timeout
    p1queue.put(variable)

def train_ANN(p2queue, training_data, training_labels):
    # ...
    p2queue.put(result)

if __name__ == '__main__':
    # create separate instances of queues for processes
    p1queue = Queue()
    p2queue = Queue()
    # Process targets a function, arguments are passed separately as a tuple;
    # writing target=f(x) would call f in the parent before the process starts
    p1 = Process(target=building_tree_CART, args=(p1queue, trainingSet, 8, 1))
    p1.start()
    p2 = Process(target=train_ANN, args=(p2queue, training_data, training_labels))
    p2.start()
    # read the results before joining: a child that has put data on a queue
    # may not exit until that data has been consumed
    returned_variable = p1queue.get()
    returned_variable2 = p2queue.get()
    p1.join()
    p2.join()

From the limited information you provided, this is my best guess. If you need to run the functions multiple times, I suggest you use mp.Pool.map or mp.Pool.apply. For my own purposes I have found mp.Pool.apply_async to be the fastest and most convenient way of using the multiprocessing module.
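
As a rough sketch of the apply_async route: the two functions below, train_tree and train_net, are hypothetical stand-ins for your building_tree_CART and train_ANN, and the toy inputs are made up, so treat it as an illustration of the pattern rather than a drop-in replacement.

```python
from multiprocessing import Pool


def train_tree(depth_cond, min_cond):
    # Hypothetical stand-in for building_tree_CART: returns a small summary.
    return {"model": "tree", "depth_cond": depth_cond, "min_cond": min_cond}


def train_net(training_data, training_labels):
    # Hypothetical stand-in for train_ANN: returns one fake "weight" per sample.
    return [x * y for x, y in zip(training_data, training_labels)]


def run_in_parallel():
    with Pool(processes=2) as pool:
        # apply_async returns an AsyncResult right away, so both tasks
        # are submitted before either one is waited on.
        tree_job = pool.apply_async(train_tree, kwds={"depth_cond": 8, "min_cond": 1})
        net_job = pool.apply_async(train_net, args=([1, 2, 3], [4, 5, 6]))
        # get() blocks until that task has finished and returns its value.
        return tree_job.get(), net_job.get()


if __name__ == "__main__":
    tree, weights = run_in_parallel()
    print(tree)
    print(weights)
```

Here the return values come back through get(), so no explicit queue is needed. The functions have to be defined at module level so that the worker processes can find them.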

zck
  • I get error while running this: ` Traceback (most recent call last): File "", line 1, in p2queue.put(training_data, training_labels) File "C:\Users\dana0941\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\queues.py", line 82, in put if not self._sem.acquire(block, timeout): TypeError: only length-1 arrays can be converted to Python scalars` – Gopi Dec 03 '17 at 03:33
  • the same error for the first function call too, I adjusted my functions as you suggested and put `output_variables` in `p1queue.put(output_variables)` – Gopi Dec 03 '17 at 03:36