I am working on a cluster of computers and recently started doing parallel programming in Python.
My current understanding is that mpi4py distributes work across nodes, while multiprocessing distributes work across the cores within a single node.
I have divided a big for loop into parts based on the number of MPI processes running,

    nprocs = comm.Get_size()

and then tried to create a pool of worker processes sized by the node's cpu_count() and give them the work. The code is as follows:
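
    from multiprocessing import Pool, cpu_count

    if rank == proc:
        global output_dictionary
        output_dictionary = {}
        # One worker per core reported for this node
        p = Pool(processes=cpu_count())
        print "rank", rank, "started backpropagating. Async mapping..."
        results = []
        # Submit this rank's share of the loop to the pool
        for key in serialize_and_divide(n_muscles, n_mn, rank, nprocs):
            r = p.apply_async(calc_neuronij_grad, key, callback=append_grads_list)
            results.append(r)
        # Block until every submitted task has finished
        for r in results:
            r.wait()
        p.close()
        p.join()
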
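To illustrate the kind of per-rank split described above, here is a minimal sketch. It is not the actual serialize_and_divide (which is not shown here); split_indices is a hypothetical helper, and the values of n_items and nprocs are made up for the example. In the MPI program, rank and nprocs would come from comm.Get_rank() and comm.Get_size().

```python
def split_indices(n_items, rank, nprocs):
    """Return the contiguous block of loop indices assigned to `rank`.

    Hypothetical helper for illustration only.
    """
    base, extra = divmod(n_items, nprocs)
    # The first `extra` ranks take one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return list(range(start, stop))


# Stand-in values; under MPI these would be comm.Get_rank() and comm.Get_size().
nprocs = 4
chunks = [split_indices(10, rank, nprocs) for rank in range(nprocs)]
```

Each rank then only iterates over its own block, so no index is handled twice and none is skipped.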
My problem is that as soon as the code reaches p = Pool(processes=cpu_count()), it fails with the error "Cannot allocate memory". The traceback is the same as in the question "Python cannot allocate memory using multiprocessing.pool", but the solution given there doesn't help.
Any hints, help, or explanations are appreciated.