
I'm a novice in Python, and I'm trying to write fast code with Python's multiprocessing module. My question is actually quite general: I'd like to know different ways of using multiprocessing, and I'm confused because I'm not sure exactly how this code works, so I can't generalize from it correctly:

import numpy as np
from multiprocessing import Pool

def sqd(x):
    # element-wise product of x with its transpose (not a matrix product)
    return x * x.T

A = np.random.random((10000, 10000))

if __name__ == '__main__':
    pool = Pool(processes=4)
    # run sqd once on the whole array in a single worker process
    result = pool.apply_async(sqd, [A])
    print result.get(timeout=1)
    # map sqd over A: each of the 10000 rows becomes a separate task
    print len(pool.map(sqd, A))

However, when I generalized the code as follows, hoping to speed up the random matrix generation as well, things did not go so well:

import numpy as np
from multiprocessing import Pool

def sqd(d):
    # build a d x d random matrix inside the worker, then square it element-wise
    x = np.random.random((d, d))
    return x * x.T

D = 100

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = pool.apply_async(sqd, [D])
    print result.get(timeout=1)
    print pool.map(sqd, D)  # D is a plain int, so map has nothing to iterate over

So the output is:

$ python prueba2.py
[[ 0.50770071  0.36508745  0.02447127 ...,  0.12122494  0.72641019  0.68209404]
 [ 0.19470934  0.89260293  0.58143287 ...,  0.25042778  0.05046485  0.50856362]
 [ 0.67367326  0.76929582  0.4232229  ...,  0.72910757  0.56047056  0.11873254]
 ...,
 [ 0.91234565  0.20216969  0.2961842  ...,  0.57539533  0.99836323  0.79875158]
 [ 0.85407066  0.99905665  0.12948157 ...,  0.58411818  0.06688349  0.71026483]
 [ 0.0599241   0.82759421  0.9532148  ...,  0.22463593  0.0859876   0.41072156]]
Traceback (most recent call last):
  File "prueba2.py", line 14, in <module>
    print pool.map(sqd, D)
  File "/home/nacho/anaconda/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/nacho/anaconda/lib/python2.7/multiprocessing/pool.py", line 304, in map_async
    iterable = list(iterable)
TypeError: 'int' object is not iterable

In this case, I know I'm passing an incorrect argument to "something", but I'm not sure why, or what I can and can't do in this specific case and in others that differ from passing lists or ranges to the multiprocessing module. I'd also like to know how to free the memory afterwards, given that it did run once without a memory error...

I'd like to add some detail. Besides wanting to know different use cases for multiprocessing, the motivation behind this question is that I took a snapshot of my processors while the code was running, and there was an isolated process working on a single core, which I suppose is due to random(), so I'd like to parallelize the complete task.

I hope this isn't too ambiguous. Thank you in advance...

Nacho

1 Answer


You cannot pass the dimension that the function uses internally as if it were the iterable that multiprocessing expects. What pool.map does is chop your A array into chunks and distribute them among the worker processes in the pool, which work through the tasks until the job finishes. In your code, however, you gave pool.map only the dimension of the input array (a single int), so it has nothing to iterate over and raises an error; map needs your function plus an iterable argument.
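
For example, here is a minimal sketch of the fix (Python 2, matching the thread; the list `[100] * 4` is my own illustration, not code from the question): wrap the dimension in an iterable with one entry per matrix you want, so that the random generation itself is spread across the workers.

import numpy as np
from multiprocessing import Pool

def sqd(d):
    # each worker builds its own d x d random matrix and squares it element-wise
    x = np.random.random((d, d))
    return x * x.T

if __name__ == '__main__':
    pool = Pool(processes=4)
    # one task per element of the iterable: four 100 x 100 matrices,
    # generated and squared in parallel across the workers
    results = pool.map(sqd, [100] * 4)
    print len(results)  # 4
    # release the worker processes (and their memory) when done
    pool.close()
    pool.join()

Closing and joining the pool when you are finished is also the usual way to release the worker processes, which touches on the memory question in the post.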

Dalek
  • Excuse me @Dalek, I made a mistake in the definition of sqd(). I wrote it just like this: `def sqd(d): x = np.random.random((d, d)); return x*x.T` – Nacho Aug 23 '14 at 18:14
  • On the other hand, if what I need to give the function is just the dimension of the matrix, so that this number (d) gets passed to random(d), then what exactly do I need to do? Do I also need to pass the dimensions of the random and product matrices to the map function? Thank you so much for your answers... – Nacho Aug 23 '14 at 18:20
  • @Nacho Do you want to write something like a function with a variable input for the dimension of the array, which meanwhile uses multiprocessing? – Dalek Aug 23 '14 at 19:05
  • Yes, I suppose... I mean, in this manner it occurred to me that I should parallelize `random()`. This is not the main task I'd like to do, but I think that if this is not clear to me, I will never be able to solve more complex tasks... I'm not sure if that is right. – Nacho Aug 23 '14 at 19:37
  • For example, it is not clear to me why `result = pool.apply_async(sqd, [A]); print result.get(timeout = 1); print pool.map(sqd, A)` need to be coded together; they appear to be independent instructions. I mean, how is `result` related to `pool.map(sqd, A)`? – Nacho Aug 23 '14 at 19:42
  • @Nacho [Here](http://stackoverflow.com/questions/8533318/python-multiprocessing-pool-when-to-use-apply-apply-async-or-map) is a comprehensive explanation of the difference between `pool.apply_async` and `pool.map`. – Dalek Aug 23 '14 at 19:57
  • @Nacho `print len(pool.map(sqd, A))` just prints the size of the output array, while `pool.apply_async` calls the function and, at the end, `result.get(timeout = 1)` returns the output array. – Dalek Aug 23 '14 at 20:11
  • In fact the link you gave me seems clear: looking at `pool.map(sqd, A)` (or `pool.apply_async()`), having both in the code is not useful because the task is duplicated. So for the results of the parallel execution we only need one of them, and I have to choose. Do you think I understood? Regarding the use cases, I still don't have a clear general view, but I found advice in http://stackoverflow.com/questions/14810014/how-do-i-use-key-word-arguments-with-python-multiprocessing-pool-apply-async that at least resolves my current doubt: `D=100; r = pool.map(sqd, (D,))` (see the sketch after these comments). – Nacho Aug 23 '14 at 21:26
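
Following up on the comment thread above, a small sketch contrasting the two calls, using the same `sqd` and `D = 100` as in the question; the single-element tuple `(D,)` is the workaround quoted in the last comment:

import numpy as np
from multiprocessing import Pool

def sqd(d):
    x = np.random.random((d, d))
    return x * x.T

if __name__ == '__main__':
    D = 100
    pool = Pool(processes=4)

    # apply_async submits ONE call and returns an AsyncResult handle;
    # .get() blocks until that single task finishes
    result = pool.apply_async(sqd, (D,))
    print result.get().shape  # (100, 100)

    # map submits one task per element of its iterable; (D,) has one
    # element, so this runs sqd(100) once and returns a one-item list
    r = pool.map(sqd, (D,))
    print len(r)  # 1

    pool.close()
    pool.join()

The two calls are independent ways of submitting work; as discussed in the comments, you would normally pick one or the other rather than running both on the same task.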