I have around 4000 data points and a program that processes them. Because of the number of points, the program is very slow, even though I've applied some vectorization using numpy.arange inside nested loops.
I looked into pool.map, but the problem is that it passes only one argument to the function. There are some answers to this problem in "Python multiprocessing pool.map for multiple arguments". I used the last one, which calls map with a list of arguments: I have four arguments, so I put them in a list and passed that list to map along with the function name. Inside the function, I extract each argument from the list and perform the operation, but it doesn't work. This is the code where I call map:
if __name__ == '__main__':
    pool = Pool(processes=8)
    p = pool.map(kriging1D, [x, v, a, n])
plt.scatter(x, v, color='red')
plt.plot(range(5227), p, color='blue')
This is the function to be parallelized:
def kriging1D(args):
    x = args[0]
    v = args[1]
    a = args[2]
    n = args[3]
    # perform some operations on the args..
    ...
    # return the result..
But I get this error:
plt.plot(range(5227),p,color='blue')
NameError: name 'p' is not defined
Note: before adding this line,
if __name__ == '__main__':
I got this error,
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
That's why I've added the if statement.
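As far as I understand it, the idiom in that message exists because on Windows each worker process re-imports the main module, so any statement left outside the guard runs again in every worker. A minimal sketch of the layout, assuming a hypothetical worker function `square` and a hypothetical `main()` wrapper (neither is part of my real program):

```python
from multiprocessing import Pool


def square(value):
    """Hypothetical worker, defined at module level so workers can import it."""
    return value * value


def main():
    # The pool is created only in the parent process.
    with Pool(processes=2) as pool:
        result = pool.map(square, range(5))
    # Code that uses `result` stays inside main(), so it never runs
    # when a worker process re-imports this module.
    print(result)


if __name__ == '__main__':
    main()
```

In this layout nothing that depends on the pool's result sits at module level.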
To clarify: v and x are vectors of about 4000 elements each (both have the same length). My intent is to parallelize the processing of each v[i] and x[i], that is, to process multiple elements of v and x at a time instead of processing them one by one.
Can anyone please tell me what mistake I'm making, or suggest an alternative method?
Thank you.
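To make the intent concrete, here is a minimal sketch of the per-element pattern I have in mind. Everything in it is an assumption for illustration: `process_point` is a hypothetical stand-in for the real computation (its arithmetic is a placeholder), `a` and `n` are assumed to be values shared by every point, and `Pool.starmap` requires Python 3.3 or later.

```python
import math
from functools import partial
from multiprocessing import Pool


def process_point(x_i, v_i, a, n):
    """Hypothetical per-point computation standing in for the real step."""
    # Placeholder arithmetic only, chosen so that the example runs.
    return a * v_i + n * x_i


def run_parallel(x, v, a, n, processes=8):
    """Apply process_point to every (x[i], v[i]) pair using a worker pool."""
    # Fix the arguments that are the same for every point.
    worker = partial(process_point, a=a, n=n)
    with Pool(processes=processes) as pool:
        # starmap unpacks each (x_i, v_i) tuple into positional arguments;
        # chunksize batches points to reduce inter-process overhead.
        return pool.starmap(worker, zip(x, v), chunksize=256)


if __name__ == '__main__':
    # Illustrative data only: 4000 points, matching the size mentioned above.
    x = [i / 3999 for i in range(4000)]
    v = [math.sin(x_i) for x_i in x]
    p = run_parallel(x, v, a=2.0, n=3)
    print(len(p))
```

The point of the sketch is that the iterable handed to the pool has one entry per data point, rather than one entry per argument.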