I have been reading about multiprocessing in Python (several posts, as well as various websites and videos), but I am still confused about how to apply multiprocessing to my specific problem. I have written a simple example that calculates the average of randomly generated integers using Monte Carlo simulation. I store the random integers in a list called integers so I can calculate the mean at the end; I also generate random numpy.ndarrays and store them in a list called arrays, as I need to do some post-processing on those arrays later:

import numpy as np

nMCS = 10 ** 8

integers = []
arrays = []
for i in range(nMCS):
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)

    integers.append(a)
    arrays.append(b)

mean_val = np.average(integers)
# I will do post-processing on 'arrays' later!!

Now I want to utilize all 16 cores on my machine, so that the random numbers/arrays are not generated in sequence and I can speed up the process. Based on what I have learnt, I need to store the results of each Monte Carlo simulation (i.e. the generated random integer and random numpy.ndarray) and then use inter-process communication to collect all of the results in a list. I have tried several versions, but unfortunately none of them work. As an example, when I write something like this:

import numpy as np
import multiprocessing

nMCS = 10 ** 6

integers = []
arrays = []

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!

    p1 = multiprocessing.Process(target = monte_carlo)

    p1.start()

    p1.join()

    for i in range(nMCS):

        integers.append(a)
        arrays.append(b)

I get the error "name 'a' is not defined". So could anyone please help me with this and tell me how I could generate as many random integers/arrays as possible concurrently, and then add them all to a list for further processing?

RezAm

2 Answers

Because passing a large number of results between processes takes time, I would suggest dividing the task into a few parts and reducing each part's results inside the worker before returning them.

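import numpy as np
from multiprocessing import Pool

n = 4  # number of chunks the work is split into

def processResult(raw_result):
    # Placeholder (an assumption, not part of the original answer):
    # replace with your own reduction. Returns [avg(a), list of b arrays].
    return [np.mean([r[0] for r in raw_result]), [r[1] for r in raw_result]]

def monte_carlo(i):
    # i is the chunk index passed in by apply_async/map; it is unused here
    raw_result = []
    for j in range(10**4 // n):  # integer division, since range() needs an int
        a = np.random.randint(0,10)
        b = np.random.rand(10,2)
        raw_result.append([a,b])
    result = processResult(raw_result)
    #Your method to reduce the result return,
    #let's assume the return value is [avg(a),reformed_array(b)]
    return result

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!

    pool = Pool(processes=4)
    #you can control how many processes here, for example multiprocessing.cpu_count()-1 to avoid completely blocking

    multiple_results = [pool.apply_async(monte_carlo, (i,)) for i in range(n)]

    data = [res.get() for res in multiple_results]
    #OR
    data = pool.map(monte_carlo, range(n))
    #Both return you a list of [avg(a),reformed_array(b)]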
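A minimal sketch of the same chunking idea, under a few assumptions that are not in the answer above: each worker draws its whole chunk with vectorised NumPy calls, each chunk gets its own seed from `np.random.SeedSequence.spawn` (so forked workers do not repeat the same random stream), and the reduction is just a sum and a count. The names `simulate_chunk` and `run_parallel` are hypothetical.

```python
import numpy as np
from multiprocessing import Pool


def simulate_chunk(args):
    """Run one chunk of simulations and return a reduced result."""
    seed_seq, n_sims = args
    rng = np.random.default_rng(seed_seq)      # independent stream per chunk
    ints = rng.integers(0, 10, size=n_sims)    # same range as randint(0, 10)
    arrays = rng.random((n_sims, 10, 2))       # n_sims arrays of shape (10, 2)
    # Send back a small summary plus the arrays needed for post-processing
    return int(ints.sum()), n_sims, arrays


def run_parallel(n_total, n_chunks, n_workers, seed=0):
    """Split n_total simulations into n_chunks and run them in a pool."""
    sizes = [n_total // n_chunks] * n_chunks
    sizes[-1] += n_total - sum(sizes)          # remainder goes to the last chunk
    seeds = np.random.SeedSequence(seed).spawn(n_chunks)
    with Pool(processes=n_workers) as pool:
        parts = pool.map(simulate_chunk, list(zip(seeds, sizes)))
    total = sum(p[0] for p in parts)
    count = sum(p[1] for p in parts)
    arrays = np.concatenate([p[2] for p in parts])
    return total / count, arrays


if __name__ == '__main__':
    mean_val, arrays = run_parallel(n_total=10**4, n_chunks=8, n_workers=4)
    print(mean_val, arrays.shape)
```

Returning one large array per chunk, rather than one small result per simulation, keeps the number of inter-process messages low, which is the point made at the top of this answer.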
MT-FreeHK
  • Thank you so much for your answer. In my real problem - which is a big one, this starts using my whole 100% CPU (which makes complete sense), and that stops me from being able to use other applications while the code is running. So could you please tell me how I may restrict my CPU usage to let's say 50% for example? – RezAm Oct 16 '18 at 04:00
  • Also, this is so weird but when I run your code, it takes more time than when I am using my code!!! Like when I use 10 ** 3 Monte Carlo Simulations, the elapsed time is almost 0 seconds for the original code that I posted, but it takes about 17 seconds when I am using MP i.e. your code (I can see your code uses 100% capacity of the CPU, that's why I said it is weird!). Any ideas on that? – RezAm Oct 16 '18 at 04:30
  • @Antonio, you mean for the one without any MP right? – MT-FreeHK Oct 16 '18 at 04:35
  • It looks to me like there is no control over how many times the multiprocess version executes the function monte_carlo? In your original function you are executing the function for 10**6 times only – Henry Yik Oct 16 '18 at 04:44
  • @MatrixTai, yes .. if you could please run the first code I have posted (without any MP), you will see it generates a list of random integers (as well as a list of random numpy.ndarrays) with the size equal to nMCS = 10 ** 8. And it only takes seconds! Meanwhile, when I use MP using your codes, it takes much longer!! And it is nonsense bc I see that using your code makes more CPUs working! – RezAm Oct 16 '18 at 04:45
  • So yeah, when I use my original code without MP it generates a list of random integers with the size of nMCS. If nMCS is too big (like let's say 10 ** 20), it takes a long time as the process is happening in sequence. What I am trying to do, is to generate as many random integers/arrays as possible CONCURRENTLY (i.e. in parallel instead of in sequence) by the use of MP, so I can save some time. And again, I see your code makes my CPUs active yet i do not know why it performs slower when we have 10 ** 8 processes i.e. simulations! – RezAm Oct 16 '18 at 04:49
  • @MatrixTai It seems like when I am not using MP, the code generates random integers/arrays for 10 ** 8 times and it happens very fast coz it is a pretty easy task. Meanwhile, in your code, generating a random integer/array is actually defined as a whole process itself, and I think that causes some delays in THIS SPECIFIC PROBLEM! Am I right?!! – RezAm Oct 16 '18 at 05:00
  • @Antonio, partly correct. I updated a version using `map()`, which seems to work better when there are many tasks. You may test it; the MP method will eventually be faster if you provide more tasks. – MT-FreeHK Oct 16 '18 at 05:09
  • @MatrixTai Much better now but still 3.5 times slower for nMCS = 10 ** 8 .. Also, the results are now like a list of lists and even for nMCS = 10 * 4 I cannot open the output (it says it can take a lot of time; do you still want to open it). Is there any way to resolve that? Thank you!! – RezAm Oct 16 '18 at 07:42
  • @Antonio, I don't have the problem you mentioned (note that I had a typo; it should be `return [a,b]`). Also, MP is faster after 10**6 on my computer for the first method. Maybe we should discuss in a room, https://chat.stackoverflow.com/rooms/181933/room-for-question52827284 – MT-FreeHK Oct 16 '18 at 08:19
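On the two points raised in the comments above (capping CPU use, and the parallel version being slower when each task is cheap), here is a minimal sketch. It assumes that limiting the number of worker processes is an acceptable way to cap usage (this bounds how many cores are busy, it is not an exact 50% limit), and that batching tasks with the `chunksize` argument of `Pool.map` is enough to cut the per-task overhead. `pick_worker_count` and `square` are hypothetical names.

```python
import os
from multiprocessing import Pool


def pick_worker_count(fraction=0.5):
    """Return roughly `fraction` of the available cores, but at least 1."""
    cores = os.cpu_count() or 1        # os.cpu_count() can return None
    return max(1, int(cores * fraction))


def square(x):
    # Stand-in for a cheap per-task computation
    return x * x


if __name__ == '__main__':
    workers = pick_worker_count(0.5)
    with Pool(processes=workers) as pool:
        # chunksize sends many cheap tasks to a worker in one message,
        # so less time is spent on inter-process communication
        results = pool.map(square, range(10**5), chunksize=10**4)
    print(workers, len(results))
```

For tasks as cheap as drawing one random integer, starting processes and pickling results can easily cost more than the work itself, so the parallel version only pays off once each worker does a substantial batch.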
Simple error.

a and b are created inside your function, so they do not exist in your main scope. You will need to return them from your function.

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)
    #create a return statement here. It may help if you put them into an array so you can return 2 value

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!

    p1 = multiprocessing.Process(target = monte_carlo)

    p1.start()

    p1.join()
    #Call your function here and save the return to something
    for i in range(nMCS):

        integers.append(a) # paste here
        arrays.append(b) # and here

Edit: I tested the code and found you were never calling your monte_carlo function. a and b are now working correctly, but you have a new error to try and solve. Sorry, but I won't be able to help with this error as I don't understand it myself; here is my edit of your code.

import numpy as np
import multiprocessing

nMCS = 10 ** 6

integers = []
arrays = []

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)
    temp = [a,b]
    return temp

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!

    p1 = multiprocessing.Process(target = monte_carlo())#added the extra brackets here

    p1.start()

    p1.join()

    for i in range(nMCS):
        array = monte_carlo()
        integers.append(array[0])
        arrays.append(array[1])

And here is the error I got with this edit. I am still learning multiprocessing myself, so other people may be better suited to help with this.

Process Process-6:
Traceback (most recent call last):
  File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable
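The traceback is consistent with `target = monte_carlo()` calling the function in the parent and handing its returned list to `Process` as the target, which the child then tries to call. A minimal sketch of one way around this, assuming a `multiprocessing.Queue` is used to send results back (a `Process` discards its target's return value); the names `worker` and `n_sims` are hypothetical.

```python
import multiprocessing
import numpy as np


def monte_carlo():
    a = np.random.randint(0, 10)
    b = np.random.rand(10, 2)
    return [a, b]


def worker(queue, n_sims):
    """Run n_sims simulations and send them back in a single message."""
    # Reseed from OS entropy so a forked child does not reuse the parent's state
    np.random.seed()
    queue.put([monte_carlo() for _ in range(n_sims)])


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    # Pass the function itself (no parentheses); its arguments go in `args`
    p1 = multiprocessing.Process(target=worker, args=(queue, 1000))
    p1.start()
    results = queue.get()   # read before join() so the child is not blocked on a full pipe
    p1.join()

    integers = [r[0] for r in results]
    arrays = [r[1] for r in results]
    print(np.average(integers), len(arrays))
```

With several processes, each would put its own batch on the queue and the parent would call `queue.get()` once per process; a `Pool` does the same bookkeeping automatically.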
chillinOutMaxin