I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.

It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.

Here is the essence of the code I have:

STD_1 = []
STD_2 = []
# etc.

for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.

Here is what I can modify it to:

master_list = []

def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0,6,2):
    for a in range(1,100): 
        master_list.append(simulate(a,L))

Since each of the simulations is independent, this seems like an ideal place to implement some sort of multi-threading/processing.

How exactly would I go about coding this?

EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?

EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.

import multiprocessing

data = []

for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return (STD_1, STD_2, STD_3)

print("1")

p = multiprocessing.Pool()

print ("2")

results = p.map(simulation, data)

EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

Austin Wismer
  • The way I usually do such a thing is to use `Pool.apply_async` right there in the loop where you're creating all the possible arguments. Then at the end of the loop, you put `p.close(); p.join()` and then wait ;-) – Henry Keiter Feb 22 '14 at 20:41
  • This is the solution you are looking for: [multiprocessing.Pool: What's the difference between map_async and imap?](https://stackoverflow.com/questions/26520781/multiprocessing-pool-whats-the-difference-between-map-async-and-imap) – Coddy Mar 01 '21 at 13:36
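
A minimal sketch of the `apply_async` pattern described in the first comment above, with a placeholder `simulate(L, a)` standing in for the real simulation:

import multiprocessing

def simulate(L, a):
    # placeholder for the real simulation code
    return (a, L, a**2, a**3)

if __name__ == '__main__':
    p = multiprocessing.Pool()
    # submit one task per (L, a) pair; apply_async returns immediately
    handles = [p.apply_async(simulate, (L, a))
               for L in range(0, 6, 2)
               for a in range(1, 100)]
    p.close()   # no more tasks will be submitted
    p.join()    # wait for all workers to finish
    # .get() retrieves each result; this preserves submission order
    master_list = [h.get() for h in handles]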

3 Answers

  • Wrap the data for each iteration up into a tuple.
  • Make a list `data` of those tuples.
  • Write a function `f` to process one tuple and return one result.
  • Create a `p = multiprocessing.Pool()` object.
  • Call `results = p.map(f, data)`

This will run as many instances of `f` in parallel as your machine has cores, each in a separate process.

Edit1: Example:

from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    # the guard is needed on platforms that spawn worker processes
    # rather than fork them
    p = Pool()
    results = p.map(f, data)
    print(results)

Edit2:

Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack `os.fork` (mainly MS Windows) need special attention, but even there it still works. See the multiprocessing documentation.
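
For illustration, a minimal sketch (assuming Python 3.4+, where `set_start_method` was added) that makes the start method explicit instead of relying on the platform default:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # 'fork' is the default on UNIX-like systems; 'spawn' is the only
    # option on Windows and requires the __main__ guard above
    mp.set_start_method('spawn')
    with mp.Pool() as pool:
        print(pool.map(square, range(10)))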

Roland Smith
  • Thanks for your response. Could you put your suggestions into a code block that will run so that it is a little clearer to me? I'm still pretty new to Python, and having a simple example that pertains exactly to my situation would be quite helpful! – Austin Wismer Feb 22 '14 at 19:45
  • On my computer, this code hangs indefinitely. I am running Python 3.3.3 on OS X. – Austin Wismer Feb 22 '14 at 20:46
  • @user264087: There are some improvements to multiprocessing for OSX in progress, but they are scheduled for 3.4. Try running the code in a debugger to see where it hangs... – Roland Smith Feb 22 '14 at 21:29
  • I just stuck print statements in to see which command was the problem. It is "p = Pool()" – Austin Wismer Feb 22 '14 at 23:22
  • @user264087 It seems the detection of the number of CPUs on OSX is somewhat broken. Try initializing the `Pool` with the actual number of CPUs, e.g. `Pool(4)`. – Roland Smith Feb 23 '14 at 00:12
  • It is still hanging. If I add any print() statement before that line, it doesn't hang, but as soon as it executes the line in question, a dialogue box pops up: "The program is still running! Do you want to kill it?" – Austin Wismer Feb 23 '14 at 00:26
  • Maybe you should submit a bug report to the Python maintainers. – Roland Smith Feb 23 '14 at 00:39
  • Is it specific to my version of Python, I'm guessing? If so, is there a way for me to downgrade to an earlier version in which this would work? – Austin Wismer Feb 23 '14 at 00:42
  • You could try 2.7.6. But keep in mind that there are some differences between python2 and 3. – Roland Smith Feb 23 '14 at 21:06
  • Gives OSError: [Errno 12] Cannot allocate memory – Coddy Mar 01 '21 at 03:26
  • @Coddy Which Python version and operating system do you use? – Roland Smith Mar 01 '21 at 15:00
  • @RolandSmith python 3.9 – Coddy Mar 01 '21 at 16:27
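
A minimal sketch of the workaround suggested in these comments: pass an explicit worker count to `Pool` instead of relying on CPU autodetection (the `f` and `data` here are the ones from the answer's example):

import multiprocessing

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
    try:
        workers = multiprocessing.cpu_count()
    except NotImplementedError:
        workers = 4  # fallback guess when the CPU count cannot be determined
    p = multiprocessing.Pool(workers)
    print(p.map(f, data))
    p.close()
    p.join()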

Here is one way to run it in parallel threads:

import threading

L_a = []

for L in range(0,6,2):
    for a in range(1,100):
        L_a.append((L,a))
        # Add the rest of your objects here

def RunParallelThreads():
    # Create an index list
    indexes = range(0, len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate, args=(output, i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output

def simulate(output, index):
    (L, a) = L_a[index]
    output[index] = (a, L)  # Add the rest of your objects here

master_list = RunParallelThreads()
barak manos
  • In the reading I've done in the past couple days, people have indicated that threading is inferior to multiprocessing. I read somewhere that it actually takes longer with threading than with a serial process. Is this true? – Austin Wismer Feb 22 '14 at 20:14
  • Not sure. I do recall a friend telling me that multi-threading in Python is simulated by the Python interpreter rather than being executed as "real" parallel threads by the OS. – barak manos Feb 22 '14 at 20:16
  • In any case, the code above runs. So you can simply add the rest of your objects, and compare the performance with that of your sequential code. – barak manos Feb 22 '14 at 20:31
  • In CPython, due to the Global Interpreter Lock ("GIL"), only *one thread at a time* can be executing Python bytecode. This limits the usefulness of threads for *computing-intensive* tasks in *CPython*. – Roland Smith Feb 22 '14 at 20:32
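
A minimal sketch illustrating the point in the last comment, timing the same CPU-bound workload (a hypothetical `busy` function) with threads and with processes:

import time
import threading
import multiprocessing

def busy(n):
    # CPU-bound loop: in CPython, threads running this serialize
    # on the GIL, while separate processes run truly in parallel
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    n, workers = 2000000, 4

    start = time.time()
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threads:  ", time.time() - start)

    start = time.time()
    with multiprocessing.Pool(workers) as pool:
        pool.map(busy, [n] * workers)
    print("processes:", time.time() - start)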

Use `Pool.imap_unordered` if ordering is not important: it returns an iterator immediately and yields each result as soon as a worker finishes it, regardless of submission order.
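
A minimal sketch, using a placeholder `simulate` that echoes its inputs so the completion order is visible:

from multiprocessing import Pool

def simulate(args):
    L, a = args
    # placeholder for the real simulation code
    return (L, a, a**2)

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with Pool() as p:
        # results arrive as workers finish, not in submission order
        for result in p.imap_unordered(simulate, data):
            print(result)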

Coddy