I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.

It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.

Here is the essence of the code I have:

STD_1 = []
STD_2 = []
# etc.

for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.

Here is what I can modify it to:

master_list = []

def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0,6,2):
    for a in range(1,100): 
        master_list.append(simulate(a,L))

Since each of the simulations is independent, this seems like an ideal place to implement some sort of multi-threading/processing.

How exactly would I go about coding this?

EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?

EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.

import multiprocessing

data = []

for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return (STD_1, STD_2, STD_3)

print("1")

p = multiprocessing.Pool()

print ("2")

results = p.map(simulation, data)

EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

Austin Wismer
  • The way I usually do such a thing is to use `Pool.apply_async` right there in the loop where you're creating all the possible arguments. Then at the end of the loop, you put `p.close(); p.join()` and then wait ;-) – Henry Keiter Feb 22 '14 at 20:41
  • This is the solution you are looking for: [multiprocessing.Pool: What's the difference between map_async and imap?](https://stackoverflow.com/questions/26520781/multiprocessing-pool-whats-the-difference-between-map-async-and-imap) – Coddy Mar 01 '21 at 13:36
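
A minimal sketch of the `apply_async` pattern described in the first comment above, with a placeholder `simulate(L, a)` standing in for the real simulation:

import multiprocessing

def simulate(L, a):
    # placeholder for the real simulation code
    return (a, L, a**2, a**3)

if __name__ == '__main__':
    p = multiprocessing.Pool()
    # submit one task per (L, a) pair; apply_async returns immediately
    handles = [p.apply_async(simulate, (L, a))
               for L in range(0, 6, 2)
               for a in range(1, 100)]
    p.close()   # no more tasks will be submitted
    p.join()    # wait for all workers to finish
    # .get() retrieves each result; this preserves submission order
    master_list = [h.get() for h in handles]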

3 Answers

  • Wrap the data for each iteration up into a tuple.
  • Make a list `data` of those tuples.
  • Write a function `f` to process one tuple and return one result.
  • Create a `p = multiprocessing.Pool()` object.
  • Call `results = p.map(f, data)`

This will run as many instances of `f` in parallel as your machine has cores, each in a separate process.

Edit1: Example:

from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    # the guard is needed on platforms that spawn worker processes
    # rather than fork them
    p = Pool()
    results = p.map(f, data)
    print(results)

Edit2:

Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack `os.fork` (mainly MS Windows) need special attention, but even there it still works. See the multiprocessing documentation.
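
For illustration, a minimal sketch (assuming Python 3.4+, where `set_start_method` was added) that makes the start method explicit instead of relying on the platform default:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # 'fork' is the default on UNIX-like systems; 'spawn' is the only
    # option on Windows and requires the __main__ guard above
    mp.set_start_method('spawn')
    with mp.Pool() as pool:
        print(pool.map(square, range(10)))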

Roland Smith
  • Thanks for your response. Could you put your suggestions into a code block that will run so that it is a little clearer to me? I'm still pretty new to Python, and having a simple example that pertains exactly to my situation would be quite helpful! – Austin Wismer Feb 22 '14 at 19:45
  • On my computer, this code hangs indefinitely. I am running Python 3.3.3 on OS X. – Austin Wismer Feb 22 '14 at 20:46
  • @user264087: There are some improvements to multiprocessing for OSX in progress, but they are scheduled for 3.4. Try running the code in a debugger to see where it hangs... – Roland Smith Feb 22 '14 at 21:29
  • I just stuck print statements in to see which command was the problem. It is "p = Pool()" – Austin Wismer Feb 22 '14 at 23:22
  • @user264087 It seems the detection of the number of CPUs on OSX is somewhat broken. Try initializing the `Pool` with the actual number of CPUs, e.g. `Pool(4)`. – Roland Smith Feb 23 '14 at 00:12
  • It is still hanging. If I add any print() statement before that line, it doesn't hang, but as soon as it executes the line in question, a dialogue box pops up: "The program is still running! Do you want to kill it?" – Austin Wismer Feb 23 '14 at 00:26
  • Maybe you should submit a bug report to the Python maintainers. – Roland Smith Feb 23 '14 at 00:39
  • Is it specific to my version of Python, I'm guessing? If so, is there a way for me to downgrade to an earlier version in which this would work? – Austin Wismer Feb 23 '14 at 00:42
  • You could try 2.7.6. But keep in mind that there are some differences between python2 and 3. – Roland Smith Feb 23 '14 at 21:06
  • Gives OSError: [Errno 12] Cannot allocate memory – Coddy Mar 01 '21 at 03:26
  • @Coddy Which Python version and operating system do you use? – Roland Smith Mar 01 '21 at 15:00
  • @RolandSmith python 3.9 – Coddy Mar 01 '21 at 16:27
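
A minimal sketch of the workaround suggested in these comments: pass an explicit worker count to `Pool` instead of relying on CPU autodetection (the `f` and `data` here are the ones from the answer's example):

import multiprocessing

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
    try:
        workers = multiprocessing.cpu_count()
    except NotImplementedError:
        workers = 4  # fallback guess when the CPU count cannot be determined
    p = multiprocessing.Pool(workers)
    print(p.map(f, data))
    p.close()
    p.join()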

Here is one way to run it in parallel threads:

import threading

L_a = []

for L in range(0,6,2):
    for a in range(1,100):
        L_a.append((L,a))
        # Add the rest of your objects here

def RunParallelThreads():
    # Create an index list
    indexes = range(0, len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate, args=(output, i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output

def simulate(output, index):
    (L, a) = L_a[index]
    output[index] = (a, L)  # Add the rest of your objects here

master_list = RunParallelThreads()
barak manos
  • In the reading I've done in the past couple days, people have indicated that threading is inferior to multiprocessing. I read somewhere that it actually takes longer with threading than with a serial process. Is this true? – Austin Wismer Feb 22 '14 at 20:14
  • Not sure. I do recall a friend telling me that multi-threading in Python is simulated by the Python interpreter rather than being executed as "real" parallel threads by the OS. – barak manos Feb 22 '14 at 20:16
  • In any case, the code above runs. So you can simply add the rest of your objects, and compare the performance with that of your sequential code. – barak manos Feb 22 '14 at 20:31
  • In CPython, due to the Global Interpreter Lock ("GIL"), only *one thread at a time* can be executing Python bytecode. This limits the usefulness of threads for *computing-intensive* tasks in *CPython*. – Roland Smith Feb 22 '14 at 20:32
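
A minimal sketch illustrating the point in the last comment, timing the same CPU-bound workload (a hypothetical `busy` function) with threads and with processes:

import time
import threading
import multiprocessing

def busy(n):
    # CPU-bound loop: in CPython, threads running this serialize
    # on the GIL, while separate processes run truly in parallel
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    n, workers = 2000000, 4

    start = time.time()
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threads:  ", time.time() - start)

    start = time.time()
    with multiprocessing.Pool(workers) as pool:
        pool.map(busy, [n] * workers)
    print("processes:", time.time() - start)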

Use `Pool.imap_unordered` if ordering is not important: it returns an iterator immediately and yields each result as soon as a worker finishes it, regardless of submission order.
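
A minimal sketch, using a placeholder `simulate` that echoes its inputs so the completion order is visible:

from multiprocessing import Pool

def simulate(args):
    L, a = args
    # placeholder for the real simulation code
    return (L, a, a**2)

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with Pool() as p:
        # results arrive as workers finish, not in submission order
        for result in p.imap_unordered(simulate, data):
            print(result)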

Coddy