
Suppose we have a recurrence relation

A[0] = a
A[i+1] = f(A[i],c)

where c is a parameter and f is some function, say

from math import sin

def f(x, c):
    return sin(x) + c

Assume that, for a given a, we want to evaluate A[i] for every i in range(0, n) and for every c in Cs = [c[j] for j in range(0, m)], where n and m are fairly large.
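For concreteness, the plain sequential computation might look like the sketch below (the sizes and the values of a and Cs here are hypothetical placeholders, just to make it runnable):

from math import sin
import numpy as np

def f(x, c):
    return sin(x) + c

a, n, m = 0.5, 1000, 200                  # hypothetical start value and sizes
Cs = [0.01 * j for j in range(m)]         # hypothetical parameter values

A = np.empty((m, n))                      # one row per value of c
A[:, 0] = a
for j in range(m):                        # loop over the parameters
    for i in range(n - 1):                # unroll the recurrence
        A[j, i + 1] = f(A[j, i], Cs[j])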

Question in a few words: should I use multithreading for each c, or a list comprehension for each i?


Let me explain. I am considering the following two approaches:

Approach 1:

Store the A[i] in a 2D array, with each row containing the values of the sequence for a fixed c[j], but store the array in column-major order, so that the values A[i] for a fixed i are contiguous.

Then use a list comprehension to compute them:

A[:, 0] = a
for i in range(0, n - 1):
    A[:, i+1] = [f(x, c) for x, c in zip(A[:, i], Cs)]
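As an aside, if f can be written with NumPy ufuncs (as sin(x) + c can), the per-column list comprehension can be replaced by a single vectorized call over the whole column. A minimal sketch, with hypothetical sizes and assuming Cs is a NumPy array of length m:

import numpy as np

a, n, m = 0.5, 1000, 200                  # hypothetical start value and sizes
Cs = np.linspace(0.0, 2.0, m)             # hypothetical parameter values

A = np.empty((m, n), order='F')           # column-major, so A[:, i] is contiguous
A[:, 0] = a
for i in range(n - 1):
    A[:, i + 1] = np.sin(A[:, i]) + Cs    # whole column updated at once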

Approach 2:

Store the sequences as before in a 2D array, but in row-major order this time.

Have a function that, given a row index, fills that row with the values of the sequence for the corresponding c:

def sequence(j):
    A[j, 0] = a
    for i in range(0, n - 1):
        A[j, i+1] = f(A[j, i], Cs[j])

Then call sequence for different values of j in different threads or processes, using, say, multiprocessing.Pool (a sketch follows below).
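One caveat worth spelling out: with multiprocessing the workers live in separate processes and do not share A, so writes to a global array from inside sequence would be lost. It is usually simpler to have the worker build and return its row, and let the parent assemble the rows. A minimal sketch along those lines (the sizes and Cs are hypothetical placeholders):

from math import sin
from multiprocessing import Pool
import numpy as np

a, n, m = 0.5, 1000, 200
Cs = [0.01 * j for j in range(m)]

def f(x, c):
    return sin(x) + c

def sequence(c):
    # Compute the whole sequence for one parameter value and return it as a row.
    row = np.empty(n)
    row[0] = a
    for i in range(n - 1):
        row[i + 1] = f(row[i], c)
    return row

if __name__ == '__main__':
    with Pool() as pool:
        A = np.array(pool.map(sequence, Cs))   # shape (m, n), row-major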


Which of the two approaches should I prefer?


Experiment:

I tried the following test

import numpy
from multiprocessing.dummy import Pool   # note: despite the module name, this is a *thread* pool
from multiprocessing import cpu_count
import time

def func(x):
    N = 400
    A = numpy.array([[i*j for i in range(0,N)] for j in range(0, N)])
    h = numpy.array([x for i in range(0, N)])
    y = numpy.dot(A, h.transpose())
    return y[-1]

start_time = time.time()

def multiproc():
    print('Multiple processes')
    print(cpu_count())
    mypool = Pool(cpu_count())
    print(mypool.map(func, [i for i in range(0,100)]))


def multiproc2():
    print('Multiple processes 2')
    pool = Pool(cpu_count())
    res = numpy.empty(100)
    for i in range(0,100):
        res[i] = pool.apply_async(func, (i,)).get()
    pool.close()
    pool.join()
    print(res)

def singleproc():
    for i in range(0,100):
        print(func(i))
    print('Single process')

funcs = [multiproc, singleproc, multiproc2]

funcs[1]()

print("%.6f seconds" % (time.time() - start_time))

Changing the call from funcs[1]() to funcs[0]() or funcs[2](), I get pretty much the same time in every case.
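Note that multiprocessing.dummy.Pool is a thread pool despite the module name, so all three variants above run the Python-level work under the GIL. To compare against real worker processes, only the import needs to change (plus the usual entry-point guard); a sketch, reusing func from the experiment above:

from multiprocessing import Pool, cpu_count   # process pool instead of multiprocessing.dummy

if __name__ == '__main__':
    with Pool(cpu_count()) as mypool:
        print(mypool.map(func, range(100)))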

  • CPython does cooperative multithreading with the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock). Only one thread can run Python code at a time. If you have a CPU-intensive task, threading is actually slower, because the threads tend to switch around with each other. – tdelaney Nov 03 '16 at 19:49
  • If the function is a pure function, it's a good idea to memoize the results in order to increase efficiency, by using something like the `@lru_cache` decorator from the functools lib, or just by writing your own if there is a need. I think I would go for list comprehensions, even though I don't really know a lot about multithreading. – Nf4r Nov 03 '16 at 19:54
  • @tdelaney Having read something like that, without fully understanding it yet, is the reason I had my doubts. On the other hand, I find it confusing. For example, I tried a function that prints a given number and sleeps for 5 seconds. Then I ran a `multiprocessing.Pool` mapping the function to a list of several numbers. I set the pool size to be the same as the length of the list. The execution prints all the numbers almost at once, and then waits 5 seconds. So, in that example multithreading did help, because it didn't wait 5 seconds for each input. Why is that? – myfirsttime1 Nov 03 '16 at 20:03
  • @tdelaney Is it because the sleep(5) is not Python code internally, and so the multithreading does work? – myfirsttime1 Nov 03 '16 at 20:09
  • It's the wait: it's not a CPU-intensive action. Think in terms of acquiring and releasing the Global Interpreter Lock. When released, another thread can acquire it and run. When Python sleeps, it releases the lock so others run. When Python is doing its normal processing, it releases the lock periodically to simulate multithreading, but that actually costs some time. So, when processing a big list where you need to acquire the lock, it's slow. When sleeping, it's fast. And suppose you use numpy: it releases the lock and runs faster. – tdelaney Nov 03 '16 at 20:11
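For reference, a minimal reproduction of the sleep example discussed in the comments above. With a thread pool the total time is roughly 5 seconds rather than 5 seconds per input, because time.sleep releases the GIL while waiting:

from multiprocessing.dummy import Pool   # thread pool
import time

def show(x):
    print(x)
    time.sleep(5)                        # releases the GIL while sleeping

start = time.time()
with Pool(8) as pool:
    pool.map(show, range(8))             # all prints appear almost at once
print("%.2f seconds" % (time.time() - start))   # roughly 5 s, not 40 s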

1 Answer


I would prefer using the Pool wrapper, as it definitely seems to be better than the thread approach. Try this:

from math import sin
from multiprocessing import Pool
import numpy as np

def f(x, c):
    return sin(x) + c

# One row per iteration i, one column per parameter value Cs[j].
A = np.zeros(shape=(n, m))
A[0, :] = a                      # initial value for every c
for i in range(n - 1):
    # A new pool is created on every iteration; it could also be created
    # once before the loop (and closed/joined once at the end) and reused.
    pool = Pool()
    res = []
    for j in range(m):
        res.append(pool.apply_async(f, (A[i, j], Cs[j])))
    pool.close()
    pool.join()
    for j in range(m):
        A[i+1, j] = res[j].get()

You can always time the two approaches and see which one is the fastest with:

 import time
 start_time = time.time()
 # your code
 print("%.6f seconds" % (time.time() - start_time))

It is not very accurate, but it should be enough for your purpose.
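If you want a slightly less noisy measurement, time.perf_counter is intended for exactly this kind of interval timing:

import time
start_time = time.perf_counter()
# your code
print("%.6f seconds" % (time.perf_counter() - start_time))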

  • I added to the question's post some code for an experiment. – myfirsttime1 Nov 04 '16 at 15:09
  • If you do `res[i] = pool.apply_async(func, (i,)).get()` you are not using multiprocessing, as the pool waits until it gets the result before the next iteration. You need a for loop of `res[i] = pool.apply_async(func, (i,))`, then the `pool.close()` and `pool.join()`, and then the for loop gathering the results. – Davide Valeriani Nov 06 '16 at 22:52
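A sketch of the pattern described in the last comment, applied to the func from the experiment in the question: submit every job first, and only collect the results afterwards, so the workers actually run in parallel.

from multiprocessing import Pool, cpu_count
import numpy as np

if __name__ == '__main__':
    with Pool(cpu_count()) as pool:
        handles = [pool.apply_async(func, (i,)) for i in range(100)]   # submit everything first
        res = np.array([h.get() for h in handles])                     # then block on the results
    print(res)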