9

Suppose you have a list comprehension in Python, like

Values = [f(x) for x in range(0, 1000)]

where f is a function without side effects, so all the entries can be computed independently.

Can Python increase the performance of this list comprehension over the obvious sequential implementation, e.g. by shared-memory parallelization on multicore CPUs?

shuhalo
  • I don't think Python can, but there's some more info in a similar question: http://stackoverflow.com/questions/5236364/how-to-parallelize-list-comprehension-calculations-in-python. – Dave Challis Jun 02 '11 at 11:55

3 Answers

10

Python 3.2 added concurrent.futures, a nice library for running tasks concurrently. Consider this example:

import math, time
from concurrent import futures

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419,
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419,
]

def is_prime(n):
    if n < 2:
        return False  # 0 and 1 are not prime
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    # trial division by odd numbers up to sqrt(n)
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def bench(f):
    # time a zero-argument callable using wall-clock time
    start = time.time()
    f()
    elapsed = time.time() - start
    print("Completed in {} seconds".format(elapsed))

def concurrent():
    # one worker process per CPU core by default; processes sidestep the GIL
    with futures.ProcessPoolExecutor() as executor:
        values = list(executor.map(is_prime, PRIMES))

def listcomp():
    # plain sequential evaluation, for comparison
    values = [is_prime(x) for x in PRIMES]

Results on my quad-core machine:

>>> bench(listcomp)
Completed in 14.463825941085815 seconds
>>> bench(concurrent)
Completed in 3.818351984024048 seconds
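
The same pattern applies directly to the comprehension from the question. A minimal sketch, where f is a hypothetical stand-in for the question's side-effect-free function:

from concurrent import futures

def f(x):
    return x * x  # hypothetical stand-in for the question's f

if __name__ == "__main__":  # required for process pools on spawn-based platforms
    with futures.ProcessPoolExecutor() as executor:
        # executor.map preserves input order, like the list comprehension;
        # chunksize batches items to amortize inter-process overhead
        Values = list(executor.map(f, range(0, 1000), chunksize=100))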
Zach Kelling
8

No, Python will not magically parallelize this for you. In fact, it can't, since it cannot prove the independence of the entries; that would require a great deal of program inspection/verification, which is impossible to get right in the general case.
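
To see why, consider a hypothetical f that looks pure from the call site but has a hidden side effect; reordering or parallelizing the calls would change the result:

state = []

def f(x):
    state.append(x)        # hidden side effect
    return x + len(state)  # result depends on how many calls ran before

values = [f(x) for x in range(5)]
print(values)  # [1, 3, 5, 7, 9]; any other evaluation order gives a different list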

If you want quick coarse-grained multicore parallelism, I recommend joblib instead:

from joblib import delayed, Parallel

# NUM_CPUS is a placeholder for the number of worker processes; n_jobs=-1 uses all cores
values = Parallel(n_jobs=NUM_CPUS)(delayed(f)(x) for x in range(1000))

Not only have I witnessed near-linear speedups using this library, it also forwards signals such as Ctrl-C to its worker processes, which cannot be said of all multiprocessing libraries.

Note that joblib doesn't really support shared-memory parallelism: it spawns worker processes, not threads, so it incurs some communication overhead from sending data to workers and results back to the master process.
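
The standard library's multiprocessing.Pool follows the same process-based model, with the same pickling overhead. A minimal sketch, assuming f is defined at module level so the workers can unpickle it:

from multiprocessing import Pool

def f(x):
    return x * x  # hypothetical stand-in; must live at module level to be picklable

if __name__ == "__main__":
    with Pool() as pool:  # defaults to one worker process per CPU core
        values = pool.map(f, range(1000))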

Fred Foo
0

Try whether the following is faster:

Values = list(map(f, range(0, 1000)))  # in Python 2, plain map(f, ...) already returns a list

That's a more functional style of coding.

Another idea is to replace all occurrences of Values in the code with a lazy iterator, which computes each value only when it is needed:

from itertools import imap
imap(f, range(0, 1000))   # Python 2: lazy version of map

map(f, range(0, 1000))    # Python 3: map is already lazy
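
This helps only when the values are consumed incrementally, since a lazy iterator defers each f(x) call until its result is requested. A minimal sketch, assuming f from the question:

values = map(f, range(0, 1000))  # Python 3: no f(x) has been called yet

total = sum(values)  # f is invoked here, one element at a time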
Matt
eyquem