38

What's a simple program that does parallel processing in Python 2.7? All the examples I've found online are convoluted and include unnecessary code.

How would I write a simple brute-force integer factoring program where I can factor one integer on each core (I have 4)? My real program probably only needs 2 cores, and the processes need to share information.

I know that parallel-python and other libraries exist, but I want to keep the number of libraries used to a minimum, so I'd like to stick to the thread and/or multiprocessing libraries, since they come with Python.
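For the sharing part, I'm imagining something along these lines with the standard-library multiprocessing.Queue, where two worker processes push their results back to the parent (factor_worker is just a placeholder for whatever my real worker would do):

from multiprocessing import Process, Queue

def factor_worker(numbers, q):
    # factor each number and push (number, factors) onto the shared queue
    for n in numbers:
        factors = [i for i in range(1, n + 1) if n % i == 0]
        q.put((n, factors))

if __name__ == '__main__':
    q = Queue()
    # split the work between two processes, one per core
    p1 = Process(target=factor_worker, args=([976, 360], q))
    p2 = Process(target=factor_worker, args=([1024, 210], q))
    p1.start(); p2.start()
    # pull the four results off the shared queue, then wait for the workers
    for _ in range(4):
        n, factors = q.get()
        print n, '->', factors
    p1.join(); p2.join()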

calccrypto
  • 8,583
  • 21
  • 68
  • 99
  • Here is another way, which I have explained here: Minimum active threaded Processes (http://stackoverflow.com/a/32337959/4850220) – Vikas Gautam Sep 03 '15 at 17:40

4 Answers

32

A good, simple way to start with parallel processing in Python is just the pool map in multiprocessing -- it's like the usual Python map, but the individual function calls are spread out over a number of processes.

Factoring is a nice example of this - you can brute-force check all the divisors, spreading the work out over all the available processes:

from multiprocessing import Pool
import numpy

numToFactor = 976

def isFactor(x):
    result = None
    div = (numToFactor / x)
    if div*x == numToFactor:
        result = (x,div)
    return result

if __name__ == '__main__':
    pool = Pool(processes=4)
    possibleFactors = range(1,int(numpy.floor(numpy.sqrt(numToFactor)))+1)
    print 'Checking ', possibleFactors
    result = pool.map(isFactor, possibleFactors)
    cleaned = [x for x in result if x is not None]
    print 'Factors are', cleaned

This gives me

Checking  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
Factors are [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
Jonathan Dursi
  • 50,107
  • 9
  • 127
  • 158
  • 2
    I should add that the above works, but probably won't perform amazing feats of parallel performance because the overhead you're invoking (the parallel map + function call) is all to calculate a tiny amount of work (a little bit of integer arithmetic). I'll leave it as an exercise for the reader to think about how to amortize the overhead over more divisions -- e.g., how to change the above code so that "isFactor" is called once for a number of divisions. – Jonathan Dursi Oct 02 '10 at 18:26
  • Example code gives error `AttributeError: Can't get attribute 'isFactor' on ` – StatguyUser May 21 '18 at 11:25
  • 1
    Python 3 note: Instead of numToFactor / x in isFactor(x), replace it with integer divide (//) –  Jun 01 '18 at 04:35
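Following up on the first comment above about amortizing the overhead, here is one minimal sketch of that chunking idea (isFactorChunk is a hypothetical helper): each call to a worker checks a whole slice of candidate divisors, so the map and function-call overhead is spread over many divisions.

from multiprocessing import Pool
import numpy

numToFactor = 976

def isFactorChunk(chunk):
    # check a whole slice of candidate divisors in a single call
    found = []
    for x in chunk:
        div = numToFactor / x
        if div*x == numToFactor:
            found.append((x, div))
    return found

if __name__ == '__main__':
    pool = Pool(processes=4)
    candidates = range(1, int(numpy.floor(numpy.sqrt(numToFactor)))+1)
    # one chunk per process instead of one map call per divisor
    chunks = [candidates[i::4] for i in range(4)]
    results = pool.map(isFactorChunk, chunks)
    print 'Factors are', sorted(sum(results, []))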
8

mincemeat is the simplest map/reduce implementation that I've found. Also, it's very light on dependencies - it's a single file and does everything with the standard library.
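To give a feel for how it looks, here is a minimal sketch that adapts the word-count example from the project's README to the factoring problem (the Server attributes -- datasource, mapfn, reducefn -- and run_server are assumptions based on that example; check them against the version you install). Worker processes are started separately with python mincemeat.py -p changeme <server address>.

import mincemeat

# numbers to factor; the datasource's values are handed to mapfn
numbers = [976, 360]

def mapfn(k, v):
    # emit (number, divisor) for every divisor of v
    for i in range(1, v + 1):
        if v % i == 0:
            yield v, i

def reducefn(k, vs):
    # collect the divisors of one number
    return sorted(vs)

s = mincemeat.Server()
s.datasource = dict(enumerate(numbers))
s.mapfn = mapfn
s.reducefn = reducefn

# blocks until connected workers finish the map and reduce phases
results = s.run_server(password='changeme')
print results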

Tim McNamara
  • 18,019
  • 4
  • 52
  • 83
1

I agree that using Pool from multiprocessing is probably the best route if you want to stay within the standard library. If you are interested in doing other types of parallel processing without having to learn anything new (i.e. still using the same interface as multiprocessing), then you could try pathos, which provides several forms of parallel maps and has pretty much the same interface as multiprocessing does.

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numToFactor = 976
>>> def isFactor(x):
...   result = None
...   div = (numToFactor / x)
...   if div*x == numToFactor:
...     result = (x,div)
...   return result
... 
>>> from pathos.multiprocessing import ProcessingPool as MPool
>>> p = MPool(4)
>>> possible = range(1,int(numpy.floor(numpy.sqrt(numToFactor)))+1)
>>> # standard blocking map
>>> result = [x for x in p.map(isFactor, possible) if x is not None]
>>> print result
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
>>>
>>> # asynchronous map (there's also iterative maps too)
>>> obj = p.amap(isFactor, possible)                  
>>> obj
<processing.pool.MapResult object at 0x108efc450>
>>> print [x for x in obj.get() if x is not None]
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
>>>
>>> # there's also parallel-python maps (blocking, iterative, and async) 
>>> from pathos.pp import ParallelPythonPool as PPool
>>> q = PPool(4)
>>> result = [x for x in q.map(isFactor, possible) if x is not None]
>>> print result
[(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]

Also, pathos has a sister package with the same interface, called pyina, which uses mpi4py but provides parallel maps that run under MPI and can be launched with several different schedulers.

One other advantage is that pathos comes with a much better serializer than you can get in standard python, so it's much more capable than multiprocessing at serializing a range of functions and other things. And you can do everything from the interpreter.

>>> class Foo(object):
...   b = 1
...   def factory(self, a):
...     def _square(x):
...       return a*x**2 + self.b
...     return _square
... 
>>> f = Foo()
>>> f.b = 100
>>> g = f.factory(-1)
>>> p.map(g, range(10))
[100, 99, 96, 91, 84, 75, 64, 51, 36, 19]
>>> 

Get the code here: https://github.com/uqfoundation

Mike McKerns
  • 33,715
  • 8
  • 119
  • 139
0

This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

To parallelize your example, you'd need to define your map function with the @ray.remote decorator and then invoke it with .remote. This ensures that every instance of the remote function is executed in a different process.

import ray

ray.init()

# Define the function to compute the factors of a number as a remote function.
# This will make sure that a call to this function will run it in a different
# process.
@ray.remote
def compute_factors(x):
    factors = []
    for i in range(1, x + 1):
        if x % i == 0:
            factors.append(i)
    return factors

# List of inputs.
inputs = [67, 24, 18, 312]

# Call a copy of compute_factors() on each element in inputs.
# Each copy will be executed in a separate process.
# Note that a remote function returns a future, i.e., an
# identifier of the result, rather than the result itself.
# This means that calls to a remote function do not block,
# which lets us launch many remote functions in parallel.
result_ids = [compute_factors.remote(x) for x in inputs]

# Now get the results
results = ray.get(result_ids)

# Print the results.
for i in range(len(inputs)):
    print("The factors of", inputs[i], "are", results[i]) 

There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.

Ion Stoica
  • 797
  • 9
  • 7