In Python I am using itertools.product to iterate over all possible combinations of a list of characters, which produces a very large result. However, when I look at the Windows 10 Task Manager, the Python process executing this task is only using 13.5% CPU. I looked into multiprocessing in Python and found that with pool.map I can map instances of a function onto a pool and have them run in parallel. This is great, but since I am iterating over a single (very large) iterable inside one function call that takes a large amount of time, this doesn't help me.

So the way I see it, the only way to speed this up is to split the result of itertools.product into groups and iterate over the groups in parallel. If I can get the length of the result of itertools.product, I can divide it into groups by the number of processor cores I have available, and then use multiprocessing to iterate over all these groups in parallel.

So my question is can this be done, and what is the best approach?

Maybe there is a module out there for this sort of thing?

The concept is something like this (the following actually works, but gives a MemoryError when I try to scale it up to the full character set, commented out below):

#!/usr/bin/env python3.5
import itertools, multiprocessing, functools

def process_group(iIterationNumber, iGroupSize, sCharacters, iCombinationLength, iCombintationsListLength, iTotalIterations):
    iStartIndex = 0
    if iIterationNumber > 1: iStartIndex = (iIterationNumber - 1) * iGroupSize
    iStopIndex = iGroupSize * iIterationNumber
    if iIterationNumber == iTotalIterations: iStopIndex = iCombintationsListLength
    aCombinations = itertools.product(sCharacters, repeat=iCombinationLength)
    lstCombinations = list(aCombinations)
    print("Iteration#", iIterationNumber, "StartIndex:", iStartIndex, iStopIndex)
    for iIndex in range(iStartIndex, iStopIndex):
        aCombination = lstCombinations[iIndex]
        print("Iteration#", iIterationNumber, ''.join(aCombination))

if __name__ == '__main__':
    #_sCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~`!@#$%^&*()_-+={[}]|\"""':;?/>.<,"
    _sCharacters = "123"
    _iCombinationLength = 4
    aCombinations = itertools.product(_sCharacters, repeat=_iCombinationLength)
    lstCombinations = list(aCombinations)
    _iCombintationsListLength = len(lstCombinations)
    iCPUCores = 4
    _iGroupSize = round(_iCombintationsListLength / iCPUCores)
    print("Length", _iCombintationsListLength)
    pool = multiprocessing.Pool()
    pool.map(functools.partial(process_group,
                               iGroupSize=_iGroupSize,
                               sCharacters=_sCharacters,
                               iCombinationLength=_iCombinationLength,
                               iCombintationsListLength=_iCombintationsListLength,
                               iTotalIterations=iCPUCores),
             range(1, iCPUCores + 1))

Thanks for your time.

user2109254

1 Answer

You can't share the product() output among subprocesses; there is no good way to break this up into chunks per process. Instead, have each subprocess generate new values but give them a prefix to start from.

Remove outer loops from the product() call and create groups from that. For example, you could create len(sCharacters) groups by decreasing iCombinationLength by one and passing in each element from sCharacters as a prefix:

for prefix in sCharacters:
    # create group for iCombinationLength - 1 results.
    # pass in the prefix

Each group can then loop over product(sCharacters, repeat=iCombinationLength - 1) itself and combine the results with its prefix. So group 1 starts with '0', group 2 starts with '1', etc.
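As a rough sketch of that idea (the worker below is an assumption, not the question's process_group; it simply prints every combination in its subset):

```python
import itertools

def process_group(prefix, sCharacters, iCombinationLength):
    # Generate only the combinations that begin with this prefix; the
    # full product is never materialised as a list, so memory stays flat.
    for tail in itertools.product(sCharacters, repeat=iCombinationLength - 1):
        print(prefix + ''.join(tail))
```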

You can extend this by using combinations of 2 or 3 or more characters. For your 10 input characters, that'd create 100 or 1000 groups, respectively. The generic version is:

prefix_length = 3
for prefix in product(sCharacters, repeat=prefix_length):
    # create group for iCombinationLength - prefix_length
    # pass in the prefix
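Putting it together, a hedged sketch of mapping a pool over the prefixes (the names mirror the question's code, but the worker body and its return value are assumptions of this sketch):

```python
import itertools
import functools
import multiprocessing

def process_group(prefix, sCharacters, iCombinationLength, prefix_length):
    # Each worker enumerates only the combinations that start with its
    # prefix, so no process ever holds the full product in memory.
    tail_length = iCombinationLength - prefix_length
    return [''.join(prefix) + ''.join(tail)
            for tail in itertools.product(sCharacters, repeat=tail_length)]

if __name__ == '__main__':
    sCharacters = "123"
    iCombinationLength = 4
    prefix_length = 1
    prefixes = list(itertools.product(sCharacters, repeat=prefix_length))
    worker = functools.partial(process_group,
                               sCharacters=sCharacters,
                               iCombinationLength=iCombinationLength,
                               prefix_length=prefix_length)
    with multiprocessing.Pool() as pool:
        groups = pool.map(worker, prefixes)
    print(sum(len(g) for g in groups), "combinations generated")
```

Increasing prefix_length to 2 or 3 simply creates more, smaller groups for the pool to schedule.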
Martijn Pieters
  • Martijn, I have updated the code to a working example... but it is not scalable as you have pointed out. See the size of the character list, how would your approach work with this? – user2109254 Jun 07 '16 at 11:25
  • @user2109254: exactly the same way. – Martijn Pieters Jun 07 '16 at 11:26
  • Ok then I am not understanding it. How does this increase speed, I can't see any parallel processing? – user2109254 Jun 07 '16 at 11:28
  • @user2109254: it lets you break up the problem into smaller subsets for each process to handle. You can't share the `product` iterator, but you can produce a *subset* of all combinations in each subprocess to handle in parallel, all you need to do is generate the prefixes. – Martijn Pieters Jun 07 '16 at 11:33
  • Martijn, Thanks for the response. I think what you are talking about is like the accepted answer here: http://stackoverflow.com/questions/10262138/how-do-i-multi-process-the-itertools-product-module I will use this guys example as this is the level of detail I need to understand how it works. Thanks for your input. – user2109254 Jun 07 '16 at 11:37
  • Exactly; your question is a duplicate of that one. – Martijn Pieters Jun 07 '16 at 11:39
  • I do have one question that isn't covered in his example. How to exit all processes once a match is found..... – user2109254 Jun 07 '16 at 11:56
  • Have the master process kill the rest of the processes once the first one returns with a match? – Martijn Pieters Jun 07 '16 at 11:59
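For what it's worth, one possible sketch of that early-exit pattern (the target value and the worker are hypothetical; imap_unordered and terminate are standard multiprocessing.Pool methods):

```python
import itertools
import multiprocessing

TARGET = "231"  # hypothetical value the workers are searching for

def search_prefix(prefix, sCharacters="123", iCombinationLength=3):
    # Scan only the combinations starting with this prefix;
    # return the match if found, otherwise None.
    for tail in itertools.product(sCharacters, repeat=iCombinationLength - 1):
        candidate = prefix + ''.join(tail)
        if candidate == TARGET:
            return candidate
    return None

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # imap_unordered yields results as soon as each worker finishes,
        # so the master can stop everything at the first match.
        for result in pool.imap_unordered(search_prefix, "123"):
            if result is not None:
                print("Found:", result)
                pool.terminate()  # kill the remaining workers
                break
```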