If it were just permutations with replacement, it would be very easy: Just parallelize on the first letter of the string and then let each thread add the tail of the string. This would give you 26 independent tasks. If that is not enough, you could parallelize on the first two letters.
You want to have a permutation without replacement, so the problem does not trivially factorize. If one only wants to pick 8 letters from a set of 26, 52 and 62, one could do a naive brute thing: Parallelize on the first letter, let the thread just create the tail with replacement and discard the generated strings that contain duplicates. But when you want to pick 25 from 26 letters, this becomes really wasteful.
With that idea in mind, we can do better! We parallelize on the first letter of the string and then generate all permutations with seven elements from the set excluding the letter that we started with. This way we can have 26 tasks (or 52 or 62) and still use the algorithm. This could look like this:
# Use whatever chars you want as the set.
chars = set(string.ascii_lowercase)
# We iterate through all the possible heads. This exact loop will be
# parallelized in the end.
for head in chars:
# Exclude the head from the set.
remaining_chars = chars - set(head)
for tail in itertools.permutations(remaining_chars, 7):
string = head + ''.join(tail)
# Store the string in your list/file.
In order to make use of multiple cores, we use a pool. For this we first need a function to map over. This is just the above a bit refactored:
def make_strings(head):
remaining_chars = chars - set(head)
strings = [
head + ''.join(tail)
for tail in itertools.permutations(remaining_chars, 7)]
return strings
Now somewhere else we can create a pool and let it map over heads:
with multiprocessing.Pool() as pool:
all_strings = pool.map(make_strings, chars)
The pool only got the needed __enter__
and __exit__
properties with Python 3, so I assume that we use that.
Once that is finished the flatten the list of lists into a plain list of strings:
strings = [
string
for sub_list in all_strings
for string in sub_list]
Since 26 is an odd number for 16 cores, we could think about creating the heads with itertools.permutation(remaining_chars, 2)
and then using the set subtraction to generate the last 6 digits.
This is a complete working script for Python 3 that summarizes everything:
import itertools
import multiprocessing
import string
chars = set(string.ascii_lowercase)
def make_strings(head):
remaining_chars = chars - set(head)
strings = [
head + ''.join(tail)
for tail in itertools.permutations(remaining_chars, 3)]
return strings
def main():
with multiprocessing.Pool() as pool:
all_strings = pool.map(make_strings, chars)
strings = [
string
for sub_list in all_strings
for string in sub_list]
print(strings[:100])
if __name__ == '__main__':
main()