0

I have 13 different words. I need to get permutations like all combinations of these words:

word1 word2 word3 word4 word5 word6 word7 word8 word9 word10 word11 word12 word13

But the combinations only should be 12 words long.

I have already a script to do this in python:

import time
start = time.time()
items = ['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10, 'word11', 'word12', 'word13']
from itertools import permutations
for p in permutations(items, 12):
        print(p)
print 'It took', time.time()-start, 'seconds.'

But it's too slow, and takes 24 seconds when the combinations are only 4 words long.

With a javascript tool it only took 1 second for up to 9 different words; but when trying 10 different words the browser crashed.

Is there a fast efficient way to do this? Maybe with awk?

EDIT:

This is not the same question as Generating permutations using bash because this question has 13 separated words, and the answers in the other thread do not work with words.

Kind Regards.

Community
  • 1
  • 1
  • There is often confusion about the terminology of permutations and combinations, so let's clarify what you're looking for. Suppose there are just three words: `A`, `B`, and `C`, and you're looking for a list of combinations 2 words long. Which list do you expect? (1) `AB,AC,BA,BC,CA,CB` (2) `AB,AC,BC` (i.e. order is insignificant) (3) `AA,AB,AC,BA,BB,BC,CA,CB,CC` (4) `AA,AB,AC,BB,BC,CC` – Alex Hall Apr 17 '16 at 15:43
  • would epxect the list (1) in your example: – Frick Katrin Apr 17 '16 at 15:45
  • The main problem here is that there will be 6 billion permutations, so even the most efficient program will struggle because it not only has to generate them but it actually has to do something with them. For example in your script the calls to `print` are the slowest part. If you store them in a file they will probably take at least 500 GB. What do you want to do with these permutations? – Alex Hall Apr 17 '16 at 17:05
  • Yes i know that it will be around 300GB. I need them in a file. Actually i have to do it because i have a 12 word bitcoin BIP39 mnemonic, and i dont remember the order of the words but only the words. After having the file i want to write a script to check each one of them for valid mnemonic with BIP39 standard. Otherwise my bitcoins are lost :/ – Frick Katrin Apr 17 '16 at 17:07
  • That's pretty cool. Is the second part where you check each permutation easier to write if it reads from a file instead of directly checking under `for p in permutations(items, 12)`? – Alex Hall Apr 17 '16 at 17:13
  • Actually i didnt go longer in python as i saw that 4 words permutation already took 24seconds and in javascript i had 9 word permutation in 1second thats why python would take over 90years :D – Frick Katrin Apr 17 '16 at 17:26
  • I'm not sure why Python was SO slow for you. Maybe you were printing in a slow environment such as an IDE. But I've shown you that it can still do quite a lot. Also note that `permutations(items, 4)` has length `13*12*11*10=17 160` while the website you saw generated `9! = 362 880` permutations so the difference is probably not as bad as you think. Anyway, the best thing to do would be to check the permutations as you generate them in a single program rather than wasting time and space writing to and then reading from disk. Can you do that? How will you check them? – Alex Hall Apr 17 '16 at 17:41

1 Answers1

0

Repeatedly calling print will make a script slow, as each call has some overhead as it talks to the system it's going to print on. You get a significant improvement if you lump all the permutations into a single string and print that string once. But even then, printing is a lot of work as you're displaying text on a screen. It's much faster to just write to a file or immediately do whatever you plan on doing with these permutations.

There are smaller improvements that can be made as well. A tuple is a pretty messy string representation to put together: you need quotes (which involves checking if the string has its own quotes), commas, and parentheses. Just joining the words with a space is faster.

Going deeper, it's best if you can make looping implicit based on built-in functions since they're written in C, instead of your own loops written in Python. For example map is faster than a for comprehension if the function is also a fast built-in and not a Python function (e.g. a lambda). If you're interested, read this.

Combining all these ideas together we have:

with open('perms.txt', 'w') as out:
    out.write('\n'.join(map(' '.join, permutations(items, 7))))

This took 8 seconds to generate 9 million permutations of length 7.

Alex Hall
  • 34,833
  • 5
  • 57
  • 89
  • `You get a significant improvement if you lump all the permutations into a single string and print that string once.` - are you sure? You wouldn't in awk due to the memory management overhead in dynamic string concatenation, which I assume python has too... – Ed Morton Apr 17 '16 at 17:50
  • Yes, `print` has decent overhead. IO and such are generally a problem. The answer above was arrived at by successive improvements, each of which were timed. – Alex Hall Apr 17 '16 at 17:53
  • Also I think that `join` figures out how the long the string will be before allocating it so that you don't get a repeatedly doubling array. – Alex Hall Apr 17 '16 at 17:59