
I'm writing a program that takes a string and computes all possible permutations with repetition of its characters. I'll show some fragments of my code; I would be grateful if someone could point out how to improve the speed when sending the data to a file.

Scenario 1

Sending the output to stdout took about 12 seconds to write 531,441 lines (about 3 MB):

import itertools
# product() yields tuples of characters, so join each one into a string
for word in itertools.product('abcdefghi', repeat=6):
    print(''.join(word))

Scenario 2

Then I tried sending the output to a file instead of stdout, and this took roughly 5 minutes:

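import itertools
word_counter = 0
for word in itertools.product('abcdefghi', repeat=6):
    word_counter = word_counter + 1
    if word_counter == 1:
        # first word: create (or truncate) the file
        open('myfile', 'w').write(''.join(word))
    else:
        # every later word: reopen the file in append mode
        open('myfile', 'a').write(''.join(word))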

word_counter keeps track of the number of permutations generated so far. When word_counter is 1, the program creates the file; for every later value it appends the data to the file.

I used a program on the web to do the same thing. It took about the same time as mine when printing the data to a terminal, but it took only about 3 seconds to output these combinations to a file, while my program took 5 minutes!

I also tried running my program and redirecting output to a file in a bash terminal, and this took the same time (3 sec)!

'myprog' > 'output file'
repzero
    possible duplicate of [Why is printing to stdout so slow? Can it be sped up?](http://stackoverflow.com/questions/3857052/why-is-printing-to-stdout-so-slow-can-it-be-sped-up) – Cory Kramer Jul 24 '14 at 22:42
    @Cyber the question you attached is file < terminal, and not file > terminal – almanegra Jul 24 '14 at 22:45

2 Answers


You are reopening the file for every write. Try opening it once and reusing it instead:

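import itertools

output = open('myfile', 'w')  # open once, before the loop
for word in itertools.product('abcdefghi', repeat=6):
    # product() yields tuples, so join the characters before writing
    output.write(''.join(word) + '\n')
output.close()  # tidy up once, after the loop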

[Edit with explanation] When you're working with 530,000 words, making something even a tiny bit slower for each word adds up to a LOT slower for the whole program.

My way, you do one piece of setup work (open the file) and put it in memory, then go through 500,000 words and save them, then do one piece of tidy up work (close the file). That's why the file is saved in a variable - so you can set it up once, and use it again and again.

Your way, you do almost no setup work first, then you add one to the counter 500,000 times, check the value of the counter 500,000 times, branch this way or that 500,000 times, open the file and force Windows (or Linux) to check your permissions every time, put it in memory 500,000 times, write to it 500,000 times, stop using the file you opened (because you didn't save it) so it falls into the 'garbage' and gets tidied up - 500,000 times, and then finish.

The amount of work is small each time, but when you do them all so many times, it adds up.
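To see the difference for yourself, here is a minimal timing sketch. It assumes a smaller alphabet than the question's so that it finishes quickly, and the helper names (`write_reopening`, `write_once`, `timed`) and the temporary file paths are made up for illustration. The actual timings depend on your OS and disk, so none are quoted here.

```python
import itertools
import os
import tempfile
import time

ALPHABET = 'abcdef'  # assumption: smaller than the question's 'abcdefghi'
REPEAT = 4           # 6**4 = 1,296 words, enough to show the gap


def write_reopening(path):
    """Reopen the file in append mode for every word, as in Scenario 2."""
    open(path, 'w').close()  # start from an empty file
    for letters in itertools.product(ALPHABET, repeat=REPEAT):
        with open(path, 'a') as f:
            f.write(''.join(letters) + '\n')


def write_once(path):
    """Open the file once and reuse the same handle for every word."""
    with open(path, 'w') as f:
        for letters in itertools.product(ALPHABET, repeat=REPEAT):
            f.write(''.join(letters) + '\n')


def timed(func, path):
    """Return the wall-clock seconds func(path) takes."""
    start = time.perf_counter()
    func(path)
    return time.perf_counter() - start


tmpdir = tempfile.mkdtemp()
slow_path = os.path.join(tmpdir, 'reopen.txt')
fast_path = os.path.join(tmpdir, 'once.txt')

slow_seconds = timed(write_reopening, slow_path)
fast_seconds = timed(write_once, fast_path)

print(f'reopen for every word: {slow_seconds:.4f} s')
print(f'open once:             {fast_seconds:.4f} s')
```

Both functions produce the same file contents; only the number of open/close calls differs.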

TessellatingHeckler
  • wow..this seems so simple and all the time was wondering what is wrong with my program which is a little bulky..can you explain to me why we need to assign open('myfile','w') to a variable and what it is actually doing? ..your answer is greatly appreciated! – repzero Jul 24 '14 at 23:37

The same as the previous answer, but with a context manager!

import itertools
with open('myfile', 'w') as output:
    for word in itertools.product('abcdefghi', repeat=6):
        # product() yields tuples, so join the characters before writing
        output.write(''.join(word) + '\n')

Context managers have the benefit of cleaning up after themselves: the file is closed when the block ends, even if an error is raised inside it.
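A small sketch of that cleanup behaviour, assuming a temporary file and a deliberately raised `RuntimeError` standing in for a real failure in the middle of the loop:

```python
import os
import tempfile

# assumption: a throwaway path so the example does not touch real files
path = os.path.join(tempfile.mkdtemp(), 'myfile')

try:
    with open(path, 'w') as output:
        output.write('aaaaaa\n')
        # simulated failure part-way through the writes
        raise RuntimeError('simulated failure mid-loop')
except RuntimeError:
    pass  # the with block has already closed the file by this point

closed_after_error = output.closed

# closing flushed the buffer, so the line written before the error is on disk
with open(path) as f:
    saved = f.read()

print(closed_after_error, repr(saved))
```

Note that `with` does not swallow the exception; it only guarantees the file is closed before the error propagates.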

PsyKzz