
I have a simple Python function that uses itertools.product, as seen below.

def cart(n, seq):
    import itertools
    b = 8
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            # 'file' is a file object opened elsewhere in the script
            file.write(''.join(p))
            file.write('\n')

The function works but it is extremely slow, and it is not even using a noticeable amount of resources. I was wondering if the bottleneck was the disk write speed? Currently the script is averaging 2.5 MB per second. I also tried this on a solid-state drive and received the same speeds, which leads me to believe the write speed is not the bottleneck. Is there a way to speed this function up and use more system resources, or is itertools just slow? Forgive me, I am new to Python.

Dietrich Epp
jkdba
  • How long does it take to execute if you don't write to a file but instead generate and discard the strings? – Mark Byers Dec 24 '12 at 00:49
  • Note: `product()` generates `len(seq)**n` items, i.e., the number grows exponentially (very fast) with n. Also, you could write the while loop as `for m in range(n, b, -1)` and use `repeat=m-1` (see the sketch after these comments). Also, move `import itertools` outside the function. – jfs Dec 24 '12 at 02:51
  • Thank you for your suggestions, I used the for loop as well to improve my code. – jkdba Dec 25 '12 at 03:23
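
For reference, a minimal sketch of the loop rewrite jfs suggests (assuming, as in the question, that `file` is a file object opened elsewhere in the script):

import itertools  # moved outside the function, per the comment above

def cart(n, seq):
    b = 8
    # Same products as the original while loop: repeat runs from n-1 down to b
    for m in range(n, b, -1):
        for p in itertools.product(seq, repeat=m - 1):
            file.write(''.join(p))
            file.write('\n')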

1 Answer


You can profile your code to get an idea of the location of the bottleneck. The following will create a file called "cart_stats.txt" with the profiling information in it. Running it myself seems to indicate that most of the time is being spent calling file.write().

from cProfile import Profile
from pstats import Stats

prof = Profile()
prof.disable()  # keep profiling off until just before the call to cart() below

file = open('cart_output.txt', 'wt')

def cart(n, seq):
    import itertools
    b = 8
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            file.write(''.join(p))
            file.write('\n')

prof.enable()   # profile only the cart() call
cart(10, 'abc')
prof.disable()

prof.dump_stats('cart.stats')  # raw stats, formatted into a text report below
with open('cart_stats.txt', 'wt') as output:
    stats = Stats('cart.stats', stream=output)
    stats.sort_stats('cumulative', 'time')
    stats.print_stats()

file.close()
print 'done'

FWIW, the slowness seems to be overwhelmingly due to the calls to file.write() itself, because it's still there even if I open() the output stream with a huge buffer or make it a StringIO instance (both variants are sketched further below). I was able to significantly reduce that by optimizing and minimizing the calls to it, as shown below:

def cart(n, seq):
    import itertools
    b = 8
    write = file.write  # speed up lookup of method
    while b < n:
        n = n - 1
        for p in itertools.product(seq, repeat=n):
            write(''.join(p)+'\n')  # only call it once in loop

This shows that having a profiler in place can be the best way to know where to spend your time and get the most benefit.
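
For comparison, here is a rough sketch of the two variants mentioned above, a file opened with a large buffer and an in-memory StringIO.StringIO buffer (Python 2; the 1 MB buffer size and the cart(10, 'abc') calls are just illustrative):

# Variant 1: a file opened with a large buffer (size here is arbitrary).
file = open('cart_output.txt', 'wt', 1024 * 1024)
cart(10, 'abc')
file.close()

# Variant 2: an in-memory buffer, written out to disk in one go afterwards.
import StringIO
file = StringIO.StringIO()
cart(10, 'abc')
with open('cart_output.txt', 'wt') as out:
    out.write(file.getvalue())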

Update:

Here's a version that stores all of the output generated in memory before making a single file.write() call. It is significantly faster than using StringIO.StringIO because it's less general, but it is still not quite as fast as using a cStringIO.StringIO instance (a sketch of that variant follows the code below).

file = open('cart_output.txt', 'wt')

def cart(n, seq):
    from itertools import product
    buflist = []
    append = buflist.append  # speed up lookup of method
    b = 8
    while b < n:
        n = n - 1
        for p in product(seq, repeat=n):
            append(''.join(p))
    file.write('\n'.join(buflist)+'\n')  # single write of everything generated

file.close()
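
For comparison, a minimal sketch of the cStringIO.StringIO variant mentioned above (Python 2; same structure as the version just shown, with the list replaced by a cStringIO buffer):

from cStringIO import StringIO
from itertools import product

file = open('cart_output.txt', 'wt')

def cart(n, seq):
    buf = StringIO()   # C-implemented in-memory buffer
    write = buf.write  # speed up lookup of method
    b = 8
    while b < n:
        n = n - 1
        for p in product(seq, repeat=n):
            write(''.join(p) + '\n')
    file.write(buf.getvalue())  # single disk write at the end

cart(10, 'abc')
file.close()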
martineau
  • At one point someone posted to use cStringIO, and that really solved my problem, but I always used the profiles to analyze both outputs. – jkdba Dec 25 '12 at 03:23
  • Yes, based on the limited profiling results I got, it seemed like the answer suggesting the use of `cStringIO` would likely help, so I didn't repeat it in my answer -- strangely, the answer suggesting it now seems to have disappeared. I posted mine anyway since I wasn't sure whether you were familiar with `cProfile` or not, because using it would at least tell you where the bottleneck was occurring. Given that, it's strange that using an SSD didn't fix the problem... guess it must be OS overhead or something. – martineau Dec 25 '12 at 05:19
  • @JohnK: In simple cases it might be enough to call `python -mcProfile your_script.py` without modifying the source code. Is cStringIO faster than a file opened with a large buffer in your case? You could also try tempfile.SpooledTemporaryFile (it doesn't use a disk file until a threshold is reached) as an intermediate solution between keeping all data in memory and writing it to a disk file at once. – jfs Dec 26 '12 at 11:19
  • @J.F.Sebastian: Would `tempfile.SpooledTemporaryFile()` work better than cStringIO if I have already set a limit for cStringIO? I need to look further into your suggestion, but it seems that SpooledTemporaryFile()'s built-in threshold would make my current application much faster. – jkdba Dec 27 '12 at 19:01
  • @JohnK: The default buffering for `SpooledTemporaryFile()` is `bufsize=-1` which means "to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used". Also, you'll have to do something extra to save the results put into a temporary file which will normally automatically be deleted. The docs also say "The returned object is a file-like object whose `_file` attribute is either a `StringIO` object or a true `file` object". – martineau Dec 27 '12 at 19:41
  • @JohnK: Question: How did you "set a limit for `cStringIO`"? – martineau Dec 27 '12 at 21:11
  • @martineau: For example, `output = StringIO()`; then after I write to it I grab the byte count with `size = output.tell()`, then use a simple `if max_size <= size:` to truncate it to a file (a rough sketch of this appears after these comments). – jkdba Dec 28 '12 at 15:51
  • @JohnK: OK. That limited buffering approach used with a `cStringIO.StringIO` seems to be the fastest I/O method -- even slightly more so when compared to the pure Python unlimited memory buffered approach shown in the last Update I made to my answer. – martineau Dec 28 '12 at 17:44
  • @martineau: Thanks for the update, I am looking into it now with profiling... I am also looking into multiprocessing; I was wondering if it might allow me to spawn multiple processes, each with the same buffer and with communication, so if I wanted to set a one GB limit then each process would write one GB to memory and then truncate, as another method of speeding this process up? – jkdba Jan 09 '13 at 22:43
  • @JohnK: Happy New Year! It's likely going to be difficult to share data between the processes, especially if it's arrays of variably sized objects. The overhead involved may cancel out any speed-of-generation gains. I think you might be better off just multi-threading, which would allow I/O to occur at the same time output is being generated without as much overhead. – martineau Jan 10 '13 at 02:21
  • @martineau: Happy New Year. I hate to ask this, but I am trying to understand how I would implement multi-threading here. As a bit of a Python novice, the examples I am finding seem a bit too specific to relate to my code. Can you point me in the right direction, or do you know of a good example that may relate? I am looking at using queue and obviously threading. Any thoughts? – jkdba Jan 10 '13 at 04:14
  • @JohnK: Perhaps multiprocessing would be a good approach. [Here's](http://stackoverflow.com/a/2364667/355230) a good example you could probably adapt to what you're doing. I'll see if I can find something similar using threads. – martineau Jan 10 '13 at 21:50
  • @JohnK: OK, I think I found a good [threading example](http://stackoverflow.com/questions/2846653/python-multithreading-for-dummies) for you (and it's a lot simpler than the multiprocessing one). – martineau Jan 11 '13 at 18:58
  • @JohnK: Threading appears to make it slower based on a [test](http://pastebin.com/x53tH18n) script I set up. There's just too much overhead involved using the `threading` and `Queue` modules. – martineau Jan 13 '13 at 03:20
  • @martineau: Thank you so much for the info. I am looking into it right now, and you are right, it appears that the threading and queue approach slows it down. I am going to try multiprocessing next, although I think you are correct that the communication between the processes will be difficult. – jkdba Jan 13 '13 at 17:24
  • @JohnK: I don't think `itertools.product()` is all that slow, at least as compared to file I/O, so you might be better off just generating the products whenever you need them rather than writing (and presumably reading) them all. – martineau Jan 13 '13 at 17:57
  • @martineau: I agree now; however, I will be removing the writing to file since, as you stated, it is inefficient. I am now loading into memory and then using everything I generate on the fly from memory, then dumping it. This is why I think multiprocessing will be useful, because each process writes to different memory. At this point I just need to do much more research on multiprocessing to see how I can leverage my system's full potential. – jkdba Jan 13 '13 at 19:17
  • @JohnK: You could use a `multiprocessing.Manager` instance to simplify sharing the data `itertools.product()` produces among the processes if you store it in one of the types it supports. See the **Server process** subsection of the [Sharing state between processes](http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes) section in the docs. – martineau Jan 13 '13 at 21:21
  • @martineau: Again, thank you. With my current script I have been having an issue with the sharing, which causes the function to just repeat itself on multiple processes. – jkdba Jan 13 '13 at 22:54
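
For reference, a minimal sketch of the size-limited cStringIO buffering approach jkdba describes above (Python 2; `max_size`, the 1 GB value, and the `flush_buffer()` helper are illustrative, not taken from the discussion):

from cStringIO import StringIO
from itertools import product

max_size = 1024 ** 3  # illustrative 1 GB limit on the in-memory buffer
file = open('cart_output.txt', 'wt')
output = StringIO()

def flush_buffer():
    # Dump the in-memory buffer to disk and start a fresh one.
    global output
    file.write(output.getvalue())
    output = StringIO()

def cart(n, seq):
    b = 8
    while b < n:
        n = n - 1
        for p in product(seq, repeat=n):
            output.write(''.join(p) + '\n')
            if max_size <= output.tell():  # byte count written so far
                flush_buffer()

cart(10, 'abc')
flush_buffer()  # write out whatever is left in the buffer
file.close()

As jfs notes, tempfile.SpooledTemporaryFile(max_size=...) provides similar spill-to-disk behaviour out of the box, so it could be used in place of the manual flushing here.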