14

I would expect the following snippet to give me an iterator yielding pairs from the Cartesian product of the two input iterables:

$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> one = xrange(0, 10**9)
>>> two = (1,)
>>> prods = itertools.product(one, two)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Instead, I get a MemoryError. But I thought that itertools.product did not store the intermediate results in memory, so what's causing the MemoryError?

detly
  • 29,332
  • 18
  • 93
  • 152

3 Answers3

18

It doesn't store intermediate results, but it has to store the input values because each of those might be needed several times for several output values.

Since you can only iterate once over an iterator, product cannot be implemented equivalent to this:

def prod(a, b):
  for x in a:
    for y in b:
       yield (x, y)

If here b is an iterator, it will be exhausted after the first iteration of the outer loop and no more elements will be produced in subsequent executions of for y in b.

product works around this problem by storing all the elements that are produced by b, so that they can be used repeatedly:

def prod(a, b):
  b_ = tuple(b)  # create tuple with all the elements produced by b
  for x in a:
    for y in b_:
       yield (x, y)

In fact, product tries to store the elements produced by all the iterables it is given, even though that could be avoided for its first parameter. The function only needs to walk over the first iterable once, so it wouldn't have to cache those values. But it tries to do anyway, which leads to the MemoryError you see.

sth
  • 222,467
  • 53
  • 283
  • 367
  • Thanks for filling out the motivation of the implementation. I suppose the only other way to get around this would be to insist that the supplied iterables are somehow copyable as well. – detly Jan 02 '12 at 04:09
  • 4
    I got the source of the problem. But what is a workaround if one does need functionality of product() ? – DS R Oct 17 '17 at 03:54
  • 1
    @DSR Did you ever find one? – tejasvi88 Aug 21 '20 at 05:04
7

itertools.product does not store the intermediate products in memory, but it does store tuple versions of the original iterators.

This can be seen by looking at the source of the itertools module. It's in the file Modules/itertoolsmodule.c in the Python 2.7.2 source distribution. There we find, in the function product_new (basically the constructor for the product object) from line 1828 onwards:

for (i=0; i < nargs ; ++i) {
    PyObject *item = PyTuple_GET_ITEM(args, i);
    PyObject *pool = PySequence_Tuple(item);
    if (pool == NULL)
        goto error;
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}

In that code, args are the arguments to product. In the third line of this code piece, the ith argument is converted to a tuple. Hence, the code tries to convert your iterator xrange(0, 10**9) to a tuple, resulting in a MemoryError.

I am not sure why itertools.product behaves like this. Instead of storing each input iterator as a tuple it should be enough to store the last item returned from each iterator. (EDIT: See sth's answer for the reason)

Florian Brucker
  • 9,621
  • 3
  • 48
  • 81
  • That's quite interesting. I suppose for something this simple I could just construct my own generator though. – detly Jan 01 '12 at 21:20
0

I think the problem could be that xrange returns its own special sort of object, which is not a normal iterable.

xrange is implemented in such a way (as are lists) that you can iterate over the object many times, while you can iterate over a normal generator object only once. So perhaps something from this functionality is responsible for the memory error.

robince
  • 10,826
  • 3
  • 35
  • 48