17

From a previous question I learned something interesting. If Python's itertools.product is fed a series of iterators, these iterators will be converted into tuples before the Cartesian product begins. Related questions look at the source code of itertools.product to conclude that, while no intermediate results are stored in memory, tuple versions of the original iterators are created before the product iteration begins.

Question: Is there a way to create an iterator to a Cartesian product when the (tuple converted) inputs are too large to hold in memory? Trivial example:

import itertools
A = itertools.permutations(xrange(100))
itertools.product(A)

A more practical use case would take in a series of (*iterables[, repeat]) like the original implementation of the function - the above is just an example. It doesn't look like you can use the current implementation of itertools.product, so I welcome in submission in pure python (though you can't beat the C backend of itertools!).

Community
  • 1
  • 1
Hooked
  • 84,485
  • 43
  • 192
  • 261
  • How do you propose that the products then are produced? You'd have to use `.tee()` which also caches iterators to do their job. – Martijn Pieters Aug 23 '12 at 14:00
  • 3
    Alternatively, you'd need re-startable iterators, e.g. you can only achieve your goal if you could somehow create each iterator a-new, to produce the exact same results as during the previous full run. – Martijn Pieters Aug 23 '12 at 14:04
  • @MartijnPieters I'm not sure (hence the question!). The heart of the question is, give an outer product implementation without any intermediate storage. Possible in `itertools`, I'm not sure - in pure Python, I think it may be possible as we can "restart" an iterator by crating it fresh each time it is needed. – Hooked Aug 23 '12 at 14:06
  • Exactly, but that would only work if you can guarantee the iterator will produce the same results each time you recreate it. – Martijn Pieters Aug 23 '12 at 14:08

4 Answers4

8

Here's an implementation which calls callables and iterates iterables, which are assumed restartable:

def product(*iterables, **kwargs):
    if len(iterables) == 0:
        yield ()
    else:
        iterables = iterables * kwargs.get('repeat', 1)
        it = iterables[0]
        for item in it() if callable(it) else iter(it):
            for items in product(*iterables[1:]):
                yield (item, ) + items

Testing:

import itertools
g = product(lambda: itertools.permutations(xrange(100)),
            lambda: itertools.permutations(xrange(100)))
print next(g)
print sum(1 for _ in g)
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • I don't fully understand why the input to `product` has to be a lambda function, it still seems to work even if you use my `A` from the example. – Hooked Aug 23 '12 at 15:04
  • @Hooked that'll work to start with, but once it reaches the end (in the inner loops) it won't know to restart the permutations. Try for a small range in `A` to see. – ecatmur Aug 23 '12 at 15:06
  • @Hooked: If we want recreatable iterators, you must pass the iterator creator to the product function. The iterators must only be created in the function itself. – Jo So Aug 23 '12 at 15:07
  • I see now, didn't realize that it would be dangerous to do something like `for x in product(lambda: A,repeat=2):`, which gives the completely wrong answer as `A` gets used up. Great answer, thank you! – Hooked Aug 23 '12 at 15:10
2

Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.

So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like

def iterProduct(ic):
    if not ic:
        yield []
        return

    for i in ic[0]():
        for js in iterProduct(ic[1:]):
            yield [i] + js


# Test
x3 = lambda: xrange(3)
for i in iterProduct([x3,x3,x3]):
    print i
Jo So
  • 25,005
  • 6
  • 42
  • 59
  • The call to `len(ic)` fails with my input `A` as `object of type 'itertools.permutations' has no len()`. If I'm not mistaken, you also can't access an iterator by indexing `ic[0]`. as `__getitem__` is not implemented in general. – Hooked Aug 23 '12 at 14:48
  • @Hooked: The call to len() is now away. `ic` should not be an iterator, it is just a fixed number of factors provided as a list (there might be a solution with varargs, or one more functional (`x:xs = ...`) one, but my python is quite old. – Jo So Aug 23 '12 at 14:58
  • @DSM: `if ic:` is implicit in the second block. Try it out. – Jo So Aug 23 '12 at 15:00
  • @JoSo: ah, you're right. I was working with my own modification of your original code, so the bug was only on my side. :^) – DSM Aug 23 '12 at 15:02
1

This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product should have more error-checking, but this is at least a first approach. The simple reiterable class...

class PermutationsReiterable(object):
    def __init__(self, value):
        self.value = value
    def __iter__(self):
        return itertools.permutations(xrange(self.value))

And product iteslf...

def product(*reiterables, **kwargs):
    if not reiterables:
        yield ()
        return
    reiterables *= kwargs.get('repeat', 1)

    iterables = [iter(ri) for ri in reiterables]
    try:
        states = [next(it) for it in iterables]
    except StopIteration:
        # outer product of zero-length iterable is empty
        return
    yield tuple(states)

    current_index = max_index = len(iterables) - 1
    while True:
        try:
            next_item = next(iterables[current_index])
        except StopIteration:
            if current_index > 0:
                new_iter = iter(reiterables[current_index])
                next_item = next(new_iter)
                states[current_index] = next_item
                iterables[current_index] = new_iter
                current_index -= 1
            else:
                # last iterable has run out; terminate generator
                return
        else:
            states[current_index] = next_item
            current_index = max_index
            yield tuple(states)

Tested:

>>> pi2 = PermutationsReiterable(2)
>>> list(pi2); list(pi2)
[(0, 1), (1, 0)]
[(0, 1), (1, 0)]
>>> list(product(pi2, repeat=2))
[((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]
>>> giant_product = product(PermutationsReiterable(100), repeat=5)
>>> len(list(itertools.islice(giant_product, 0, 5)))
5
>>> big_product = product(PermutationsReiterable(10), repeat=2)
>>> list(itertools.islice(big_product, 0, 5))
[((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)), 
 ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)), 
 ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)), 
 ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)), 
 ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]
senderle
  • 145,869
  • 36
  • 209
  • 233
  • Thanks for the great answer. When you say "error-checking", what exactly would you look out for? Would it check to see if the iterator is restartable? How could you check for that? – Hooked Aug 23 '12 at 15:21
  • That seems overly complicated and confusing, when a simple solution was already provided long before. Does this one do anything in a better way? – Jo So Aug 23 '12 at 15:25
  • I just mean that `product` really ought to throw an error when an invalid keyword argument is passed; this doesn't. – senderle Aug 23 '12 at 15:27
  • @JoSo, it uses an iterative algorithm instead of a recursive one. I think having iterative algorithms is useful, especially given Python's default recursion limit. Admittedly, it's not likely to matter here, since you'd have to be calculating a pretty large cartesian product for the limit to apply. But then, this question is precisely about calculating large cartesian products, so there's a case to be made that an iterative approach is preferable. – senderle Aug 23 '12 at 15:30
  • @senderle: That would be (and this already is!) what I consider overengineering. – Jo So Aug 23 '12 at 15:31
  • @JoSo, then you must consider `itertools.product` overengineered. It throws an error if you pass an invalid keyword argument. – senderle Aug 23 '12 at 15:31
  • @senderle: Regarding recursion: You confused the length of the result with the number of factors. The clean recursive solution does only add one recursion per factor, which is no problem at all. – Jo So Aug 23 '12 at 15:34
  • @senderle: Regarding overengineering: That might be true, but the point of itertools is not being posted here and being readable, and the point of your solution is not being a python library in the first place. – Jo So Aug 23 '12 at 15:37
  • @JoSo, I've confused nothing. Iterative code correctly generates the cartesian product of an arbitrary number of sequences; recursive code does not (in Python). You'll also note that the c code for [`itertools.product`](http://hg.python.org/cpython/file/787ed9b03ef9/Modules/itertoolsmodule.c#l1885) is iterative rather than recursive. – senderle Aug 23 '12 at 15:45
  • 1
    Bug: `product()` should yield exactly one empty tuple. – ecatmur Aug 23 '12 at 15:50
  • 1
    Also, you're not using `repeat` in the function. – ecatmur Aug 23 '12 at 15:51
  • 1
    You fixed the wrong place! The `yield ()` should go at the beginning. – ecatmur Aug 23 '12 at 15:59
0

I'm sorry to up this topic but after spending hours debugging a program trying to iterate over recursively generated cartesian product of generators. I can tell you that none of the solutions above work if not working with constant numbers as in all the examples above.

Correction :

from itertools import tee

def product(*iterables, **kwargs):
    if len(iterables) == 0:
        yield ()
    else:
        iterables = iterables * kwargs.get('repeat', 1)
        it = iterables[0]
        for item in it() if callable(it) else iter(it):
            iterables_tee = list(map(tee, iterables[1:]))
            iterables[1:] = [it1 for it1, it2 in iterables_tee]
            iterable_copy = [it2 for it1, it2 in iterables_tee]
            for items in product(*iterable_copy):
                yield (item, ) + items

If your generators contain generators, you need to pass a copy to the recursive call.