Improving the performance of cartesian product of multiple lists

Question

I am implementing the Cartesian product of multiple sets in python using recursion.

Here is my implementation:

def car_two_sets(a, b):
    result = []
    for x in a:
        for y in b:
            result.append(str(x) + str(y))
    return result


def car_multiple_sets(lists):
    if len(lists) == 2:
        return car_two_sets(lists[0], lists[1])
    else:
        return car_multiple_sets([car_two_sets(lists[0], lists[1])] + lists[2:])


a = [1, 2]
b = [3, 4]
c = [6, 7, 8]
lists = [a, b, c]
print(car_multiple_sets(lists))

The code works correctly, but for larger number of sets, it is slow. Any ideas on how to improve this implementation? I thought of memoization, but could not find any repetitive calculations to cache.

I do not want to use itertools functions.

Cartesian products are necessarily slow due to exponential growth. — Luatic, May 11 '22 at 09:38
When you used a profiler, which part of this code did you determine to be the bottleneck? Possibly converting to strings and doing string concatenation isn't the best idea. — mkrieger1, May 11 '22 at 09:38
Do you need to do this on your own (e.g. for educational purpose) or are you allowed to use a built-in module? If so, you might want to have a look at [`itertools.product`](https://docs.python.org/3/library/itertools.html#itertools.product). A quick overview about a possible implementation is also given in the linked documentation. Nonetheless, this does not directly focus on performance improvements. — albert, May 11 '22 at 09:40
@SharimIqbal that will give you a marginal performance improvement at best — juanpa.arrivillaga, May 11 '22 at 10:04
Note: implementing this on your own is great! However, I still recommend using the itertools functions *to test that your functions are correct*. It's easy to write tests that use both `itertools.product` and `car_multiple_sets` and compare the results of both to check that they are the same (although the order might differ; so you need to `sort` before comparing). — Stef, May 11 '22 at 10:19
Interestingly, the way your function `car_multiple_sets` works, by calling `car_two_sets` repeatedly, is exactly what [`functools.reduce`](https://docs.python.org/3/library/functools.html#functools.reduce) does. — Stef, May 11 '22 at 10:21
Do you really want to convert to strings? You can use tuples instead of strings, to pair integers together. Replace `str(x)+str(y)` with `(x, y)`. — Stef, May 11 '22 at 10:29

Kelly Bundy · Answer 1 · 2022-05-11T10:48:13.343

Benchmark with three times more lists:

 221 us   223 us   223 us  h
 225 us   227 us   227 us  k3
 228 us   229 us   229 us  k2
 267 us   267 us   267 us  k
 340 us   341 us   342 us  g
1177 us  1185 us  1194 us  car_multiple_sets
3057 us  3082 us  3084 us  f

Code (Try it online!):

from timeit import repeat
from random import shuffle
from bisect import insort
from itertools import product, starmap
from operator import concat

def car_two_sets(a, b):
    result = []
    for x in a:
        for y in b:
            result.append(str(x) + str(y))
    return result


def car_multiple_sets(lists):
    if len(lists) == 2:
        return car_two_sets(lists[0], lists[1])
    else:
        return car_multiple_sets([car_two_sets(lists[0], lists[1])] + lists[2:])

def f(lists):
    return [''.join(map(str,a)) for a in product(*lists)]

def g(lists):
    return [''.join(a) for a in product(*[map(str,a)for a in lists])]

def h(lists):
    return list(map(''.join, product(*[map(str,a)for a in lists])))

def k(lists):
    result = ['']
    for lst in lists:
        lst = [*map(str, lst)]
        result = [S + s for S in result for s in lst]
    return result

def k2(lists):
    result = ['']
    for lst in lists:
        result = list(starmap(concat, product(result, map(str, lst))))
    return result

def k3(lists):
    result = ['']
    for lst in lists:
        result = starmap(concat, product(result, map(str, lst)))
    return list(result)

funcs = [car_multiple_sets, f, g, h, k, k2, k3]

a = [1, 2]
b = [3, 4]
c = [6, 7, 8]
lists = [a, b, c]

for func in funcs:
  print(func(lists), func.__name__)

times = {func: [] for func in funcs}
lists *= 3
for _ in range(50):
  shuffle(funcs)
  for func in funcs:
    t = min(repeat(lambda: func(lists), number=1))
    insort(times[func], t)
for func in sorted(funcs, key=times.get):
    print(*('%4d us ' % (t * 1e6) for t in times[func][:3]), func.__name__)

(f and g are from a currently deleted answer, the k functions are from me)

Stef · Accepted Answer · 2022-05-11T12:06:15.567

1

A few comments:

If you think about it, what car_multiple_sets is doing is iterating on its parameter lists. You're doing that using recursion, but iterating on a list can also be done with a for-loop. And it so happens that recursion is somewhat slow and memory-inefficient in python, so for-loops are preferable.
You don't need to convert to str to group the ints together. You can use tuples. That's precisely what they're for. Replace str(x)+str(y) with (x,y) to get a pair of two integers instead of a string.

def car_two_sets(a, b, unpack=False):
    if unpack:
        return [(*x, y) for x in a for y in b]
    else:
        return [(x, y) for x in a for y in b]

def car_multiple_sets(lists):
    if len(lists) == 0:
        return [()]   # empty Cartesian product has one element, the empty tuple
    elif len(lists) == 1:
        return list(zip(lists[0]))   # list of "1-uples" for homogeneity
    else:
        result = car_two_sets(lists[0], lists[1])
        for l in lists[2:]:
            result = car_two_sets(result, l, unpack=True)
        return result

print( car_multiple_sets((range(3), 'abc', range(2))) )
# [(0, 'a', 0), (0, 'a', 1), (0, 'b', 0), (0, 'b', 1), (0, 'c', 0), (0, 'c', 1),
#  (1, 'a', 0), (1, 'a', 1), (1, 'b', 0), (1, 'b', 1), (1, 'c', 0), (1, 'c', 1),
#  (2, 'a', 0), (2, 'a', 1), (2, 'b', 0), (2, 'b', 1), (2, 'c', 0), (2, 'c', 1)]

edited May 11 '22 at 12:06

answered May 11 '22 at 10:38

Stef

13,242
2
17
28

1

@KellyBundy Thank you!! I've been looking for a function like that for a while and it was right under my nose all thise time! Of course `map(tuple, l)` doesn't work because it unpacks iterables. But `zip` with one argument! – Stef May 11 '22 at 12:05
Yeah it's obvious once you see it :-). Btw given that they spoke of "larger number of sets" (and not of "longer sets"), I suspect that your optimization of handling the first two lists differently isn't worth it. Very little gain in the exponentially big picture. – Kelly Bundy May 11 '22 at 12:14
Have you needed the tuple-wrapping often? I only remember using it once ([here](https://stackoverflow.com/a/71520401/12671057)). – Kelly Bundy May 11 '22 at 12:24
Never for anything serious, but more than once when messing with functools.reduce in weird ways. I remember once writing code similar to the one in my answer here, except instead of writing `(*x, y)` to add an element to a tuple, I wrote `(x, y)`, so then I had to apply a tuple-flattening function to the result. Then I started thinking, should the tuples better be nested from the left, or from the right? – Stef May 11 '22 at 12:29
@KellyBundy There you go: [Why is my code for sorting a list via reduce in python throwing an error](https://stackoverflow.com/questions/64225670/why-is-my-code-for-sorting-a-list-via-reduce-in-python-throwing-an-error/64227389#64227389). So with your `zip` suggestion, this becomes `def reduce_sort(seq): return list(functools.reduce(heapq.merge, zip(seq)))`. In practice this is just insertion sort, because `reduce` always go left-to-right. It would be closer to mergesort if there was a way to change the "associativity" of reduce. – Stef May 11 '22 at 12:43
@KellyBundy For instance this would be an iterative mergesort: [Try it online: changing the order in which to reduce](https://tio.run/##TY9LCsMwDET3PoV2icEtCd0VmouULgxxGoF/lR1KevnUnzRUCyM/DSONX@Ps7GXbJnIGMCqKzukAaLyjCCFKMtKzMp2V9K/fxCh6KsZGNdU2JNhqfmWQSsMNNIbYftAnyAt8z6gVaGUTgQH6Ks2FSZ5Xp8HBDov9hLZsEZAdUSDfTUnFhWxV6nv34KwmsYvx65mkHXOsenL@oY0sW@992wnou/JwFhL/z8I8ZYkW0MDpNEAjIPDtCw) – Stef May 11 '22 at 13:01
1

Ah, right, I think I actually once used it for such a merge sort as well. But [like this](https://tio.run/##TY7NCoMwDMfvfYrcbFkdym4D9yJjB8E6A23N0npwL@@i9bAQQvLL15/WPM3xtm0jzwEm19MHMNDMGYLjt1NqcGNJk0DtzV2BmIcOPKasv0gCzQFRIGbHAs6hSwehJ33sW0Dx0mGXF47lgn/W7cuooiAugdYr93GQ4hSyVxiz2n@euW4stM0RjErC/xUq4n3EW6igrh9QWUhm@wE). – Kelly Bundy May 11 '22 at 13:47
1

Or [heapsort](https://tio.run/##TY1LDsIwDET3PoV3TVCKitghwV0qNaWR8jFOumgvH9wAEl5YnufRDG1lSfFa68wp4GJHeqELlLhgsPy0AJOdG8/ClNc3QBm2ZeWI3uWimk@ddkfy1vqTFNdA25nHOIn4Bh7KxQIe779bDQYvQ1sasvC/JiA@HN5gh33/wM5g1vUN). – Kelly Bundy May 11 '22 at 14:37
I just realized your mergesort is [broken](https://tio.run/##TU/LasNADLzvV@gWL5ji0Fug/ZHSg4mVWKB9VCuTOD/v7MMJ1WGZnRlGmrjqHPzntl0kOCBF0RA4AbkYRCHpKG6MpqozjvHvpTiUKxoz4aXBlMmO7clAHoYvYEraPShm0lbyNhMjMPrMwDccm7UMZXtZnYU3947YT@jqlh5KIvVk91BBXcQ3J/8Mv9a0Jn5xcf2Q0U@lVju5/MirKdE77oYejkN9rIlSGH6BA94jnvV06KGUw6k22TXBtHDV/re32xM). It only works when the length is a power of 2, otherwise it drops unpaired chunks. – Kelly Bundy May 12 '22 at 14:53
@KellyBundy Thanks! Fixed with `zip_longest`: [Try it online!](https://tio.run/##VVDLagMxDLz7K3SLDaZs6C2Q/kgIwWS1WYP8qK1tk/781o9NaHUw49FYo3F88Bz8@7pOKTiwjIlDoAzWxZAYMpvkTNTwY@OFgr9hZtGkM5r4@ZQ5TDcUYsSpw1xISeogoBTBEchmlmVGIVUjv2dLCIS@MPAB@y6tZYu87lEaL@41YttHNpd/W0mrrYbJEn0ZWvB4OqvNKiEvyff3dBrOSvSwfnHx8ZaMH2vyHqTerGdRDTcsBw37oR1KxFQZeoId3iNe@bDTUCPj2PJtvYR5odb7@ydq/QU "Python 3 – Try It Online") – Stef May 12 '22 at 15:08
1

Yep, that works. Or just [append one](https://tio.run/##TZDBCsIwDIbvfYrcbGHIhjdBX0Q8DJu5QtrFrEPny8@2m2Io5c@XnzQpz7EfwmFZOhk89NjyA5znQSJ4lDsqZbFb5ZigJnNUkILgBOTGqN@OEzQFPntHCIQhEThDs1pzuGR3ESUVfoz2LTMGq7X5g9@@vmVdnq3ApbNZBOMkYXXQpb6ade4weZ730gabkm38nLkQVe64aV1X0NTlMoolE/qKHb4Yb/G4qyAvirZstdUEx4lK7f8nzPIB). Btw note that `starmap`+`zip` seems rather clumsy, since you can just use `map` :-P – Kelly Bundy May 12 '22 at 15:25

Improving the performance of cartesian product of multiple lists

2 Answers2