202

I'm trying to run over the parameters space of a six-parameter function to study its numerical behavior before trying to do anything complex with it, so I'm searching for an efficient way to do this.

My function takes float values given in a 6-dim NumPy array as input. What I tried to do initially was this:

First, I created a function that takes two arrays and generate an array with all combinations of values from the two arrays:

from numpy import *

def comb(a, b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

Then, I used reduce() to apply that to m copies of the same array:

def combs(a, m):
    return reduce(comb, [a]*m)

Finally, I evaluate my function like this:

values = combs(np.arange(0, 1, 0.1), 6)
for val in values:
    print F(val)

This works, but it's way too slow. I know the space of parameters is huge, but this shouldn't be so slow. I have only sampled 106 (a million) points in this example and it took more than 15 seconds just to create the array values.

Is there a more efficient way of doing this with NumPy?

I can modify the way the function F takes its arguments if it's necessary.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Rafael S. Calsaverini
  • 13,582
  • 19
  • 75
  • 132
  • For the fastest cartesian product I've found, see [this answer](https://stackoverflow.com/a/11146645/577088). (Since the question is phrased quite differently from this one, I deem that the questions are not duplicates, but the best solution to the two questions is the same.) – senderle Jul 17 '17 at 14:50

10 Answers10

203

In newer versions of NumPy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

For pv's solution:

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() used to be two-dimensional only, but now it is capable of being multidimensional. In this case, three-dimensional:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final result is slightly different.

M Z
  • 4,571
  • 2
  • 13
  • 27
CT Zhu
  • 52,648
  • 17
  • 120
  • 133
  • 29
    `np.stack(np.meshgrid([1, 2, 3], [4, 5], [6, 7]), -1).reshape(-1, 3)` will give the right order – Eric Jun 04 '17 at 14:30
  • @CT Zhu Is there an easy way to transform this so that the a matrix holding the different arrays as columns is used as input instead? – Dole Jul 04 '19 at 08:58
  • 2
    It should be noted that meshgrid only works for smaller range sets, I have a large one and I get error: ValueError: maximum supported dimension for an ndarray is 32, found 69 – mikkom Oct 20 '19 at 06:54
  • 2
    @mikkom, nothing will handle sets greater than 32. Even if each was of size 2, the number of combinations would be 2**32, 4 Gb. – hpaulj Mar 10 '21 at 17:48
186

Here's a pure-NumPy implementation. It's about 5 times faster than using itertools.

Python 3:

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a Cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the Cartesian product of.
    out : ndarray
        Array to place the Cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing Cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    #m = n / arrays[0].size
    m = int(n / arrays[0].size)
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
        #for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

Python 2:


import numpy as np

def cartesian(arrays, out=None):
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
pv.
  • 33,875
  • 8
  • 55
  • 49
  • Nice Pauli, this solves my 2D interpolation problem. Defining the data point coords for griddata was giving some trouble. Does this function make into the master numpy code? – Gökhan Sever Sep 11 '11 at 00:35
  • why not creat `out` with `np.ndarray`, that saves time. – steabert Sep 20 '11 at 07:09
  • 62
    ever consider submitting this to be included in numpy? this is not the first time I've gone looking for this functionality and found your post. – endolith Apr 12 '13 at 14:31
  • 1
    There is bug in this implementation. For arrays of strings for example: arrays[0].dtype = "|S3" and arrays[1].dtype = "|S5". So there is a need in finding the longest string in input and use its type in out = np.zeros([n, len(arrays)], dtype=dtype) – norecces Jun 20 '13 at 08:18
  • 49
    FYI: seems to have made it into the scikit-learn package at `from sklearn.utils.extmath import cartesian` – Gus Sep 13 '13 at 04:27
  • 1
    I get memory error when I try to run: `a = arange(3333, dtype=int16) cartesian(split(a, 3))`. This should use about 1111**3 * 16 bits of memory, ~2.6GB. Maybe I don't actually have enough free memory.. But is there much memory overhead for this method? – Joelmob Sep 29 '14 at 16:29
  • 1
    Not only is this faster, but it uses a third of the memory (at least in the case of how I was using itertools.combinations) – David Marx Oct 30 '14 at 16:53
  • 3
    I just realized: this is slightly different from itertools.combinations, as this function respects the ordering of values whereas combinations doesn't, so this function returns more values than combinations does. Still very impressive, but unfortunately not what I was looking for :( – David Marx Oct 30 '14 at 17:08
  • 2
    For posterity, the performant alternative to just using itertools.combinations can be found here: http://stackoverflow.com/questions/16003217/n-d-version-of-itertools-combinations-in-numpy – David Marx Oct 30 '14 at 17:32
  • What if the results donnot fit in memory ? shouldn't this have the option to return a generator as well ? – RetroCode Oct 14 '16 at 21:16
  • 1
    For the fastest cartesian product I've found, see [this answer](https://stackoverflow.com/a/11146645/577088). – senderle Jul 17 '17 at 14:52
  • 9
    `TypeError: slice indices must be integers or None or have an __index__ method` thrown by `cartesian(arrays[1:], out=out[0:m,1:])` – Boern Sep 25 '17 at 15:48
  • 2
    @Boern or anyone else running into a `TypeError`, the function was written for python 2, and you are likely trying to run it with python 3. you can change two lines: `m = n / arrays[0].size` --> `m = int(n / arrays[0].size)` and `for j in xrange(1, arrays[0].size):` --> `for j in range(1, arrays[0].size):` – barlaensdoonn Jan 28 '19 at 22:53
  • @Boern or you can use the function `cartesian_product_recursive` (or any of the other solutions) from @senderle's answer to this question: https://stackoverflow.com/questions/11144513/numpy-cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points/11146645#11146645 IMO this is a better solution than having to potentially install and import a large library like `sklearn` – barlaensdoonn Jan 28 '19 at 22:54
  • 1
    @barlaensdoonn Thanks to your comment the answer was edited. – questionto42 Jul 18 '21 at 16:48
  • Won't `np.empty` be much faster in case `if out is None:`? – Gulzar Sep 10 '22 at 14:42
42

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

All of these iterators yield tuples, not lists or NumPy arrays, so if your F is picky about getting specifically a NumPy array, you'll have to accept the extra overhead of constructing or clearing and refilling one at each step.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
18

You can use np.array(itertools.product(a, b)).

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
William Song
  • 577
  • 6
  • 17
  • 6
    np.array(list(itertools.product(l, l2))) – ZirconCode Nov 14 '18 at 14:35
  • An explanation would be in order. Isn't it already covered in previous answers? What is different? What is the idea/gist? From [the Help Center](https://stackoverflow.com/help/promotion): *"...always explain why the solution you're presenting is appropriate and how it works"*. Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/49870877/edit), not here in comments (****** ***without*** ****** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Jan 13 '23 at 14:02
12

You can do something like this

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # Fake data
print(cartesian_coord(*6*[a])

which gives

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ...,
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
felippe
  • 353
  • 3
  • 7
  • 3
    Is there a way to get NumPy to accept more than 32 arrays for meshgrid? This method works for me as long as I don't pass more than 32 arrays. – Joelmob Sep 29 '14 at 16:26
11

The following NumPy implementation should be approximately two times the speed of the given previous answers:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Stefan van der Walt
  • 7,165
  • 1
  • 32
  • 41
  • 2
    Looks good. By my rudimentary tests, this looks faster than the original answer for all pairs, triples, and 4-tuples of {1,2,...,100}. After that, the original answer wins. Also, for future readers looking to generate all k-tuples of {1,...,n}, `np.indices((n,...,n)).reshape(k,-1).T` will do. – jme Sep 18 '14 at 15:35
  • 2
    This only works for integers, while the accepted answer also works for floats. – FJC May 20 '16 at 16:35
10

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy

my_grid = numpy.mgrid[[slice(0, 1, 0.1)]*6]
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
steabert
  • 6,540
  • 2
  • 26
  • 32
8

Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # Standard NumPy meshgrid
    dim = len(mesh)  # Number of dimensions
    elements = mesh[0].size  # Number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # Flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # Reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ...,
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
étale-cohomology
  • 2,098
  • 2
  • 28
  • 33
7

For a pure NumPy implementation of the Cartesian product of one-dimensional arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output:

 def cartprod(*arrays):
     N = len(arrays)
     return transpose(meshgrid(*arrays, indexing='ij'),
                      roll(arange(N + 1), -1)).reshape(-1, N)

Note this has the convention of the last axis changing the fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]:
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want to change the first axis fastest ("Fortran style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
RBF06
  • 2,013
  • 2
  • 21
  • 20
2

Pandas' merge() offers a naive, fast solution to the problem:

# Given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# Get dataframes with the same, constant index 
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x)))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y)))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z)))

# Get all permutations stored in a new dataframe
df = pd.merge(x, pd.merge(y, z, left_index=True, right_index=True),
              left_index=True, right_index=True)
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
simone
  • 148
  • 2
  • 7