
I have an array of ranges, one row per dimension, as shown below:

np.array([[0,3],[0,3],[0,10]])

In this array, each row gives the low and high values of a range, inclusive. For example, [0,3] refers to [0,1,2,3].

I need to generate all combinations of values from the ranges given above. For example, I want [0,0,0], [0,0,1], ..., [0,1,0], ..., [3,3,10].

I have tried the following to get what I want:

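import numpy as np
from functools import reduce  # a builtin in Python 2; must be imported in Python 3

ds = np.array([[0,3],[0,3],[0,10]])
# Total number of combinations: product of (high - low + 1) over all rows
nItems = int(reduce(lambda a,b: a * (b[1] - b[0] + 1), ds, 1))
nArrays = []
for x in range(ds.shape[0]):
    low = ds[x][0]
    high = ds[x][1]
    nitm = high - low + 1
    ar = [i + low for i in range(nitm)]  # every integer in [low, high]
    nArrays.append(ar)

# cartesian() is not defined here; it is the function from the linked answer
myCombinations = cartesian(nArrays)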

The cartesian function was taken from "Using numpy to build an array of all combinations of two arrays".
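If you would rather not depend on the external cartesian helper, here is a minimal self-contained sketch of the same idea built on np.meshgrid with indexing="ij". The function name cartesian_product is hypothetical (it is not the linked function), and the sketch assumes every input is a 1-D array:

```python
import numpy as np


def cartesian_product(arrays):
    """Hypothetical stand-in for the linked cartesian(): all combinations of 1-D arrays.

    Rows come out with the last array varying fastest, like itertools.product.
    """
    grids = np.meshgrid(*arrays, indexing="ij")  # one N-D grid per input array
    # Stack the grids along a new last axis, then flatten to one point per row
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))


ds = np.array([[0, 3], [0, 3], [0, 10]])
nArrays = [np.arange(lo, hi + 1) for lo, hi in ds]  # inclusive [low, high] per row
myCombinations = cartesian_product(nArrays)
print(myCombinations.shape)  # (176, 3)
```

The list comprehension replaces the explicit loop that builds nArrays; indexing="ij" is what keeps the rows in the [0,0,0], [0,0,1], ... order asked for above.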

I need to do this a few million times.

My question: is there a better or more efficient way to do this?

okkhoy

2 Answers


I think what you're looking for is np.mgrid. Unfortunately, this returns the array in a format that's different from what you need, so you'll need to do a little post-processing:

import numpy as np

a = np.mgrid[0:4, 0:4, 0:11]     # All points in a 3D grid within the given ranges
a = np.rollaxis(a, 0, 4)         # Make the 0th axis into the last axis
a = a.reshape((4 * 4 * 11, 3))   # Now you can safely reshape while preserving order
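The snippet above hard-codes the ranges. A sketch that builds the same grid directly from a ranges array like ds in the question might look like this; it assumes each row is an inclusive [low, high] pair, and it uses non-zero lows to show that mgrid slices do not have to start at 0:

```python
import numpy as np

ds = np.array([[0, 3], [1, 3], [8, 10]])  # rows are inclusive [low, high]

# One slice per row; hi + 1 because slice stops are exclusive
grid = np.mgrid[tuple(slice(int(lo), int(hi) + 1) for lo, hi in ds)]
# Move the coordinate axis to the end, then flatten to one point per row
points = np.rollaxis(grid, 0, grid.ndim).reshape(-1, len(ds))
print(points.shape)  # (36, 3)
```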

Explanation

np.mgrid gives you a set of grid points in N-dimensional space. Let me try to show this with a smaller example, to make things clearer:

>>> a = np.mgrid[0:2, 0:2]
>>> a
array([[[0, 0],
        [1, 1]],

       [[0, 1],
        [0, 1]]])

Since I've given two ranges, 0:2 and 0:2, I get a 2D grid. mgrid returns the x-values and the y-values of the grid points (0, 0), (0, 1), (1, 0) and (1, 1) in 2D space: a[0] holds the x-values of the four points, and a[1] holds their y-values.

But what you really want is the list of actual grid points I've written out, not the x- and y-values of those points separately. Your first instinct might be to simply reshape the array:

>>> a.reshape((4, 2))
array([[0, 0],
       [1, 1],
       [0, 1],
       [0, 1]])

But this clearly doesn't work, because reshape operates on the flattened array (all elements read in order, so every x-value comes before every y-value), and that's not what you want.
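A small sketch that confirms the pairing problem and shows one alternative fix. Flattening each coordinate array on its own and then transposing is a different route from the rollaxis approach used in this answer, offered only as an illustration:

```python
import numpy as np

a = np.mgrid[0:2, 0:2]  # shape (2, 2, 2): a[0] holds x-values, a[1] holds y-values

naive = a.reshape((4, 2))  # reads all of a[0] first, then all of a[1]: wrong pairing
# Flatten each coordinate array into its own row, then transpose so each row is one point
points = a.reshape(len(a), -1).T
print(points.tolist())  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```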

What you want to do is read along the first axis of a (the one that separates the x-values from the y-values) and create an array:

[ [a[0][0, 0], a[1][0, 0]],
  [a[0][0, 1], a[1][0, 1]],
  [a[0][1, 0], a[1][1, 0]],
  [a[0][1, 1], a[1][1, 1]] ]

which reads "First tell me the first point (x1, y1), then the second point (x2, y2), ..." and so on. Perhaps this is better explained with a figure, of sorts. This is what a looks like:

                you want to read
                in this direction
                 (0, 0)   (0, 1)
                   |        |
                   |        |
                   v        v

          /        0--------0            +----> axis0
 x-values |       /|       /|           /|
          |      / |      / |    axis1 / |
          \     1--------1  |         L  |
                |  |     |  |            v
          /     |  0-----|--1           axis2
 y-values |     | /      | /
          |     |/       |/
          \     0--------1

                |        |
                |        |
                v        v
              (1, 0)   (1, 1)

np.rollaxis gives you a way to do this. np.rollaxis(a, 0, 3) in the above example says "take the 0th (or outermost) axis and make it the last (or innermost) axis." (Note: only axes 0, 1 and 2 actually exist here, so saying "send the 0th axis to the 3rd position" is a way of telling Python to put the 0th axis after the last axis.) You might also want to read this.
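Newer NumPy versions also provide np.moveaxis (added in NumPy 1.11, so it is not available in the numpy 1.8.1 mentioned in the comments), and the NumPy documentation suggests preferring it over rollaxis. This sketch checks that the two spellings give the same result for the 2D example:

```python
import numpy as np

a = np.mgrid[0:2, 0:2]
rolled = np.rollaxis(a, 0, 3)  # move axis 0 to position 3, i.e. after the last axis
moved = np.moveaxis(a, 0, -1)  # same move, stated as "axis 0 becomes the last axis"
print(np.array_equal(rolled, moved))  # True
```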

>>> a = np.rollaxis(a, 0, 3)
>>> a
array([[[0, 0],
        [0, 1]],

       [[1, 0],
        [1, 1]]])

This is starting to look like what you want, except there's an extra array dimension. We want to merge dimensions 0 and 1 to get a single array of grid points. But now that the flattened array reads in the order you expect, you can safely reshape it to get the desired result.

>>> a = a.reshape((4, 2))
>>> a
array([[0, 0],
       [0, 1],
       [1, 0],
       [1, 1]])

The 3D version works the same way; I just couldn't draw a figure for it, since it would be 4D.
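To check the 3D case without a figure, this sketch compares the mgrid result from the top of this answer against itertools.product (the approach from the other answer) and confirms that both produce the same points in the same order:

```python
from itertools import product

import numpy as np

a = np.mgrid[0:4, 0:4, 0:11]
a = np.rollaxis(a, 0, 4).reshape((4 * 4 * 11, 3))

expected = list(product(range(4), range(4), range(11)))
print(a.shape)  # (176, 3)
print([tuple(row) for row in a.tolist()] == expected)  # True
```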

Praveen
  • This is quite efficient (100000 runs takes about 4 seconds), but it's quite confusing. Could you please explain how it works? (Or please point me to some doc where I can understand this?) – okkhoy Dec 04 '14 at 10:34
  • I've added an explanation for your benefit, but on my computer, `itertools.product` actually runs about 6 times faster. The bulk of the time in my method is consumed by `mgrid` itself, so you can't even get out of it by avoiding `rollaxis` and `reshape`. Out of curiosity, what versions of Python and numpy are you using? – Praveen Dec 05 '14 at 04:04
  • I just realized that another way of achieving the `rollaxis`+`reshape` effect, but losing out on numpy-ness in the process, is to use `zip(a[0].flatten(), a[1].flatten(), a[2].flatten())`. – Praveen Dec 05 '14 at 04:18
  • Wow, thanks for the explanation! I'm running Python 2.7.6 and numpy 1.8.1. I checked again, and the results are similar on my machine: itertools takes longer! – okkhoy Dec 05 '14 at 06:58
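Since the comments report opposite timings on different machines, a small timeit harness lets you measure both approaches under your own Python and NumPy versions. This is a sketch: the function names are hypothetical, both functions return a NumPy array so the comparison is like-for-like, and the numbers you get will depend on your setup:

```python
import timeit
from itertools import product

import numpy as np


def with_mgrid():
    a = np.mgrid[0:4, 0:4, 0:11]
    return np.rollaxis(a, 0, 4).reshape((4 * 4 * 11, 3))


def with_product():
    return np.array(list(product(range(4), range(4), range(11))))


# Both must produce the same rows in the same order before timing them
assert np.array_equal(with_mgrid(), with_product())

for fn in (with_mgrid, with_product):
    seconds = timeit.timeit(fn, number=1_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 1,000 calls")
```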

You can use itertools.product:

In [16]: from itertools import product

In [17]: values = list(product(range(4), range(4), range(11)))

In [18]: values[:5]
Out[18]: [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)]

In [19]: values[-5:]
Out[19]: [(3, 3, 6), (3, 3, 7), (3, 3, 8), (3, 3, 9), (3, 3, 10)]

Given the array of ranges, you can do something like the following. (I used a couple of non-zero low values to demonstrate the general case, and to cut down the size of the output. :)

In [41]: ranges = np.array([[0, 3], [1, 3], [8, 10]])

In [42]: list(product(*(range(lo, hi+1) for lo, hi in ranges)))
Out[42]: 
[(0, 1, 8),
 (0, 1, 9),
 (0, 1, 10),
 (0, 2, 8),
 (0, 2, 9),
 (0, 2, 10),
 (0, 3, 8),
 (0, 3, 9),
 (0, 3, 10),
 (1, 1, 8),
 (1, 1, 9),
 (1, 1, 10),
 (1, 2, 8),
 (1, 2, 9),
 (1, 2, 10),
 (1, 3, 8),
 (1, 3, 9),
 (1, 3, 10),
 (2, 1, 8),
 (2, 1, 9),
 (2, 1, 10),
 (2, 2, 8),
 (2, 2, 9),
 (2, 2, 10),
 (2, 3, 8),
 (2, 3, 9),
 (2, 3, 10),
 (3, 1, 8),
 (3, 1, 9),
 (3, 1, 10),
 (3, 2, 8),
 (3, 2, 9),
 (3, 2, 10),
 (3, 3, 8),
 (3, 3, 9),
 (3, 3, 10)]
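If you need a NumPy array rather than a list of tuples, you can pass the list straight to np.array. A minimal sketch using the same ranges:

```python
from itertools import product

import numpy as np

ranges = np.array([[0, 3], [1, 3], [8, 10]])
# Each tuple from product becomes one row of a 2-D integer array
values = np.array(list(product(*(range(lo, hi + 1) for lo, hi in ranges))))
print(values.shape)  # (36, 3)
```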

If the low values of all the ranges are 0, you can use np.ndindex:

In [52]: values = list(np.ndindex(4, 4, 11))

In [53]: values[:5]
Out[53]: [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)]

In [54]: values[-5:]
Out[54]: [(3, 3, 6), (3, 3, 7), (3, 3, 8), (3, 3, 9), (3, 3, 10)]
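np.ndindex can still cover non-zero lows if you enumerate the zero-based box and then add each column's low value back with broadcasting. A sketch, assuming the rows of ranges are inclusive [low, high] pairs:

```python
import numpy as np

ranges = np.array([[0, 3], [1, 3], [8, 10]])  # inclusive [low, high] rows
lows = ranges[:, 0]
sizes = ranges[:, 1] - lows + 1  # number of values in each range

# ndindex enumerates the zero-based box; broadcasting adds each column's low value back
values = np.array(list(np.ndindex(*sizes.tolist()))) + lows
print(values[:3].tolist())  # [[0, 1, 8], [0, 1, 9], [0, 1, 10]]
```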
Warren Weckesser
  • No, not all the low values are 0, so I don't think I can use `np.ndindex`. The other method works for me. I can just convert it to a numpy array once I have the list of tuples. Thanks!! – okkhoy Dec 04 '14 at 10:12
  • I just noticed that, running each method 100000 times, my approach gives the result in 9 seconds, while the itertools version needs 44 seconds. This method is much simpler to code, but I'm looking for efficiency since I have to do this a few million times. – okkhoy Dec 04 '14 at 10:15