I have three ranges and want to build a 3-column array containing every possible combination of their values, in a specific order. I know how to do this with a loop. However, the real data will have many more than 3 columns and the ranges are very large, so I expect a loop to be inefficient and would like a faster approach. The real dataset will be approximately 5 GB, so efficiency is key for me. As an example:

import numpy as np

inc = 1
a = np.arange(1001,1002+inc,inc)
b = np.arange(1,3+inc,inc)
c = np.arange(1,5+inc,inc)

I want to create an output that looks like:

array([[1001,    1,    1],
       [1001,    1,    2],
       [1001,    1,    3],
       [1001,    1,    4],
       [1001,    1,    5],
       [1001,    2,    1],
       [1001,    2,    2],
       [1001,    2,    3],
       [1001,    2,    4],
       [1001,    2,    5],
       [1001,    3,    1],
       [1001,    3,    2],
       [1001,    3,    3],
       [1001,    3,    4],
       [1001,    3,    5],
       [1002,    1,    1],
       [1002,    1,    2],
       [1002,    1,    3],

This output is truncated, but it shows the pattern I want. I should add that I am doing this because I have an input table in the same format but with missing rows, and I want to identify the missing rows by comparing the input dataset to this 'ideal' table. As mentioned above, I can do this with a for loop but would like a more Pythonic way of doing it if possible.

  • I think Python's [`itertools.product()`](https://docs.python.org/3/library/itertools.html#itertools.product) is what you are looking for – Ahmed Elashry Feb 05 '22 at 15:45
  • Does this answer your question? [How to get all possible combinations of a list’s elements?](https://stackoverflow.com/questions/464864/how-to-get-all-possible-combinations-of-a-list-s-elements) – Ahmed Elashry Feb 05 '22 at 15:47

2 Answers

You can do it easily with the built-in itertools.product:

import itertools as it
import numpy as np

perms = np.array(list(it.product(a, b, c)))

Output:

>>> perms
array([[1001,    1,    1],
       [1001,    1,    2],
       [1001,    1,    3],
       [1001,    1,    4],
       [1001,    1,    5],
       [1001,    2,    1],
       [1001,    2,    2],
       [1001,    2,    3],
       [1001,    2,    4],
       [1001,    2,    5],
       [1001,    3,    1],
       [1001,    3,    2],
       [1001,    3,    3],
       [1001,    3,    4],
       [1001,    3,    5],
       [1002,    1,    1],
       [1002,    1,    2],
       [1002,    1,    3],
       [1002,    1,    4],
       [1002,    1,    5],
       [1002,    2,    1],
       [1002,    2,    2],
       [1002,    2,    3],
       [1002,    2,    4],
       [1002,    2,    5],
       [1002,    3,    1],
       [1002,    3,    2],
       [1002,    3,    3],
       [1002,    3,    4],
       [1002,    3,    5]])
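
Since the real data has more than three columns, the same call can be written for any number of ranges by unpacking a list. This is a minimal sketch; the name `ranges` and its contents are placeholders for illustration, not part of the original question. Note that `list(...)` creates a Python tuple for every row before NumPy converts them, which can cost far more memory than the final array for a table of several GB.

```python
import itertools as it
import numpy as np

# Hypothetical list of ranges; any number of 1-D arrays can go here.
ranges = [np.arange(1001, 1003), np.arange(1, 4), np.arange(1, 6)]

# Unpack the list so that each range becomes one argument of product().
table = np.array(list(it.product(*ranges)))

print(table.shape)  # (30, 3): 2 * 3 * 5 rows, one column per range
```

The rows come out in the requested order because `itertools.product` varies the last argument fastest.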

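I recommend using numpy.meshgrid: it builds the grid with vectorized NumPy operations instead of creating a Python tuple for every row, so it should run significantly faster on large ranges. With the default indexing, however, the rows come out in a different order from the one requested: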

>>> np.array(np.meshgrid(a,b,c)).T.reshape((-1, 3))
array([[1001,    1,    1],
       [1001,    2,    1],
       [1001,    3,    1],
       [1002,    1,    1],
       [1002,    2,    1],
       [1002,    3,    1],
       [1001,    1,    2],
       [1001,    2,    2],
       [1001,    3,    2],
       [1002,    1,    2],

If the row order matters, passing indexing='ij' produces the requested order, with the first range varying slowest:

np.array([m.flatten() for m in np.meshgrid(a,b,c, indexing='ij')]).T
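
For more than three ranges, the `indexing='ij'` version can be wrapped in a small helper, and the result can then be used to find the missing rows mentioned in the question. The following is only a sketch under stated assumptions: `cartesian_product` and `observed` are hypothetical names, `observed` is a made-up input table, and the lookup assumes every range is sorted and every observed value occurs in its range.

```python
import itertools as it
import numpy as np

def cartesian_product(*ranges):
    """Return all combinations of the 1-D inputs as rows, first input varying slowest."""
    grids = np.meshgrid(*ranges, indexing='ij')
    # Stack the N grids along a new last axis, then flatten to (n_rows, N).
    return np.stack(grids, axis=-1).reshape(-1, len(ranges))

a = np.arange(1001, 1003)
b = np.arange(1, 4)
c = np.arange(1, 6)

ideal = cartesian_product(a, b, c)

# Sanity check: same rows in the same order as itertools.product.
same_as_itertools = np.array_equal(ideal, np.array(list(it.product(a, b, c))))

# Hypothetical input table: the ideal table with two rows dropped.
observed = np.delete(ideal, [3, 17], axis=0)

# Map each observed row to its row number in the ideal table.
# Assumes sorted ranges and that every observed value occurs in its range.
idx = tuple(np.searchsorted(r, observed[:, k]) for k, r in enumerate((a, b, c)))
flat = np.ravel_multi_index(idx, (len(a), len(b), len(c)))

# Mark the rows that are present; everything left unmarked is missing.
present = np.zeros(len(ideal), dtype=bool)
present[flat] = True
missing = ideal[~present]

print(missing)
```

Here `missing` holds the two dropped rows, `[1001, 1, 4]` and `[1002, 1, 3]`. Because the lookup works on flat row numbers, only a boolean mask the length of the ideal table is needed for the comparison itself.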
ken
  • Cool! You can also just re-index the array in your first version to get the right order: `np.array(np.meshgrid(a,b,c)).T.reshape(-1, 3)[:, [0, 2, 1]]` - but that's not very dynamic. –  Feb 05 '22 at 16:24