Cartesian product of rows of a very big array

Question

I have an array of size (100, 50). I need to generate an output array which represents a cartesian product of input array rows.

For simplification purposes, let's have an input array:

array([[2, 6, 5],
       [7, 3, 6]])

As output I would like to have:

array([[2, 7],
       [2, 3],
       [2, 6],
       [6, 7],
       [6, 3],
       [6, 6],
       [5, 7],
       [5, 3],
       [5, 6]])

Note: itertools.product doesn't work here because of the size of the input array. Also, all the other similar answers assume that the number of rows is smaller than 32, which is not the case here.

You want to create an array of size (50^100, 100), which is far beyond huge. If you tell us what you are trying to achieve, I'm quite sure there will be a different solution. – Lante Dellarovere, May 31 '19 at 08:45
@LanteDellarovere your answer changes my understanding of what is possible. Basically, I was trying to do a kind of grid search by applying a classification model to this array of all possible inputs. I should probably look at something like hyperopt.github.io/ – zhukovgreen, May 31 '19 at 09:11
It's an option. To get an exhaustive answer, I'd rather open a new question here on SO or on https://stats.stackexchange.com/ about this machine learning topic. – Lante Dellarovere, May 31 '19 at 09:46
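
Since the full grid cannot be enumerated, one alternative hinted at in the comments above is to sample combinations instead of listing them all. The following is only a minimal sketch under that assumption: the helper name `sample_combinations` and the `seed` parameter are hypothetical and not part of the original thread. It draws one random element from each row for every sample.

```python
import numpy as np


def sample_combinations(rows, n_samples, seed=0):
    """Draw random elements of the Cartesian product of the rows of `rows`.

    Hypothetical helper: each sample picks one element from every row.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = rows.shape
    # One random column index per (sample, row) pair.
    col_idx = rng.integers(0, n_cols, size=(n_samples, n_rows))
    # result[i, j] == rows[j, col_idx[i, j]]
    return rows[np.arange(n_rows), col_idx]


rows = np.arange(100 * 50).reshape(100, 50)  # stand-in for the (100, 50) input
samples = sample_combinations(rows, n_samples=10)
print(samples.shape)  # (10, 100)
```

Each row of `samples` is one candidate input for the model, so the number of evaluations is chosen up front rather than dictated by the size of the grid.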

score 1 · Accepted Answer · edited May 31 '19 at 13:41

This question has been asked many times, for example here.
An array of size (100, 50) is too big and can't be handled by numpy. Smaller arrays, however, can be handled. In any case, I prefer to use itertools for this kind of thing:

import itertools
import numpy as np

a = np.array([[2, 6, 5], [7, 3, 6]])

np.array(list(itertools.product(*a)))
array([[2, 7],
       [2, 3],
       [2, 6],
       [6, 7],
       [6, 3],
       [6, 6],
       [5, 7],
       [5, 3],
       [5, 6]])
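
Wrapping the product in `list(...)` materializes every combination at once. As a hedged variation on the snippet above, `itertools.product` can also be consumed lazily, so that only a bounded number of combinations is ever held in memory. The helper name `first_combinations` is an assumption introduced for illustration.

```python
import itertools

import numpy as np


def first_combinations(rows, limit):
    """Return at most `limit` combinations without building the full product."""
    lazy_product = itertools.product(*rows)  # generator, nothing computed yet
    return np.array(list(itertools.islice(lazy_product, limit)))


a = np.array([[2, 6, 5], [7, 3, 6]])
head = first_combinations(a, 4)
print(head)
# [[2 7]
#  [2 3]
#  [2 6]
#  [6 7]]
```

This does not make the full product of a (100, 50) array feasible. It only shows that the iterator itself is cheap to create and that the cost comes from exhausting it.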

answered May 31 '19 at 08:03 by Lante Dellarovere · edited May 31 '19 at 13:41 by zhukovgreen

With an array of shape (50, 100) this is going to take forever – zhukovgreen May 31 '19 at 08:07
And it is not the same question, since none of the examples assume that the number of arrays is bigger than 32 – zhukovgreen May 31 '19 at 08:11
@ArtemZhukov Sorry, I didn't pay attention to the dimensions of your array. I suspect no computer can deal with these numbers. Have you tried to compute the dimensions of the Cartesian product matrix? – Lante Dellarovere May 31 '19 at 08:23
@ArtemZhukov I accepted your edit, but I have to point out that "can't be handled by numpy" can be misleading, as it misses the real point. 100*50^100 numbers are far more than all the electrons in the universe, and calculating those combinations would take far longer than the age of the universe, no matter what numpy, itertools or super quantum computer you use – Lante Dellarovere May 31 '19 at 16:11
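
The size quoted in the comment above can be checked directly, since Python integers have arbitrary precision. This is just a worked version of that arithmetic for the (100, 50) shape from the question. The variable names are illustrative.

```python
n_rows, n_cols = 100, 50  # shape of the input array in the question

# One element is chosen from each of the 100 rows, 50 choices per row.
n_combinations = n_cols ** n_rows
# Every combination holds n_rows numbers.
n_values = n_combinations * n_rows

print(len(str(n_combinations)))  # 170 decimal digits, roughly 7.9e169
print(len(str(n_values)))        # 172 decimal digits
```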

score 0 · Answer 2 · answered May 31 '19 at 08:00

import numpy as np

a = np.array([[2, 6, 5], [7, 3, 6]])

out = np.array(np.meshgrid(a[0], a[1])).T.reshape(-1,2)
print(out)

"""
prints
[[2 7]
 [2 3]
 [2 6]
 [6 7]
 [6 3]
 [6 6]
 [5 7]
 [5 3]
 [5 6]]
"""

answered May 31 '19 at 08:00 by pmarcol

What if the input has shape (50, 100)? – zhukovgreen May 31 '19 at 08:08
Well, you've got the point - numpy can't cope with such dimensions. – pmarcol May 31 '19 at 08:38
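
For completeness, the two-row `np.meshgrid` call in this answer can be generalized to any small number of rows. The sketch below assumes the rows are few and short enough for the result to fit in memory, and the function name `cartesian_rows` is hypothetical. It does not help with the (50, 100) or (100, 50) cases discussed in the comments.

```python
import numpy as np


def cartesian_rows(rows):
    """Cartesian product of the rows of `rows`, in itertools.product order.

    Hypothetical helper; only practical for a handful of short rows.
    """
    # indexing="ij" keeps the first row as the slowest-varying axis.
    grids = np.meshgrid(*rows, indexing="ij")
    # Stack the grids along a new last axis, then flatten to (n_combos, n_rows).
    return np.stack(grids, axis=-1).reshape(-1, len(rows))


a = np.array([[2, 6, 5], [7, 3, 6]])
out = cartesian_rows(a)
print(out.shape)  # (9, 2)
```

Using `indexing="ij"` avoids the transpose in the two-row version and produces the same row order as `itertools.product(*a)`.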
