
I have a set of N arrays and I need to compute the N-fold Cartesian product. Of all the generated ordered tuples I want to keep only the ones whose sum is equal to 1 (plus or minus a certain tolerance), e.g.

p = [[0.4,0.389],
     [0.6,0.611]]

cartesian_product = [[0.4, 0.6],
                     [0.4, 0.611],
                     [0.389, 0.6],
                     [0.389, 0.611]]

filtered_cartesian_product = [[0.4, 0.6],
                              [0.389, 0.611]]

I solved this problem in Python using `itertools.product()`:

import itertools
import numpy as np

np.array([i for i in itertools.product(*p) if np.abs(np.sum(i) - 1) < 1e-4])

which works quite well. However, since I need to perform this task on a relatively big dataset (100 arrays with 100 entries each), speed and good memory management are crucial.

Could anyone suggest a faster solution (not necessarily in Python)?
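One small variant of the approach above (my own suggestion, not from the original post): summing each tuple with `math.fsum` instead of `np.sum` avoids NumPy's per-call overhead on tiny tuples, and keeping the result as a generator avoids materializing the filtered product in memory. The name `lazy_filtered` is illustrative:

```python
import itertools
import math

def lazy_filtered(p, tol=1e-4):
    """Lazily yield tuples from the Cartesian product of the rows of p
    whose sum is within tol of 1, without building the full product."""
    return (t for t in itertools.product(*p)
            if abs(math.fsum(t) - 1) < tol)
```

For the two-row example above, `list(lazy_filtered(p))` gives `[(0.4, 0.6), (0.389, 0.611)]`; for large inputs the generator can be consumed tuple by tuple instead of being collected into an array.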

asked by SteRinaldi (edited by Dani Mesejo)
  • As your proposed code already uses numpy, the answer is in [this other SO post](https://stackoverflow.com/q/11144513/3545273). Ping me in a comment if it does not solve your problem (and explain why). – Serge Ballesta Sep 21 '21 at 08:12
  • What you ask is simply impossible. Computing the Cartesian product of a `100x100` array will result in `100**100 = 10**200` combinations. This is more than any computer can do within the age of the universe so far. Not to mention there is probably not one unique solution but billions upon billions or more, so the result cannot be stored on any computer. You certainly need to change your expectations/goal. – Jérôme Richard Sep 21 '21 at 08:29
  • The duplicate's answers create all products, but given the potential memory issues with doing that, your use of a filtered `itertools.product` generator may well be the fastest, memory-efficient method. – hpaulj Sep 21 '21 at 20:54
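Given the combinatorial explosion the comments point out, one way to avoid enumerating the full product at all is a branch-and-bound search: abandon a partial tuple as soon as its running sum can no longer land within tolerance of 1, using precomputed per-suffix minimum and maximum sums. This is a hedged sketch (the function name and structure are my own, not from the thread); how much it prunes depends entirely on the data:

```python
def filtered_product(arrays, target=1.0, tol=1e-4):
    """Yield tuples from the Cartesian product of `arrays` whose sum is
    within `tol` of `target`, pruning hopeless partial tuples early."""
    n = len(arrays)
    # suffix_min[i] / suffix_max[i]: smallest / largest sum achievable
    # by picking one value from each of arrays[i:].
    suffix_min = [0.0] * (n + 1)
    suffix_max = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(arrays[i])
        suffix_max[i] = suffix_max[i + 1] + max(arrays[i])

    def rec(i, partial, prefix):
        if i == n:
            if abs(partial - target) <= tol:
                yield tuple(prefix)
            return
        for v in arrays[i]:
            s = partial + v
            # Prune: even the extreme remaining choices cannot reach target.
            if s + suffix_min[i + 1] > target + tol:
                continue
            if s + suffix_max[i + 1] < target - tol:
                continue
            prefix.append(v)
            yield from rec(i + 1, s, prefix)
            prefix.pop()

    yield from rec(0, 0.0, [])
```

On the small example from the question, `list(filtered_product(p))` returns `[(0.4, 0.6), (0.389, 0.611)]`. The pruning does not change the worst case (as Jérôme Richard notes, the full product is astronomically large), but when few combinations sum to ~1 it can cut off large subtrees without visiting them.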

0 Answers