
I am somewhat new to NumPy and am having trouble finding an efficient way to perform what I assume is a simple task. I suspect there is a direct way to do this in NumPy, but after quite a bit of searching I could not find one.

I have two 2D arrays, like so:

>>> ident2 = np.identity(2)
>>> ident3 = np.identity(3)
>>> ident2
array([[1., 0.],
       [0., 1.]])
>>> ident3
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

What I would like to create is an array like this: the Cartesian product of the rows of the two arrays above (each row of ident3 paired with each row of ident2), with each pair of rows concatenated into a single row:

array([[1, 0, 0, 1, 0],
       [1, 0, 0, 0, 1],
       [0, 1, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 1, 0],
       [0, 0, 1, 0, 1]])

So far I have been able to create the cartesian product using itertools.product like this:

>>> x=np.array([*itertools.product(ident3, ident2)])
>>> x
array([[array([1., 0., 0.]), array([1., 0.])],
       [array([1., 0., 0.]), array([0., 1.])],
       [array([0., 1., 0.]), array([1., 0.])],
       [array([0., 1., 0.]), array([0., 1.])],
       [array([0., 0., 1.]), array([1., 0.])],
       [array([0., 0., 1.]), array([0., 1.])]], dtype=object)

But I am having trouble figuring out a readable, efficient way to join the arrays along the rows into a final array. This works:

>>> np.stack([np.concatenate(arrays) for arrays in x])
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])

The above is very readable, but since it relies on a Python-level list comprehension rather than only native NumPy operations, I assume it will be slow.

Below is the only method I've found that works without using a list comprehension:

>>> np.stack(np.array_split(np.hstack(np.concatenate(x)), 6))
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])

But it is extremely convoluted. How in the world can future me ever come back to read that and understand what in the world is going on? And it also requires the separate, initial itertools.product step, which I am assuming a more efficient native numpy method would probably not require.
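
For what it's worth, the one-liner seems to boil down to two steps: flatten every row piece into one long 1-D array, then cut that array back into one row per pair. A minimal sketch of the same idea without the object array (the names `pairs`, `flat` and `result` are only illustrative):

```python
import itertools

import numpy as np

ident2 = np.identity(2)
ident3 = np.identity(3)

# Every (row of ident3, row of ident2) pair, kept as a plain Python list.
pairs = list(itertools.product(ident3, ident2))

# Flatten all 12 row pieces into a single 1-D array of 30 values ...
flat = np.concatenate(list(itertools.chain.from_iterable(pairs)))

# ... then cut it back into one row per pair (6 rows of 5 values).
result = flat.reshape(len(pairs), -1)
print(result)
```

This still needs the itertools.product step, so it only makes the intent easier to read; it is not meant as the efficient version.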

There has to be a better way. What would be the canonical way to construct the row-by-row concatenated cartesian product of these two 2D arrays?

Rick
  • If you want to read it back in the future and understand what your code does, add a comment that explains your method, not just a line of code. – Ptit Xav Sep 27 '21 at 16:41
  • @PtitXav I understand and it's good advice. However, my assumption is this way of doing it is so convoluted and hard to understand (I only happened upon it by accident), it is a "code smell" that there is probably a better way to do it. Hence the question. – Rick Sep 27 '21 at 16:42
  • Googling, I found that this [post](https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363) may contain helpful information. – Ptit Xav Sep 27 '21 at 16:56
  • @PtitXav Yup I've been reading that one; it's related and kind of close. But I am going cross-eyed trying to adapt the AA to do what I need to do. – Rick Sep 27 '21 at 17:25
  • `array_split` does use a list iteration. It makes a list of slices.. But so do `stack`, `hstack` and `concatenate`. They all treat their argument as a list of arrays. – hpaulj Sep 27 '21 at 19:41

2 Answers


How about using a mix of repeat and tile (which itself uses repeat):

In [75]: ident2 = np.identity(2)
    ...: ident3 = np.identity(3)
In [76]: np.repeat(ident3,repeats=2,axis=0)
Out[76]: 
array([[1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
In [77]: np.tile(ident2,(3,1))
Out[77]: 
array([[1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]])
In [78]: np.hstack((__,_))
Out[78]: 
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])
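
Outside IPython, where `_` and `__` stand for the last two outputs, the same three steps can be written as a plain script. This is only a sketch of the session above; the names `left`, `right` and `result` are illustrative:

```python
import numpy as np

ident2 = np.identity(2)
ident3 = np.identity(3)

# Each row of ident3 repeated once per row of ident2: shape (6, 3).
left = np.repeat(ident3, repeats=ident2.shape[0], axis=0)

# The whole of ident2 stacked once per row of ident3: shape (6, 2).
right = np.tile(ident2, (ident3.shape[0], 1))

# Join the two blocks column-wise: shape (6, 5).
result = np.hstack((left, right))
print(result)
```

Taking the repeat counts from the shapes rather than hard-coding 2 and 3 keeps the snippet valid for other row counts.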
hpaulj

Building on the accepted answer: for completeness, here is a more generalized solution.

import numpy as np


def row_by_row_concatenation_of_two_arrays(arr0, arr1):
    """Combine two 2D arrays by concatenating every row of arr0 with every row of arr1.
    
    Example
    =======

    arr0:           arr1:
    [[1,2,3],       [[1,2],
    [4,5,6],        [3,4]]
    [7,8,9]]

    Into form of:
    
    [[1,2,3,1,2],
    [1,2,3,3,4],
    [4,5,6,1,2],
    [4,5,6,3,4],
    [7,8,9,1,2],
    [7,8,9,3,4]]
    """

    arr0_repeated = np.repeat(arr0, repeats=arr1.shape[0], axis=0)
    arr1_tiled = np.tile(arr1, (arr0.shape[0], 1))
    return np.hstack((arr0_repeated, arr1_tiled))

Use that function with functools.reduce:

In [11]: from module import row_by_row_concatenation_of_two_arrays
In [12]: import functools
In [13]: x=((1.2,1.6),(0.5,0.5,0.5), (1,2))

In [14]: diags=[np.diag(group) for group in x]

In [15]: diags
Out[15]:
[array([[1.2, 0. ],
        [0. , 1.6]]),
 array([[0.5, 0. , 0. ],
        [0. , 0.5, 0. ],
        [0. , 0. , 0.5]]),
 array([[1, 0],
        [0, 2]])]
In [45]: functools.reduce(row_by_row_concatenation_of_two_arrays, diags)
Out[45]:
array([[1.2, 0. , 0.5, 0. , 0. , 1. , 0. ],
       [1.2, 0. , 0.5, 0. , 0. , 0. , 2. ],
       [1.2, 0. , 0. , 0.5, 0. , 1. , 0. ],
       [1.2, 0. , 0. , 0.5, 0. , 0. , 2. ],
       [1.2, 0. , 0. , 0. , 0.5, 1. , 0. ],
       [1.2, 0. , 0. , 0. , 0.5, 0. , 2. ],
       [0. , 1.6, 0.5, 0. , 0. , 1. , 0. ],
       [0. , 1.6, 0.5, 0. , 0. , 0. , 2. ],
       [0. , 1.6, 0. , 0.5, 0. , 1. , 0. ],
       [0. , 1.6, 0. , 0.5, 0. , 0. , 2. ],
       [0. , 1.6, 0. , 0. , 0.5, 1. , 0. ],
       [0. , 1.6, 0. , 0. , 0.5, 0. , 2. ]])

I am guessing there is an even faster way to do this for the generalized case, but this suits my purposes.
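
One possible single-pass variant, offered as a sketch rather than a benchmarked improvement: build the row indices for every combination with np.indices and index each array once. The helper name `row_product` is hypothetical, and it still loops in Python over the number of arrays (not over the rows):

```python
import numpy as np


def row_product(*arrays):
    """Row-by-row concatenated Cartesian product of any number of 2D arrays."""
    counts = tuple(a.shape[0] for a in arrays)
    # idx[k] holds, for every combination, which row of arrays[k] to take.
    # C-order flattening makes the first array vary slowest, matching the
    # ordering produced by the repeat/tile function above.
    idx = np.indices(counts).reshape(len(arrays), -1)
    return np.hstack([a[i] for a, i in zip(arrays, idx)])


diags = [np.diag(group) for group in ((1.2, 1.6), (0.5, 0.5, 0.5), (1, 2))]
out = row_product(*diags)
print(out.shape)
```

Called as `row_product(*diags)`, this should give the same 12 x 7 array as the functools.reduce call above; whether it is actually faster would need to be timed.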

Rick