1

Let's say I have two 1D arrays A, B

A=[1,2,3,4]
B=[5,6,7,8,9]

And I want to get

C=[[1,5],[1,6],[1,7],[1,8],[1,9],
   [2,5],[2,6],[2,7],[2,8],[2,9],
   [3,5],[3,6],[3,7],[3,8],[3,9],
   [4,5],[4,6],[4,7],[4,8],[4,9]]

I tried this by creating a new array C and using a loop to insert the arrays

C=np.empty(shape=(A.shape[0]*B.shape[0],2))
for i in range(A.shape[0]):
    C[i*B.shape[0]:(i+1)*B.shape[0],0]=A[i]
for i in range(B.shape[0]):
    C[i*A.shape[0]:(i+1)*A.shape[0],1]=B[i]

However, I have about 50,000 cases of |A|*|B|=100*100 to compute. Is there any other way (numpy-thonic or pythonic) I can improve time complexity?

Jee Seok Yoon
  • 4,716
  • 9
  • 32
  • 47
  • 1
    Similar question: [Numpy: cartesian product of x and y array points into single array of 2D points](http://stackoverflow.com/questions/11144513/numpy-cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points) – user2314737 Apr 05 '17 at 15:30

3 Answers3

2

Approach #1

Use np.meshgrid and then np.array and a final transpose -

np.array(np.meshgrid(A,B)).T.reshape(-1,2)

Sample run -

In [3]: A
Out[3]: array([1, 2, 3, 4])

In [4]: B
Out[4]: array([5, 6, 7, 8, 9])

In [5]: np.array(np.meshgrid(A,B)).T.reshape(-1,2)
Out[5]: 
array([[1, 5],
       [1, 6],
       [1, 7],
       [1, 8],
       [1, 9],
       [2, 5],
       [2, 6],
       [2, 7],
       [2, 8],
       [2, 9],
       [3, 5],
       [3, 6],
       [3, 7],
       [3, 8],
       [3, 9],
       [4, 5],
       [4, 6],
       [4, 7],
       [4, 8],
       [4, 9]])

Approach #2

Initialization based approach with focus on performance, especially for large arrays -

def initialization_based(A,B):
    N = A.size
    M = B.size
    out = np.empty((N,M,2),dtype=A.dtype)
    out[...,0] = A[:,None]
    out[...,1] = B
    out.shape = (-1,2)
    return out

Runtime test

In [7]: A = np.random.randint(0,9,(100))

In [8]: B = np.random.randint(0,9,(100))

In [9]: %timeit np.array(np.meshgrid(A,B)).T.reshape(-1,2)
10000 loops, best of 3: 69.1 µs per loop

In [10]: %timeit initialization_based(A,B)
100000 loops, best of 3: 11.1 µs per loop

Including approaches from other posts with the same setup -

In [183]: from itertools import product

# @Chris Mueller's soln
In [184]: %timeit [x for x in product(A,B)]
1000 loops, best of 3: 503 µs per loop

# @jyotish's soln
In [185]: %timeit [[i, j] for i in A for j in B]
1000 loops, best of 3: 1.34 ms per loop
Divakar
  • 218,885
  • 19
  • 262
  • 358
0

You can use itertools.product which returns an iterator.

from itertools import product 
[x for x in product([1, 2, 3, 4], [5, 6, 7, 8, 9])]
[(1, 5),
 (1, 6),
 (1, 7),
 (1, 8),
 (1, 9),
 (2, 5),
 (2, 6),
 (2, 7),
 (2, 8),
 (2, 9),
 (3, 5),
 (3, 6),
 (3, 7),
 (3, 8),
 (3, 9),
 (4, 5),
 (4, 6),
 (4, 7),
 (4, 8),
 (4, 9)]
Chris Mueller
  • 6,490
  • 5
  • 29
  • 35
0

Pythonic way would be to use list comprehension:

Given,
A=[1,2,3,4]
B=[5,6,7,8,9]

C = [[i, j] for i in A for j in B]

I strongly suggest to use timeit to test the running time

JkShaw
  • 1,927
  • 2
  • 13
  • 14