How do I combine two numpy arrays so for each row of the first array I append all rows from the second one?

Question

I have the following numpy arrays:

theta_array =
array([[ 1, 10],
       [ 1, 11],
       [ 1, 12],
       [ 1, 13],
       [ 1, 14],
       [ 2, 10],
       [ 2, 11],
       [ 2, 12],
       [ 2, 13],
       [ 2, 14],
       [ 3, 10],
       [ 3, 11],
       [ 3, 12],
       [ 3, 13],
       [ 3, 14],
       [ 4, 10],
       [ 4, 11],
       [ 4, 12],
       [ 4, 13],
       [ 4, 14]])

XY_array =
array([[ 44.0394952 , 505.81099922],
       [ 61.03882938, 515.97253226],
       [ 26.69851841, 525.18083012],
       [ 46.78487831, 533.42309602],
       [ 45.77188401, 545.42988355],
       [ 81.12969132, 554.78767379],
       [ 54.178463  , 565.8716283 ],
       [ 41.58952084, 574.76827133],
       [ 85.24956815, 585.1355127 ],
       [ 80.73726733, 595.49446033],
       [ 22.70625059, 605.59017175],
       [ 40.66810604, 615.26308629],
       [ 47.16694695, 624.39222332],
       [ 48.72499541, 633.19846364],
       [ 50.68589921, 643.72334885],
       [ 38.42731134, 654.68595883],
       [ 47.39519707, 666.28232866],
       [ 58.07767155, 673.9572227 ],
       [ 72.11393347, 683.68307373],
       [ 53.70872932, 694.65509894],
       [ 82.08237952, 704.5868817 ],
       [ 46.64069738, 715.18427515],
       [ 40.46032478, 723.91308011],
       [ 75.69090892, 733.69595658],
       [120.61447884, 745.31322786],
       [ 60.17764744, 754.89747186],
       [ 87.15961973, 766.24040447],
       [ 82.93872713, 773.01518252],
       [ 93.56688906, 785.60640153],
       [ 70.0474047 , 793.81792947],
       [104.3613818 , 805.40234676],
       [108.39253837, 814.75002114],
       [ 78.97643673, 824.95386427],
       [ 85.69096895, 834.44797862],
       [ 53.07112931, 844.39555058],
       [111.49875807, 855.660508  ],
       [ 70.88824958, 865.53417489],
       [ 79.55499469, 875.31303945],
       [ 60.86941464, 885.85235946],
       [101.06017712, 896.69986636],
       [ 74.55823544, 905.87417231],
       [113.24705653, 915.19350121],
       [ 94.21920882, 925.87933273],
       [ 63.26478103, 933.70804578],
       [ 95.97827181, 945.76196917],
       [ 80.48623318, 955.60422694],
       [ 80.03451808, 964.39856485],
       [ 73.86032436, 973.91032818],
       [103.96923524, 984.24366761],
       [ 93.20663129, 995.44618851]])

I am trying to combine them so that each row of theta_array is paired with every row of XY_array.

I am aware of this post, so I tried this:

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)

But this generates:

array([[  1.        ,  44.0394952 ,   1.        , 505.81099922],
       [  1.        ,  61.03882938,   1.        , 515.97253226],
       [  1.        ,  26.69851841,   1.        , 525.18083012],
       ...,
       [ 14.        ,  73.86032436,  14.        , 973.91032818],
       [ 14.        , 103.96923524,  14.        , 984.24366761],
       [ 14.        ,  93.20663129,  14.        , 995.44618851]])

and the problem requires:

array([[  1.        ,   1.        ,  44.0394952 , 505.81099922],
       [  1.        ,   1.        ,  61.03882938, 515.97253226],
       [  1.        ,   1.        ,  26.69851841, 525.18083012],
       ...,
       [ 14.        ,  14.        ,  73.86032436, 973.91032818],
       [ 14.        ,  14.        , 103.96923524, 984.24366761],
       [ 14.        ,  14.        ,  93.20663129, 995.44618851]])

What is the way to do this combination/aggregation in numpy?

EDIT:

There is a mistake in the approach above: combining the two arrays this way does not produce the matrix I need. With a separate vector for each column, the solution that actually merges them is:

dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1,3)

And later the Y vector can be added as an additional column.
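
As a rough sketch of what that expression produces, here is the same call on small made-up vectors. The values are hypothetical; only the names theta0_range, theta1_range, X and Y come from the text above, and the sketch assumes Y holds one value per entry of X. Because X ends up varying slowest in the flattened result, one way to attach Y is to repeat each of its values once per (theta0, theta1) pair:

```python
import numpy as np

# Hypothetical stand-in values; only the variable names come from the question.
theta0_range = np.array([1, 2])
theta1_range = np.array([10, 11, 12])
X = np.array([44.0, 61.0])
Y = np.array([505.8, 516.0])  # assumed: one Y value per X value

# Every (theta0, theta1, X) combination, one per row.
dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1, 3)

# In the flattened result X changes slowest, so each Y value is repeated
# once for every (theta0, theta1) pair to stay aligned with its X.
y_col = np.repeat(Y, len(theta0_range) * len(theta1_range))
full = np.column_stack([dataset, y_col])

print(full.shape)
print(full)
```

It is worth checking a few rows by eye before relying on the alignment, since it depends on the ordering that the transpose and reshape happen to produce.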

David Erickson · Accepted Answer · 2020-10-24T03:34:09.943

You can reorder the columns after using meshgrid by indexing with [:,[0,2,1,3]] (if you need to build that list dynamically because there are many columns, see the end of my answer):

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]

Output:

array([[  1.        ,   1.        ,  44.0394952 , 505.81099922],
       [  1.        ,   1.        ,  61.03882938, 515.97253226],
       [  1.        ,   1.        ,  26.69851841, 525.18083012],
       ...,
       [ 14.        ,  14.        ,  73.86032436, 973.91032818],
       [ 14.        ,  14.        , 103.96923524, 984.24366761],
       [ 14.        ,  14.        ,  93.20663129, 995.44618851]])

If you have many columns, you can build the list [0,2,1,3] dynamically: even column indices first, then odd ones. For example:

n = theta_array.shape[1]*2  # number of columns after the reshape (4 here)
lst = [x for x in range(n) if x % 2 == 0] + [x for x in range(n) if x % 2 == 1]
lst

[0, 2, 1, 3]

With n = 8 the same expression gives [0, 2, 4, 6, 1, 3, 5, 7].

Then, you could rewrite to:

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,lst]
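
One thing to keep in mind, sketched below on small made-up arrays (the names theta_small and xy_small and their values are hypothetical): meshgrid flattens its inputs, so this expression pairs each individual element of the first array with each row of the second. That is why the first two columns of the output above are identical, and why the result has more rows than a row-by-row pairing would.

```python
import numpy as np

# Small stand-in arrays (hypothetical values, not the question's data).
theta_small = np.array([[1, 10], [2, 11]])
xy_small = np.array([[0.5, 100.0], [1.5, 200.0]])

out = np.array(np.meshgrid(theta_small, xy_small)).T.reshape(-1, 4)[:, [0, 2, 1, 3]]

# 4 theta elements x 2 XY rows = 8 rows, not 2 theta rows x 2 XY rows = 4.
print(out.shape)
# Each row repeats a single theta element in the first two columns.
print(out[:3])
```

This matches the output requested in the question as written, but not a pairing of whole rows such as [1, 10, x, y].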

Thanks for sharing, this helps. The reordering solution is embarrassingly easy, but it is still not obvious when you start using numpy (I am still rewiring my brain to think in matrix operations instead of writing 'for' loops). Knowing about list comprehensions for other cases also helps. — M.E., Oct 24 '20 at 09:19

score 1 · Answer 2 · answered Oct 24 '20 at 02:28

You can use itertools.product:

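from itertools import product

out = np.array([*product(theta_array, XY_array)])  # every (theta row, XY row) pair
out = out.reshape(out.shape[0],-1)  # flatten each pair into one 4-column row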

Output:

array([[  1.        ,  10.        ,  44.0394952 , 505.81099922],
       [  1.        ,  10.        ,  61.03882938, 515.97253226],
       [  1.        ,  10.        ,  26.69851841, 525.18083012],
       ...,
       [  4.        ,  14.        ,  73.86032436, 973.91032818],
       [  4.        ,  14.        , 103.96923524, 984.24366761],
       [  4.        ,  14.        ,  93.20663129, 995.44618851]])
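
If the row-wise pairing shown above is what is needed, it can also be sketched with numpy alone by repeating one array and tiling the other. The helper name pair_rows and the small arrays are hypothetical and only for illustration; this is one possible approach, not the only one.

```python
import numpy as np

def pair_rows(a, b):
    """Pair every row of `a` with every row of `b`, side by side."""
    left = np.repeat(a, len(b), axis=0)  # each row of a, repeated len(b) times
    right = np.tile(b, (len(a), 1))      # all of b, stacked len(a) times
    return np.hstack([left, right])

# Small stand-in arrays (hypothetical values, not the question's data).
theta_small = np.array([[1, 10], [1, 11]])
xy_small = np.array([[0.5, 100.0], [1.5, 200.0], [2.5, 300.0]])

paired = pair_rows(theta_small, xy_small)
print(paired.shape)
print(paired)
```

The rows come out in the same order as with itertools.product: the first array's rows vary slowest, the second array's rows vary fastest.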

That said, this looks very much like an XY-problem. What are you trying to do with this array?

Thanks, I did not know about the itertools package. Sounds interesting. — M.E., Oct 24 '20 at 09:31

M.E. · Answer 3 · 2020-10-24T09:59:18.650

As a complementary reference, here is a comparison of execution time for both solutions. For this specific operation, the itertools version takes roughly 9 to 10 times longer than the meshgrid version (about 2.8 s versus 0.3 s for 1000 iterations).

%%time

for i in range(1000):    
    z = np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]

CPU times: user 299 ms, sys: 0 ns, total: 299 ms
Wall time: 328 ms

%%time

for i in range(1000):    
    z = np.array([*product(theta_array, XY_array)])    
    z = z.reshape(z.shape[0],-1)

CPU times: user 2.79 s, sys: 474 µs, total: 2.79 s
Wall time: 2.84 s
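
%%time only works inside IPython/Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The XY_array below is random stand-in data with the same shape as in the question, and the function names are made up for this example. The sketch also prints both shapes, because the two expressions do not build the same array: the meshgrid version pairs individual elements, the product version pairs whole rows.

```python
import timeit
from itertools import product

import numpy as np

# theta_array as in the question; XY_array is random stand-in data (50 x 2).
theta_array = np.array([[a, b] for a in range(1, 5) for b in range(10, 15)])
XY_array = np.random.default_rng(0).random((50, 2))

def with_meshgrid():
    return np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1, 4)[:, [0, 2, 1, 3]]

def with_product():
    z = np.array([*product(theta_array, XY_array)])
    return z.reshape(z.shape[0], -1)

# Total seconds for 100 calls of each version; absolute numbers vary by machine.
t_mesh = timeit.timeit(with_meshgrid, number=100)
t_prod = timeit.timeit(with_product, number=100)

print(f"meshgrid: {t_mesh:.3f} s, shape {with_meshgrid().shape}")
print(f"product:  {t_prod:.3f} s, shape {with_product().shape}")
```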
