Interleaving NumPy arrays with mismatching shapes

Question

I would like to interleave multiple NumPy arrays with differing lengths along a particular axis. In particular, I have a list of arrays of shape (_, *dims), varying along the first axis, which I would like to interleave to obtain another array of shape (_, *dims). For instance, given the input

a1 = np.array([[11,12], [41,42]])
a2 = np.array([[21,22], [51,52], [71,72], [91,92], [101,102]])
a3 = np.array([[31,32], [61,62], [81,82]])

interweave(a1,a2,a3)

the desired output would be

np.array([[11,12], [21,22], [31,32], [41,42], [51,52], [61,62], [71,72], [81,82], [91,92], [101,102]])

With the help of previous posts (such as Numpy concatenate arrays with interleaving), I've gotten this working when the arrays match along the first dimension:

import numpy as np

def interweave(*arrays, stack_axis=0, weave_axis=1):
    final_shape = list(arrays[0].shape)
    final_shape[stack_axis] = -1

    # stack up arrays along the "weave axis", then reshape back to desired shape
    return np.concatenate(arrays, axis=weave_axis).reshape(final_shape)

Unfortunately, if the input shapes differ along the first dimension, the above raises a ValueError, because np.concatenate along the weave axis requires every other axis, including the mismatched first axis, to have the same size. Indeed, I don't see any way to use concatenation effectively here, since concatenating along the mismatched axis destroys the information we need to produce the desired output.
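To make both behaviours concrete, here is a small check that reuses the question's `interweave` unchanged. The inputs `a`, `b`, `c` are hypothetical equal-length versions of the example arrays; the second call shows the ValueError once one input has an extra row.

```python
import numpy as np

def interweave(*arrays, stack_axis=0, weave_axis=1):
    # the question's function, reproduced unchanged
    final_shape = list(arrays[0].shape)
    final_shape[stack_axis] = -1
    return np.concatenate(arrays, axis=weave_axis).reshape(final_shape)

# Equal lengths: three (2, 2) arrays concatenated along axis 1 give a (2, 6) array
# whose first row is [11, 12, 21, 22, 31, 32]; reshaping to (-1, 2) splits it back
# into rows in interleaved order.
a = np.array([[11, 12], [41, 42]])
b = np.array([[21, 22], [51, 52]])
c = np.array([[31, 32], [61, 62]])
print(interweave(a, b, c))

# Mismatched lengths: axis 0 is 2 for `a` but 3 for the second input,
# so concatenating along axis 1 is rejected.
try:
    interweave(a, np.array([[21, 22], [51, 52], [71, 72]]))
except ValueError as err:
    print("ValueError:", err)
```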

One other idea I had was to pad the input arrays with null entries until their shapes match along the first dimension, and then remove the null entries at the end of the day. While this would work, I am not sure how best to implement it, and it seems like it should not be necessary in the first place.
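The padding idea can be implemented without a null sentinel by keeping a separate boolean mask of which slots hold real rows. Below is a minimal sketch. `interleave_padded` is a hypothetical name, and the sketch assumes all inputs share the trailing shape `dims`. It pads into an (n, m, *dims) buffer, where n is the longest length and m the number of arrays. Row-major order of that buffer is exactly the interleaving order, so a reshape followed by the flattened mask yields the result and keeps integer dtypes intact.

```python
import numpy as np

def interleave_padded(*arrays):
    # Hypothetical helper: pad to the longest length, then drop the padding with a mask.
    n = max(len(a) for a in arrays)
    m = len(arrays)
    dims = arrays[0].shape[1:]
    dtype = np.result_type(*arrays)
    padded = np.zeros((n, m) + dims, dtype=dtype)  # padding value is never read
    valid = np.zeros((n, m), dtype=bool)           # True where a real row was written
    for k, a in enumerate(arrays):
        padded[:len(a), k] = a
        valid[:len(a), k] = True
    # Flattening (n, m) row-major visits (row 0 of every array), (row 1 ...), ...
    return padded.reshape((n * m,) + dims)[valid.ravel()]
```

The trade-off is memory: the padded buffer holds n * m rows, so it is larger than the output whenever the lengths differ a lot.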

yatu · Accepted Answer · 2019-06-05T15:14:31.773

Here's a mostly NumPy-based approach that also uses itertools.zip_longest to interleave the arrays with a fill value:

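from itertools import zip_longest
import numpy as np

def interleave(*a):
    # zip_longest groups the i-th rows of all arrays, filling missing rows
    # with a row of NaNs as wide as the second axis
    l = *zip_longest(*a, fillvalue=[np.nan]*a[0].shape[1]),
    # build a 2d array from the groups of rows
    out = np.concatenate(l)
    # keep only rows whose first entry is not NaN (i.e. drop the padding)
    return out[~np.isnan(out[:,0])]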

a1 = np.array([[11,12], [41,42]])
a2 = np.array([[21,22], [51,52], [71,72], [91,92], [101,102]])
a3 = np.array([[31,32], [61,62], [81,82]])

interleave(a1,a2,a3)

array([[ 11.,  12.],
       [ 21.,  22.],
       [ 31.,  32.],
       [ 41.,  42.],
       [ 51.,  52.],
       [ 61.,  62.],
       [ 71.,  72.],
       [ 81.,  82.],
       [ 91.,  92.],
       [101., 102.]])
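Two side effects of the NaN fill are visible in the output above, and the sketch below (same function, with `tuple(...)` in place of the `*...,` unpacking) demonstrates them. First, the result is float64 even for integer inputs, so integer data has to be cast back explicitly. Second, an input row whose first entry is genuinely NaN is indistinguishable from padding and gets dropped. The float arrays `f1` and `f2` are hypothetical inputs chosen to show the second point.

```python
import numpy as np
from itertools import zip_longest

def interleave(*a):
    # yatu's approach: pad with NaN rows, then keep rows whose first entry is not NaN
    l = tuple(zip_longest(*a, fillvalue=[np.nan] * a[0].shape[1]))
    out = np.concatenate(l)
    return out[~np.isnan(out[:, 0])]

a1 = np.array([[11, 12], [41, 42]])
a2 = np.array([[21, 22], [51, 52], [71, 72], [91, 92], [101, 102]])
a3 = np.array([[31, 32], [61, 62], [81, 82]])

res = interleave(a1, a2, a3)
print(res.dtype)                   # float64: the NaN fill forces a float result
print(res.astype(a1.dtype)[:3])    # cast back when the inputs were integers

# A real NaN in the first column looks exactly like padding:
f1 = np.array([[np.nan, 1.0], [2.0, 3.0]])
f2 = np.array([[4.0, 5.0]])
print(interleave(f1, f2))          # the [nan, 1.0] row is silently dropped
```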

Mad Physicist · Answer 2 · 2019-06-05T12:55:44.297

You are likely looking for np.choose. With a properly constructed index, you can make the result in one call:

def interweave(*arrays, axis=0):
    arrays = [np.moveaxis(a, axis, 0) for a in arrays]
    m = len(arrays)
    n = max(map(len, arrays))
    index = [k for i, k in (divmod(x, m) for x in range(m * n)) if i < len(arrays[k])]
    return np.moveaxis(np.choose(index, arrays), 0, axis)

range(m * n) spans the output positions that would exist if all the arrays had the same length n. divmod(x, m) splits each position x into (i, k): the row number i within the interleaving and the array k it is being selected from. Positions whose row i lies past the end of a shorter array are skipped, so the index only refers to elements that actually exist.
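To see what the index looks like, here is the same divmod construction run on just the lengths of a1, a2, a3 from the question (2, 5 and 3). `interleave_index` is a hypothetical helper that also returns the (i, k) pairs, which makes it easy to see which positions are skipped. Note that the final index keeps only k; the row number i is discarded.

```python
def interleave_index(lengths):
    # Hypothetical helper reproducing the answer's index for the given array lengths.
    m = len(lengths)
    n = max(lengths)
    pairs = [divmod(x, m) for x in range(m * n)]        # position x -> (row i, array k)
    kept = [(i, k) for i, k in pairs if i < lengths[k]]  # drop rows past a short array's end
    return [k for _, k in kept], kept

index, kept = interleave_index([2, 5, 3])
print(index)      # [0, 1, 2, 0, 1, 2, 1, 2, 1, 1]
print(kept[:4])   # [(0, 0), (0, 1), (0, 2), (1, 0)]
```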

There are probably better ways of making the index, but this works as an example. You have to move the stack axis to the first position since choose selects along the first axis.
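np.choose broadcasts the index and all choice arrays to a common shape, so arrays of different lengths cannot be passed to it directly. A variant that keeps this answer's divmod index but also keeps the row number i can instead gather from the concatenated rows with ordinary fancy indexing. This is a swapped-in technique rather than np.choose, and `interweave_gather` is a hypothetical name. `offsets[k]` is where array k starts inside the concatenation.

```python
import numpy as np

def interweave_gather(*arrays, axis=0):
    # Same index idea as the answer, but gathered from np.concatenate, not np.choose.
    arrays = [np.moveaxis(a, axis, 0) for a in arrays]
    m = len(arrays)
    n = max(len(a) for a in arrays)
    # offsets[k] = number of rows in arrays[0..k-1], i.e. where array k starts
    offsets = np.cumsum([0] + [len(a) for a in arrays[:-1]])
    # position of row i of array k inside the concatenation is offsets[k] + i
    index = [offsets[k] + i
             for i, k in (divmod(x, m) for x in range(m * n))
             if i < len(arrays[k])]
    return np.moveaxis(np.concatenate(arrays)[index], 0, axis)
```

Unlike the NaN-based answers, this keeps the input dtype, but it still builds the concatenation as an intermediate copy.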

edited Jun 05 '19 at 12:55

answered Jun 05 '19 at 12:31

It seems to me that `np.choose` does not behave quite as intended here. For a list [a_i] of indices, it will pick the ith row of the (a_i)th array, but what we want is for it to pick the _next unpicked row_ from the (a_i)th array. – user78729 Jun 06 '19 at 00:49
Yeah, it really doesn't. I'm trying to figure out a way of increasing the dimensions artificially with stride tricks. Almost there. – Mad Physicist Jun 06 '19 at 03:59

user78729 · Answer 3 · 2019-06-06T00:20:12.297

I went ahead and generalized yatu's answer to the situation I'm facing in practice, where the number of dimensions is arbitrary. Here is what I have:

import numpy as np
from itertools import zip_longest

def interleave(*a):
    # padding row of NaNs with the same shape as one row of the inputs
    fill_shape = a[0].shape[1:]
    fill_array = np.full(fill_shape,np.nan)

    l = *zip_longest(*a, fillvalue=fill_array),
    # build an (N, *dims) array from the groups of rows
    out = np.concatenate(l)
    # index of the first scalar in each row, e.g. (0, 0) for 3-d input
    tup = (0,)*(len(out.shape)-1)
    # return the rows whose first scalar is not NaN
    return out[~np.isnan(out[(...,)+tup])]

Testing this out:

b1 = np.array(
        [
                [[111,112,113],[121,122,123]],
                [[411,412,413],[421,422,423]]
        ])
b2=np.array(
        [
                [[211,212,213],[221,222,223]],
                [[511,512,513],[521,522,523]],
                [[711,712,713],[721,722,712]],
                [[911,912,913],[921,922,923]],
                [[1011,1012,1013],[1021,1022,1023]]
        ])
b3=np.array(
        [
                [[311,312,313],[321,322,323]],
                [[611,612,613],[621,622,623]],
                [[811,812,813],[821,822,823]]
        ])

In [1]: interleave(b1,b2,b3)
Out [1]: [[[ 111.  112.  113.]
  [ 121.  122.  123.]]

 [[ 211.  212.  213.]
  [ 221.  222.  223.]]

 [[ 311.  312.  313.]
  [ 321.  322.  323.]]

 [[ 411.  412.  413.]
  [ 421.  422.  423.]]

 [[ 511.  512.  513.]
  [ 521.  522.  523.]]

 [[ 611.  612.  613.]
  [ 621.  622.  623.]]

 [[ 711.  712.  713.]
  [ 721.  722.  712.]]

 [[ 811.  812.  813.]
  [ 821.  822.  823.]]

 [[ 911.  912.  913.]
  [ 921.  922.  923.]]

 [[1011. 1012. 1013.]
  [1021. 1022. 1023.]]]

Any suggestions are welcome! In particular, in my application, space, not time, is the limiting factor, so I'm wondering if there is a way to do this using significantly less memory (the datasets are large along the merging axis).
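On the memory question, one option is to skip both the NaN padding and the concatenation and write each input row straight into a single preallocated output. Row i of array k belongs after every row with a smaller row number (from all arrays) and after row i of the arrays before k, which gives its position directly. Below is a minimal sketch. `interleave_inplace` and its `out` parameter are hypothetical, and it assumes all inputs share the trailing shape `dims`. Apart from the output, it allocates only integer index arrays of size (number of arrays) x (length of the current array). The `out` parameter should let you pass a preallocated buffer such as an np.memmap of the right shape and dtype, though I have not measured that.

```python
import numpy as np

def interleave_inplace(*arrays, out=None):
    # Hypothetical helper: place every row directly at its final position in `out`.
    lengths = np.array([len(a) for a in arrays])
    total = int(lengths.sum())
    dims = arrays[0].shape[1:]
    if out is None:
        out = np.empty((total,) + dims, dtype=np.result_type(*arrays))
    for k, a in enumerate(arrays):
        i = np.arange(len(a))
        # rows with a smaller row number, summed over all arrays: sum_j min(len_j, i)
        before = np.minimum(lengths[:, None], i).sum(axis=0)
        # row i of the arrays j < k that are long enough to have one
        same_row = (lengths[:k, None] > i).sum(axis=0)
        out[before + same_row] = a
    return out
```

Integer inputs stay integers, and the output is the only full-size allocation.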
