zip_longest for numpy arrays

Question

I'm trying to combine NumPy arrays of different lengths, the way one can with lists using itertools.zip_longest. Say I have:

import numpy as np

a = np.array([1, 5, 9, 13])
b = np.array([2, 6])

With itertools one could interleave these two arrays using chain and zip_longest, filling the missing values with, say, 0:

from itertools import chain, zip_longest
list(chain(*zip_longest(*[a, b], fillvalue=0)))
# [1, 2, 5, 6, 9, 0, 13, 0]

Is there a simple way to do this using numpy that I'm missing?

[You mean this?](https://stackoverflow.com/a/5347492/8204776) — meowgoesthedog, Apr 29 '19 at 10:05
As the example is put, I think I'd first allocate the output array with zeros and assign the strided slices, but I don't know if that would work in your real use case. — jdehesa, Apr 29 '19 at 10:06
@meowgoesthedog: If you remove 6 from the `b` array from the linked answer, you are missing a 0 which is what OP wants as the fillvalue — Sheldore, Apr 29 '19 at 10:06
I'm looking preferably for a general solution, applicable to different length arrays instead of using strided slices from the array @jdehesa. Perhaps I should make it more general — yatu, Apr 29 '19 at 10:08

Divakar · Accepted Answer · 2019-04-29T11:28:30.380

Here's an almost vectorized one -

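import numpy as np

# https://stackoverflow.com/a/38619350/3293881 @Divakar
def boolean_indexing(v):
    # Length of each input array
    lens = np.array([len(item) for item in v])
    # mask[i, j] is True where v[i] has an element at position j
    mask = lens[:, None] > np.arange(lens.max())
    out_dtype = np.result_type(*[arr.dtype for arr in v])
    # Zero-filled 2D array, one row per input; valid positions are overwritten
    out = np.zeros(mask.shape, dtype=out_dtype)
    out[mask] = np.concatenate(v)
    return out

v = [a, b]  # list of all input arrays
# Column-major flattening takes one element from each row in turn
out = boolean_indexing(v).ravel('F')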

Sample run -

In [23]: a = np.array([1, 5, 9, 13])
    ...: b = np.array([2, 6])
    ...: c = np.array([7, 8, 10])
    ...: v = [a,b,c]

In [24]: boolean_indexing(v).ravel('F')
Out[24]: array([ 1,  2,  7,  5,  6,  8,  9,  0, 10, 13,  0,  0])
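`boolean_indexing` always pads with zeros, whereas the question asks for a `fillvalue` like the one in `zip_longest`. A minimal sketch of one way to add that, assuming 1-D inputs and a non-empty list; the name `interleave_padded` and its `fillvalue` parameter are illustrative and not part of the answer above:

```python
import numpy as np

def interleave_padded(arrays, fillvalue=0):
    """Interleave 1-D arrays of unequal length, padding short ones with fillvalue.

    Hypothetical helper: same masking idea as boolean_indexing, plus a fill value.
    """
    arrays = [np.asarray(arr) for arr in arrays]
    lens = np.array([len(arr) for arr in arrays])
    # mask[i, j] is True where arrays[i] has an element at position j
    mask = lens[:, None] > np.arange(lens.max())
    # Let the fill value take part in dtype resolution, so np.nan promotes ints to float
    out_dtype = np.result_type(fillvalue, *[arr.dtype for arr in arrays])
    # Start from an array full of the fill value instead of zeros
    out = np.full(mask.shape, fillvalue, dtype=out_dtype)
    out[mask] = np.concatenate(arrays)
    # Column-major flattening reads one element from each array in turn
    return out.ravel('F')

a = np.array([1, 5, 9, 13])
b = np.array([2, 6])
c = np.array([7, 8, 10])
print(interleave_padded([a, b, c], fillvalue=-1).tolist())
# [1, 2, 7, 5, 6, 8, 9, -1, 10, 13, -1, -1]
```

The only Python-level loops left are the list comprehensions over the inputs; the padding and interleaving themselves are done by the broadcasted mask and `ravel('F')`.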

Ahh this is great!! Nice use of broadcasting here, and thanks for sharing an easier way to interleave the result array using `.ravel('F')`! — yatu, Apr 29 '19 at 11:30

jdehesa · Answer 2 · 2019-04-29T10:20:55.013


I think I'd do that like this:

import numpy as np

def chain_zip_longest(*arrs, fillvalue=0, dtype=None):
    arrs = [np.asarray(arr) for arr in arrs]
    if not arrs:
        return np.array([])
    n = len(arrs)
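    if dtype is None:
        # np.find_common_type was removed in NumPy 2.0; np.result_type is its replacement.
        # Testing against None explicitly also honors any dtype the caller passes.
        dtype = np.result_type(*arrs)
    out = np.full(n * max(len(arr) for arr in arrs), fillvalue, dtype=dtype)
    for i, arr in enumerate(arrs):
        # Array i fills positions i, i + n, i + 2n, ... of the output
        out[i:i + n * len(arr):n] = arr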
    return out

print(chain_zip_longest([1, 2], [3, 4, 5], [6]))
# [1 3 6 2 4 0 0 5 0]
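One way to gain confidence in the strided assignment is to compare it against the itertools recipe from the question on random inputs. The following is only a sketch: `interleave_reference` and `interleave_strided` are illustrative names, and the strided function restates the idea of `chain_zip_longest` above so that the block runs on its own.

```python
from itertools import chain, zip_longest

import numpy as np

def interleave_reference(arrs, fillvalue=0):
    # Pure-Python baseline built from the itertools recipe in the question
    return list(chain.from_iterable(zip_longest(*arrs, fillvalue=fillvalue)))

def interleave_strided(arrs, fillvalue=0):
    # Same strided-assignment idea as chain_zip_longest, restated for a standalone check
    arrs = [np.asarray(arr) for arr in arrs]
    n = len(arrs)
    out = np.full(n * max(len(arr) for arr in arrs), fillvalue,
                  dtype=np.result_type(*arrs))
    for i, arr in enumerate(arrs):
        # Array i owns output positions i, i + n, i + 2n, ...
        out[i:i + n * len(arr):n] = arr
    return out

rng = np.random.default_rng(0)
for _ in range(100):
    # 1 to 4 integer arrays, each holding 1 to 6 elements
    arrs = [rng.integers(0, 100, size=rng.integers(1, 7))
            for _ in range(rng.integers(1, 5))]
    assert interleave_strided(arrs, -1).tolist() == interleave_reference(arrs, -1)
```

The check uses -1 as the fill value because the random data is drawn from 0 to 99, so padded slots cannot be mistaken for real elements.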

edited Apr 29 '19 at 10:20

answered Apr 29 '19 at 10:14


Nice solution @jdehesa. Seems like there's no simple way around this and assigning in a for loop like here with `enumerate` is the way to go. Doesn't seem like there's any stacking method such as `np.column_stack` with some other further concatenation that supports this functionality – yatu Apr 29 '19 at 10:26
@yatu Yes, it's always complicated with variable-length arrays. I suppose you could also do this through a sparse matrix (each given array as a column, then flatten it), but I'm not sure it would be any more efficient (that is, without copies or looping)... – jdehesa Apr 29 '19 at 10:37
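On the `np.column_stack` point raised in these comments: stacking does work once every array has been padded to a common length, for example with `np.pad`. A rough sketch under that assumption follows; `interleave_column_stack` is a made-up name, and the list comprehension means this still loops over the inputs in Python, so it is not expected to beat the strided version.

```python
import numpy as np

def interleave_column_stack(arrs, fillvalue=0):
    # Hypothetical helper: pad to a common length, stack as columns, flatten
    arrs = [np.asarray(arr) for arr in arrs]
    longest = max(len(arr) for arr in arrs)
    # Pad each array on the right up to the longest length.
    # np.pad keeps each array's own dtype, so fillvalue must be representable in it.
    padded = [np.pad(arr, (0, longest - len(arr)), constant_values=fillvalue)
              for arr in arrs]
    # Each input becomes a column; row-major flattening then interleaves them
    return np.column_stack(padded).ravel()

print(interleave_column_stack([[1, 2], [3, 4, 5], [6]]).tolist())
# [1, 3, 6, 2, 4, 0, 0, 5, 0]
```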
