Merging arrays of varying size in Python

Question

Is there an easy way to merge, let's say, n spectra (i.e. arrays of shape (y_n, 2)) with varying lengths y_n into one array (or list) of shape (y_n_max, 2*n), padding each spectrum with zeros where it is shorter than y_n_max?

Basically I want to have all spectra next to each other. For example

a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]

into

c = [[1,2,6,7],[2,3,8,9],[4,5,0,0]]

Either Array or List would be fine. I guess it comes down to filling up arrays with zeros?

What's `n` compared to `y_n`? A few long arrays, or many short ones? – hpaulj, Mar 10 '17 at 17:21
There is a method of turning a list of arrays into a 2d array with padding, but it's sufficiently convoluted that I'd have to look it up. `@Divakar` is our resident guru for that sort of thing. But for your sizes the `zip_longest` solution should be fast enough and easily remembered. – hpaulj, Mar 10 '17 at 17:46

score 4 · Accepted Answer · answered Mar 10 '17 at 17:01

If you're dealing with native Python lists, then you can do:

from itertools import zip_longest

c = [a + b for a, b in zip_longest(a, b, fillvalue=[0, 0])]
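
To see how the padding behaves when either list is the shorter one, here is a minimal sketch. The helper name `merge_pair` and its `fill` parameter are made up for illustration and are not part of the answer; the loop variables are renamed so they do not reuse the names of the inputs.

```python
from itertools import zip_longest

def merge_pair(first, second, fill=(0, 0)):
    """Place two lists of [x, y] rows side by side, padding the shorter one."""
    pad = list(fill)
    # zip_longest keeps going until the longer input is exhausted and
    # substitutes `pad` for the missing rows of the shorter input.
    return [list(x) + list(y) for x, y in zip_longest(first, second, fillvalue=pad)]

a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]

print(merge_pair(a, b))  # [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]
print(merge_pair(b, a))  # [[6, 7, 1, 2], [8, 9, 2, 3], [0, 0, 4, 5]]
```

The zeros always land in the columns of whichever spectrum ran out first, regardless of argument order.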

Jon Clements

But how do you generalize this to `n` lists? – hpaulj Mar 10 '17 at 18:18
@hpaulj you can use [Matthew's solution](http://stackoverflow.com/a/42723977/1252759) – Jon Clements Mar 10 '17 at 18:20
@hpaulj yes... the shortest element(s) will be padded with [0,0] to meet the longest argument no matter the ordering - see the help for zip_longest or have a play with the code with some mocked data to try it yourself if you want. – Jon Clements Mar 10 '17 at 18:39
For a bit I thought the padding was in the wrong place, but it became clearer once I cast it as an array and saw columns line up correctly. – hpaulj Mar 10 '17 at 19:32

score 2 · Answer 2 · answered Mar 10 '17 at 17:08

You could also do this with extend and zip, without itertools, provided a is always at least as long as b. If b could be longer than a, you could add a bit of logic to handle that as well.

a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]

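# pad b in place with [0, 0] rows until it is as long as a
b.extend([[0, 0]] * (len(a) - len(b)))
# concatenate each pair of rows so the result is flat: [1, 2, 6, 7], ...
[x + y for x, y in zip(a, b)]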
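
One possible shape for that "bit of logic", as a sketch only: pad whichever list is shorter up to the common maximum length. The helper `pad_rows` is a hypothetical name. Unlike `extend`, it returns a copy and leaves the inputs unchanged, and it builds a fresh `[0, 0]` row for each padding slot instead of repeating one shared list.

```python
def pad_rows(rows, length, fill=(0, 0)):
    """Return a copy of `rows` extended with fill rows up to `length`."""
    # range() of a negative number is empty, so long inputs pass through as-is
    return list(rows) + [list(fill) for _ in range(length - len(rows))]

a = [[1, 2], [2, 3]]
b = [[6, 7], [8, 9], [10, 11]]   # here b is the longer one

target = max(len(a), len(b))
merged = [x + y for x, y in zip(pad_rows(a, target), pad_rows(b, target))]
print(merged)  # [[1, 2, 6, 7], [2, 3, 8, 9], [0, 0, 10, 11]]
```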

score 2 · Answer 3 · edited May 23 '17 at 11:46

Trying to generalize the other solutions to multiple lists:

In [114]: a
Out[114]: [[1, 2], [2, 3], [4, 5]]
In [115]: b
Out[115]: [[6, 7], [8, 9]]
In [116]: c
Out[116]: [[3, 4]]
In [117]: d
Out[117]: [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
In [118]: ll=[a,d,c,b]

zip_longest pads

In [120]: [l for l in itertools.zip_longest(*ll,fillvalue=[0,0])]
Out[120]: 
[([1, 2], [1, 2], [3, 4], [6, 7]),
 ([2, 3], [2, 3], [0, 0], [8, 9]),
 ([4, 5], [4, 5], [0, 0], [0, 0]),
 ([0, 0], [6, 7], [0, 0], [0, 0]),
 ([0, 0], [8, 9], [0, 0], [0, 0])]

itertools.chain flattens the inner lists (or use .from_iterable(l))

In [121]: [list(itertools.chain(*l)) for l in _]
Out[121]: 
[[1, 2, 1, 2, 3, 4, 6, 7],
 [2, 3, 2, 3, 0, 0, 8, 9],
 [4, 5, 4, 5, 0, 0, 0, 0],
 [0, 0, 6, 7, 0, 0, 0, 0],
 [0, 0, 8, 9, 0, 0, 0, 0]]

More ideas at Convert Python sequence to NumPy array, filling missing values

Adapting @Divakar's solution to this case:

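import numpy as np

def divakars_pad(ll):
    # number of rows in each spectrum
    lens = np.array([len(item) for item in ll])
    # mask[i, j] is True where spectrum i has a row j
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros(mask.shape + (2,), int)
    out[mask, :] = np.concatenate(ll)
    # put rows first, then merge the (spectrum, column) axes;
    # use lens.max() rather than a hard-coded row count
    out = out.transpose(1, 0, 2).reshape(lens.max(), -1)
    return out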

In [142]: divakars_pad(ll)
Out[142]: 
array([[1, 2, 1, 2, 3, 4, 6, 7],
       [2, 3, 2, 3, 0, 0, 8, 9],
       [4, 5, 4, 5, 0, 0, 0, 0],
       [0, 0, 6, 7, 0, 0, 0, 0],
       [0, 0, 8, 9, 0, 0, 0, 0]])

For this small size the itertools solution is faster, even with an added conversion to array.
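
A rough way to check this on your own machine is a `timeit` comparison like the sketch below. The function names `pad_itertools` and `pad_mask` are made up here, the inputs are the four small lists from above, and the timings will depend on hardware and on the sizes of your real spectra, so no numbers are quoted.

```python
import itertools
import timeit

import numpy as np

a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]
c = [[3, 4]]
d = [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
ll = [a, d, c, b]

def pad_itertools(ll):
    # zip_longest pads, chain flattens each row, then convert to an array
    rows = itertools.zip_longest(*ll, fillvalue=[0, 0])
    return np.array([list(itertools.chain.from_iterable(r)) for r in rows])

def pad_mask(ll):
    # boolean-mask approach adapted from the function above
    lens = np.array([len(item) for item in ll])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.zeros(mask.shape + (2,), int)
    out[mask] = np.concatenate(ll)
    return out.transpose(1, 0, 2).reshape(lens.max(), -1)

# both approaches should agree before their speed is compared
assert np.array_equal(pad_itertools(ll), pad_mask(ll))

for fn in (pad_itertools, pad_mask):
    seconds = timeit.timeit(lambda: fn(ll), number=10_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 10,000 calls")
```

With many more or much longer spectra the ranking may change, so it is worth rerunning this with representative data.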

With an array as target we don't need the chain flattener; reshape takes care of that:

In [157]: np.array(list(itertools.zip_longest(*ll,fillvalue=[0,0]))).reshape(-1, len(ll)*2)
Out[157]: 
array([[1, 2, 1, 2, 3, 4, 6, 7],
       [2, 3, 2, 3, 0, 0, 8, 9],
       [4, 5, 4, 5, 0, 0, 0, 0],
       [0, 0, 6, 7, 0, 0, 0, 0],
       [0, 0, 8, 9, 0, 0, 0, 0]])
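
Another option, not taken from the answers above, is to preallocate the padded array and copy each spectrum into its own pair of columns. This is a minimal sketch; the function name `merge_spectra` and the `fill` parameter are assumptions made for illustration.

```python
import numpy as np

def merge_spectra(spectra, fill=0):
    """Stack (y_i, 2) spectra side by side into a (max(y_i), 2*n) array."""
    arrays = [np.asarray(s) for s in spectra]
    max_len = max(len(arr) for arr in arrays)
    # start from an array that is already filled with the padding value
    out = np.full((max_len, 2 * len(arrays)), fill, dtype=np.result_type(*arrays))
    for i, arr in enumerate(arrays):
        # spectrum i occupies columns 2*i and 2*i + 1, rows 0..len(arr)-1
        out[:len(arr), 2 * i:2 * i + 2] = arr
    return out

a = [[1, 2], [2, 3], [4, 5]]
b = [[6, 7], [8, 9]]
print(merge_spectra([a, b]))
# [[1 2 6 7]
#  [2 3 8 9]
#  [4 5 0 0]]
```

The explicit loop runs once per spectrum, not once per row, so it should stay reasonable when there are a few long spectra.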

score 1 · Answer 4 · answered Mar 10 '17 at 17:07

Use the zip built-in function and the chain.from_iterable function from itertools. This has the benefit of being more type agnostic than the other posted solution -- it only requires that your spectra are iterables.

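from itertools import chain

a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]

# note: zip stops at the shortest input, so pad a and b to equal length first
c = [list(chain.from_iterable(zs)) for zs in zip(a, b)]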

If you want more than 2 spectra, you can change the zip call to zip(a,b,...)
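
Because plain `zip` stops at the shortest input, spectra of unequal length need padding. The sketch below swaps in `zip_longest` for that and keeps `chain.from_iterable` for the flattening, so any iterables of pairs still work. The name `merge_iterables` is hypothetical.

```python
from itertools import chain, zip_longest

def merge_iterables(*spectra, fill=(0, 0)):
    """Merge any number of iterables of (x, y) pairs, padding with `fill`."""
    return [list(chain.from_iterable(rows))
            for rows in zip_longest(*spectra, fillvalue=fill)]

a = [(1, 2), (2, 3), (4, 5)]   # tuples instead of lists
b = iter([[6, 7], [8, 9]])     # a one-shot iterator
c = [[3, 4]]

print(merge_iterables(a, b, c))
# [[1, 2, 6, 7, 3, 4], [2, 3, 8, 9, 0, 0], [4, 5, 0, 0, 0, 0]]
```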
