How to split numpy array keeping a few elements from previous split?

Question

I have a numpy array which I wish to split across a certain dimension. While splitting the array, I need to prepend (to the beginning of each element) a trailing part of the previous element. For instance,

Let my array be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Let my split_size = 2 and pad_length = 1. split_size will always be a divisor of array length. My resultant splits would look like,

[random, 0, 1], [1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]. My splits were all prepended by the last value of the previous element.

Needless to say, my arrays are multidimensional and I need an efficient vectorized way to do this along a certain dimension.

Here, I can provide the value of random.

Won't we need padding on the trailing side too, e.g. for the given input with `split_size = 5, pad_length = 2`? So I am guessing the last row would be `[7 8 9 random random]`. — Divakar, Dec 01 '16 at 09:42
Why? For those parameters, I should get `[random, random, 0, 1, 2, 3, 4], [3, 4, 5, 6, 7, 8, 9]`. If the question is not clear, I'll be happy to improve it as you direct! — martianwars, Dec 01 '16 at 09:46
Ah I got the params wrong. I meant if `split_size = 3, pad_length = 2`? — Divakar, Dec 01 '16 at 09:47
Oh, in this case `split_size` is always a divisor of array length — martianwars, Dec 01 '16 at 09:48

Oliver W. · Accepted Answer · 2016-12-01T09:24:58.367

Sounds like a job for as_strided.

as_strided returns a memory-efficient view on an array and can be used for retrieving a moving window over an array. The numpy documentation on it is sparse, but there are a number of decent blog posts, online slide decks, and SO questions that explain it in more detail.

>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>> a = np.arange(10)
>>> split_size = 2
>>> pad_length = 1
>>> random = -9
>>> # prepend the desired constant value
>>> b = np.pad(a, (pad_length, 0), mode='constant', constant_values=random)
>>> # return a memory efficient view on the array
>>> as_strided(b,
...     shape=(a.size//split_size, split_size + pad_length),
...     strides=(b.strides[0]*split_size, b.strides[0]))
...
array([[-9,  0,  1],
       [ 1,  2,  3],
       [ 3,  4,  5],
       [ 5,  6,  7],
       [ 7,  8,  9]])

Be aware that as_strided does no bounds checking: if the requested shape and strides reach past the end of the array, the extra elements will show whatever happens to be in adjacent memory.
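If the out-of-bounds risk is a concern, a bounds-checked alternative is `sliding_window_view`, which is available in NumPy 1.20 and later. The following is only a sketch under that assumption: `split_with_overlap` and its parameter names are made up for illustration, and it assumes `split_size` divides the length of the chosen axis, as stated in the question. It pads only along `axis`, takes every window of length `split_size + pad_length`, and keeps every `split_size`-th one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_with_overlap(a, split_size, pad_length, fill_value, axis=0):
    """Hypothetical helper: overlapping splits along `axis`, as a view."""
    a = np.asarray(a)
    axis = axis % a.ndim  # allow negative axes

    # Prepend `pad_length` copies of `fill_value` along `axis` only.
    pad_width = [(0, 0)] * a.ndim
    pad_width[axis] = (pad_length, 0)
    b = np.pad(a, pad_width, mode='constant', constant_values=fill_value)

    # All windows of the full length; the window dimension is appended last.
    windows = sliding_window_view(b, split_size + pad_length, axis=axis)

    # Keep only the windows that start at multiples of split_size.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, split_size)
    return windows[tuple(index)]


out = split_with_overlap(np.arange(10), split_size=2, pad_length=1, fill_value=-9)
out2d = split_with_overlap(np.arange(20).reshape(2, 10), 2, 1, -9, axis=1)
```

Here `out` has shape `(5, 3)` and matches the array shown above, while `out2d` has shape `(2, 5, 3)`. The result is a read-only view by default, so copy it before modifying.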

I think you meant `split_size + pad_length`. Great answer! :D — martianwars, Dec 01 '16 at 09:21

Divakar · Answer 2 · 2016-12-01T10:51:33.687

Here is another stride-based approach, which is admittedly a bit of a cheat: instead of padding explicitly, we stride backwards from the start of the input array, past the memory allocated for it, so that the padded region exists only implicitly, and then assign the pad value into that region at the end. Note that this reads and writes memory that does not belong to the array, so use it with care.

Here's how it would look:

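import numpy as np

def padded_sliding_windows(a, split_size, pad_length, padnum):
    n = a.strides[0]
    L = split_size + pad_length   # length of each window
    S = L - pad_length            # step between window starts (== split_size)
    nrows = ((a.size + pad_length - L)//split_size) + 1
    strided = np.lib.stride_tricks.as_strided
    # Each row starts at the last element of its split and walks backwards;
    # the rows are then flipped. Row 0 reaches pad_length elements before a[0].
    out = strided(a[split_size - 1:], shape=(nrows,L), strides=(S*n,-n))[:,::-1]
    out[0,:pad_length] = padnum   # writes outside the memory owned by `a`
    return out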

A few sample runs:

In [271]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

In [272]: padded_sliding_windows(a, split_size = 2, pad_length = 1, padnum = 100)
Out[272]: 
array([[100,   0,   1],
       [  1,   2,   3],
       [  3,   4,   5],
       [  5,   6,   7],
       [  7,   8,   9],
       [  9,  10,  11]])

In [273]: padded_sliding_windows(a, split_size = 3, pad_length = 2, padnum = 100)
Out[273]: 
array([[100, 100,   0,   1,   2],
       [  1,   2,   3,   4,   5],
       [  4,   5,   6,   7,   8],
       [  7,   8,   9,  10,  11]])

In [274]: padded_sliding_windows(a, split_size = 4, pad_length = 2, padnum = 100)
Out[274]: 
array([[100, 100,   0,   1,   2,   3],
       [  2,   3,   4,   5,   6,   7],
       [  6,   7,   8,   9,  10,  11]])
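
For comparison, the same windows can be built without touching memory outside the array by using an index array instead of strides. This is a sketch, not part of the approach above: `padded_windows_by_index` is a hypothetical name, it assumes a 1-D input whose size is a multiple of `split_size`, and it returns a copy rather than a view.

```python
import numpy as np


def padded_windows_by_index(a, split_size, pad_length, padnum):
    """Hypothetical copy-based variant; assumes 1-D `a`, split_size | a.size."""
    nrows = a.size // split_size
    # Column vector holding the first index of each split.
    starts = np.arange(nrows)[:, None] * split_size
    # Offsets within one window; negative offsets reach into the previous split.
    offsets = np.arange(-pad_length, split_size)
    idx = starts + offsets  # broadcasts to (nrows, split_size + pad_length)
    # Clip so the lookup stays in bounds, then overwrite the positions that
    # fell before the start of the array with the pad value.
    gathered = a[np.clip(idx, 0, None)]
    return np.where(idx < 0, padnum, gathered)


a = np.arange(12)
rows = padded_windows_by_index(a, split_size=3, pad_length=2, padnum=100)
```

With these inputs `rows` equals the `Out[273]` array above. The trade-off is memory: fancy indexing materializes every window, whereas the strided version shares the input's buffer.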

acdr · Answer 3 · score 0 · Dec 01 '16 at 08:50

The following comes close:

import numpy as np

arr = np.array([0,1,2,3,4,5,6,7,8,9])
[arr[max(0, idx-1):idx+2] for idx in range(0, len(arr), 2)]

The only difference is that the first split does not have a leading `random` value, as you put it.
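
One way to get the leading value as well is to prepend it once and then slice the padded array. This is a rough sketch for the 1-D case only: `split_by_slicing` and `fill_value` are illustrative names, and it assumes `split_size` divides the array length.

```python
import numpy as np


def split_by_slicing(arr, split_size, pad_length, fill_value):
    """Hypothetical helper: list of overlapping slices of a padded 1-D array."""
    # Prepend the fill value once; this is the only copy of the data.
    lead = np.full(pad_length, fill_value, dtype=arr.dtype)
    padded = np.concatenate([lead, arr])
    width = split_size + pad_length
    # Each slice is a view into `padded`, starting split_size further along.
    return [padded[i:i + width] for i in range(0, arr.size, split_size)]


chunks = split_by_slicing(np.arange(10), split_size=2, pad_length=1, fill_value=-9)
```

`chunks` is a list of five length-3 views, the first being `[-9, 0, 1]`. Because neighbouring views overlap in memory, writing to one chunk also changes the shared element in the next.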

Would this be efficient for larger arrays? – martianwars Dec 01 '16 at 08:58
Probably it wouldn't be too bad, considering it's just slicing, which produces views of the data, rather than copies. Just make sure that you add `:`s for the additional dimensions. – acdr Dec 01 '16 at 09:14
