Efficient Numpy 2D array construction from 1D array

Question

I have an array like this:

A = array([1,2,3,4,5,6,7,8,9,10])

And I am trying to get an array like this:

B = array([[1,2,3],
          [2,3,4],
          [3,4,5],
          [4,5,6]])

Where each row (of a fixed arbitrary width) is shifted by one. The array of A is 10k records long and I'm trying to find an efficient way of doing this in Numpy. Currently I am using vstack and a for loop which is slow. Is there a faster way?

Edit:

width = 3 # fixed arbitrary width
length = 10000 # length of A which I wish to use
B = A[0:length + 1]
for i in range (1, length):
    B = np.vstack((B, A[i, i + width + 1]))

@wxbx: Please elaborate more what you are aiming to do? Please note that `B = array([1,2,3],[2,3,4],[3,4,5],[4,5,6])` is not valid `numpy`! — eat, Feb 07 '11 at 16:30
@wxbx - your solution is really unlucky. You `vstack` the array 10000 times! See my answer, I `vstack` it just once. — eumiro, Feb 07 '11 at 16:37
Is `A` really equal to a sequence of increasing numbers or is it just for illustrating the positions? If the former, I know I quick way to do it. :) — Christian Alis, Feb 07 '11 at 16:43
@wxbx: Your `B= array([1, ...` is still not valid `numpy`. But you are really fast to accept an answer ;-). Thanks — eat, Feb 07 '11 at 16:44
@eat - fixed! eumiro posted an answer I plugged it in and I got instant speedup so I accepted, didnt see yours when I accepted but I'll have a play later. Thanks :) — wxbx, Feb 07 '11 at 16:56
@wxbx: I wasn't actually pointing out my own answer then. See now, there is a variety of good answers. Thanks — eat, Feb 08 '11 at 01:56

score 61 · Accepted Answer · edited Dec 23 '13 at 20:06

Actually, there's an even more efficient way to do this... The downside to using vstack etc, is that you're making a copy of the array.

Incidentally, this is effectively identical to @Paul's answer, but I'm posting this just to explain things in a bit more detail...

There's a way to do this with just views so that no memory is duplicated.

I'm directly borrowing this from Erik Rigtorp's post to numpy-discussion, who in turn, borrowed it from Keith Goodman's Bottleneck (Which is quite useful!).

The basic trick is to directly manipulate the strides of the array (For one-dimensional arrays):

import numpy as np

def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)
print rolling(a, 3)

Where a is your input array and window is the length of the window that you want (3, in your case).

This yields:

[[0 1 2]
 [1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]
 [7 8 9]]

However, there is absolutely no duplication of memory between the original a and the returned array. This means that it's fast and scales much better than other options.

For example (using a = np.arange(100000) and window=3):

%timeit np.vstack([a[i:i-window] for i in xrange(window)]).T
1000 loops, best of 3: 256 us per loop

%timeit rolling(a, window)
100000 loops, best of 3: 12 us per loop

If we generalize this to a "rolling window" along the last axis for an N-dimensional array, we get Erik Rigtorp's "rolling window" function:

import numpy as np

def rolling_window(a, window):
   """
   Make an ndarray with a rolling window of the last dimension

   Parameters
   ----------
   a : array_like
       Array to add rolling window to
   window : int
       Size of rolling window

   Returns
   -------
   Array that is a view of the original array with a added dimension
   of size w.

   Examples
   --------
   >>> x=np.arange(10).reshape((2,5))
   >>> rolling_window(x, 3)
   array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
          [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

   Calculate rolling mean of last dimension:
   >>> np.mean(rolling_window(x, 3), -1)
   array([[ 1.,  2.,  3.],
          [ 6.,  7.,  8.]])

   """
   if window < 1:
       raise ValueError, "`window` must be at least 1."
   if window > a.shape[-1]:
       raise ValueError, "`window` is too long."
   shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   strides = a.strides + (a.strides[-1],)
   return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

So, let's look into what's going on here... Manipulating an array's strides may seem a bit magical, but once you understand what's going on, it's not at all. The strides of a numpy array describe the size in bytes of the steps that must be taken to increment one value along a given axis. So, in the case of a 1-dimensional array of 64-bit floats, the length of each item is 8 bytes, and x.strides is (8,).

x = np.arange(9)
print x.strides

Now, if we reshape this into a 2D, 3x3 array, the strides will be (3 * 8, 8), as we would have to jump 24 bytes to increment one step along the first axis, and 8 bytes to increment one step along the second axis.

y = x.reshape(3,3)
print y.strides

Similarly a transpose is the same as just reversing the strides of an array:

print y
y.strides = y.strides[::-1]
print y

Clearly, the strides of an array and the shape of an array are intimately linked. If we change one, we have to change the other accordingly, otherwise we won't have a valid description of the memory buffer that actually holds the values of the array.

Therefore, if you want to change both the shape and size of an array simultaneously, you can't do it just by setting x.strides and x.shape, even if the new strides and shape are compatible.

That's where numpy.lib.as_strided comes in. It's actually a very simple function that just sets the strides and shape of an array simultaneously.

It checks that the two are compatible, but not that the old strides and new shape are compatible, as would happen if you set the two independently. (It actually does this through numpy's __array_interface__, which allows arbitrary classes to describe a memory buffer as a numpy array.)

So, all we've done is made it so that steps one item forward (8 bytes in the case of a 64-bit array) along one axis, but also only steps 8 bytes forward along the other axis.

In other words, in case of a "window" size of 3, the array has a shape of (whatever, 3), but instead of stepping a full 3 * x.itemsize for the second dimension, it only steps one item forward, effectively making the rows of new array a "moving window" view into the original array.

(This also means that x.shape[0] * x.shape[1] will not be the same as x.size for your new array.)

At any rate, hopefully that makes things slightly clearer..

Kinggton: I really admire your answer, but don't you think it's quite a overkill to OP's question? ;-). Thanks — eat, Feb 07 '11 at 18:49
@eat - It is! :) It's definitely overkill for a short array (and the OP's 10K-element array is fairly short), but it's still useful to know about. Honestly, I just think I like writing overly-long answers sometimes... — Joe Kington, Feb 07 '11 at 19:23
Kingston: Thanks for a really detailed answer, I learnt a lot there. I also benched your code against @eumiro's answer and your rolling answer gave me 60x speedup! Considering I plan to use this on a much larger array the speedup is incredibly useful. :) — wxbx, Feb 08 '11 at 02:09
Thank you for such a great answer! I had no idea there's things like thus under the numpy's hood. Definitely worth checking out - the `__array_interface__` is also really cool! — jolly, Jan 12 '17 at 09:56

Paul · Answer 2 · 2011-02-07T19:10:36.297

11

This solution is not efficiently implemented by a python loop since it comes with all kinds of type-checking best avoided when working with numpy arrays. If your array is exceptionally tall, you will notice a large speed up with this:

newshape = (4,3)
newstrides = (A.itemsize, A.itemsize)
B = numpy.lib.stride_tricks.as_strided(A, shape=newshape, strides=newstrides)

This gives a view of the array A. If you want a new array you can edit, do the same but with .copy() at the end.

Details on strides:

The newstrides tuple in this case will be (4,4) because the array has 4-byte items and you want to continue to step thru your data in single-item steps in the i-dimension. The second value '4' refers to the strides in the j-dimension (in a normal 4x4 array it would be 16). Because in this case you want to also also increment your read from the buffer in 4-byte steps in the j-dimension.

Joe give a nice, detailed description and makes things crystal-clear when he says that all this trick does is change strides and shape simultaneously.

edited Feb 07 '11 at 19:10

answered Feb 07 '11 at 17:11

Paul

42,322
15
106
123

1

+1 You beat me to it! I was in the middle of typing this up... I'll still post my answer, as it goes into a bit more detail. Also, your `strides=(4,4)` assumes that `A.itemsize` is 4 (i.e. 32-bit floats or ints). It's best to do `strides=(A.itemsize, A.itemsize)`. – Joe Kington Feb 07 '11 at 17:25
Can you point me towards the docs for this? I have never seen this function before... – Benjamin Feb 07 '11 at 17:32
1

Thanks Joe. I was looking for some on-line documentation to link to but not much out there! This was the best I could find: http://mentat.za.net/numpy/numpy_advanced_slides/ – Paul Feb 07 '11 at 17:32
Here we go: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html – Paul Feb 07 '11 at 17:41
@Paul: That helps a little, but I still can`t find the docs for numpy.lib.stride_tricks.as_strided – Benjamin Feb 07 '11 at 17:46
@Benjamin I doubt there is a page dedicated to it as it is not a very high-level function. numpy.lib.stride_tricks.as_strided.__str__() shows that it only takes two arguments: shape and strides. Shape should be obvious. Understanding what strides are is the only tricky bit which is well-documented. – Paul Feb 07 '11 at 18:01
1

@Benjamin - It just sets the strides and shape of an array simultaneously. It's for cases where you need change both at the same time, but the new strides won't be compatible with the old shape and vice versa, so you can't just do `x.strides = new_strides` and `x.shape = new_shape`. – Joe Kington Feb 07 '11 at 18:02
@Paul: Not so much of explicit Python looping in any of the answers (sofar). Could you care to elaborate more what you mean by `as they rely on a python loop`? Thanks – eat Feb 07 '11 at 18:32
@eat: I wasn't referring to your answer, only the OP's and the selected answer. Both of which have a `for` in the code. (A list comprehension is still a python loop.) – Paul Feb 07 '11 at 18:45

JustInTime · Answer 3 · 2012-09-21T06:38:04.070

Just to further go with the answer of @Joe general

import numpy as np
def rolling(a, window):
    step = 2 
    shape = ( (a.size-window)/step + 1   , window)


    strides = (a.itemsize*step, a.itemsize)

    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(10)

print rolling(a, 3)

which outputs:

[[0 1 2]
 [2 3 4]
 [4 5 6]
 [6 7 8]]

To generalize further for the 2d case,i.e use it for patch extraction from an image

def rolling2d(a,win_h,win_w,step_h,step_w):

    h,w = a.shape
    shape = ( ((h-win_h)/step_h + 1)  * ((w-win_w)/step_w + 1) , win_h , win_w)

    strides = (step_w*a.itemsize, h*a.itemsize,a.itemsize)


    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(36).reshape(6,6)
print a
print rolling2d (a,3,3,2,2)

which outputs:

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]
[[[ 0  1  2]
  [ 6  7  8]
  [12 13 14]]

 [[ 2  3  4]
  [ 8  9 10]
  [14 15 16]]

 [[ 4  5  6]
  [10 11 12]
  [16 17 18]]

 [[ 6  7  8]
  [12 13 14]
  [18 19 20]]]

Is it possible in the above example to not pull results that wrap around the right edge of the original array. for example, the third output `[4,5,6; 10,11,12; 16,17,18]` 'wraps' back around. For image processing I'd like to avoid this and simply skip to the next returned result. — Phil Glau, Sep 10 '15 at 06:21

score 2 · Answer 4 · answered Feb 07 '11 at 16:34

2

Which approach are you using?

import numpy as np
A = np.array([1,2,3,4,5,6,7,8,9,10])
width = 3

np.vstack([A[i:i-len(A)+width] for i in xrange(len(A)-width)])
# needs 26.3µs

np.vstack([A[i:i-width] for i in xrange(width)]).T
# needs 13.2µs

If your width is relatively low (3) and you have a big A (10000 elements), then the difference is even more important: 32.4ms for the first and 44µs for the second.

answered Feb 07 '11 at 16:34

eumiro

207,213
34
299
261

thanks! this is just what I needed! and yeah just cracked out numpy today so slowly learning. – wxbx Feb 07 '11 at 16:39

wsdzbm · Answer 5 · 2016-07-25T14:26:29.143

1

I'm using a more generalized function similar to that of @JustInTime but applicable to ndarray

def sliding_window(x, size, overlap=0):
    step = size - overlap # in npts
    nwin = (x.shape[-1]-size)//step + 1
    shape = x.shape[:-1] + (nwin, size)
    strides = x.strides[:-1] + (step*x.strides[-1], x.strides[-1])
    return stride_tricks.as_strided(x, shape=shape, strides=strides)

An example,

x = np.arange(10)
M.sliding_window(x, 5, 3)
Out[1]: 
array([[0, 1, 2, 3, 4],
       [2, 3, 4, 5, 6],
       [4, 5, 6, 7, 8]])


x = np.arange(10).reshape((2,5))
M.sliding_window(x, 3, 1)
Out[2]: 
array([[[0, 1, 2],
        [2, 3, 4]],

       [[5, 6, 7],
        [7, 8, 9]]])

edited Jul 25 '16 at 14:26

answered Jul 25 '16 at 14:19

wsdzbm

3,096
3
25
28

Thanks for sharing this. Note that this truncates the last row if the windows don't exactly match up (in your first example, it loses the row that should contain '9'). But If you change the "nwin" line to `nwin = int(np.ceil((x.shape[-1]-size)/step + 1))`, it would seem that it will then pad the result with zeros. (Kinda surprised this doesn't give some seg fault, but I guess it's built in to `stride_tricks`.) – sh37211 Nov 27 '18 at 19:02

score 1 · Answer 6 · answered Oct 25 '16 at 13:23

1

Take a look at: view_as_windows.

import numpy as np
from skimage.util.shape import view_as_windows
window_shape = (4, )
aa = np.arange(1000000000) # 1 billion
bb = view_as_windows(aa, window_shape)

Around 1 second.

answered Oct 25 '16 at 13:23

Gilberto

813
7
17

Benjamin · Answer 7 · 2011-02-07T17:44:03.507

I think this might be faster than looping, when the width is fixed at a low number...

import numpy
a = numpy.array([1,2,3,4,5,6])
b = numpy.reshape(a, (numpy.shape(a)[0],1))
b = numpy.concatenate((b, numpy.roll(b,-1,0), numpy.roll(b,-2,0)), 1)
b = b[0:(numpy.shape(a)[0]/2) + 1,:]

EDIT Clearly, the solutions using strides are superior to this, with the only major disadvantage being that they are not yet well documented...

Efficient Numpy 2D array construction from 1D array

7 Answers7

Linked

Related