
I'm trying to reshape a numpy array using numpy.lib.stride_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087

My use case is very similar, with the difference being that I need strides of 3.

Given this array:

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

I'd like to get:

array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

Here's what I tried:

import numpy as np

as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)

as_strided(a, (len(a) - 2, 3), (3, 3))

array([[                 1,      2199023255552,             131072],
       [     2199023255552,             131072, 216172782113783808],
       [            131072, 216172782113783808,        12884901888],
       [216172782113783808,        12884901888,                768],
       [       12884901888,                768,   1125899906842624],
       [               768,   1125899906842624,           67108864],
       [  1125899906842624,           67108864,                  4]])

I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?

cs95
  • Why do you think you need strides of 3? – user2357112 Nov 25 '17 at 07:27
  • @user2357112 I don't know... I thought that's how I need to stride, based on the example given. – cs95 Nov 25 '17 at 07:32
  • Looks like that example is hardcoding a stride of 4 for 4-byte integers - not a good idea, considering their input could easily be 8-byte on a different OS. I'm going to edit that. – user2357112 Nov 25 '17 at 07:34
  • `as_strided` allows you to access bytes outside of the array's databuffer. It does not check that strides and shape are valid. Use with caution. – hpaulj Nov 25 '17 at 08:48

3 Answers


The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:

In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided

In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides 
Out[377]: (4,)

For a contiguous 1d array, strides is the size of the element, here 4 bytes, an int32. To go from one element to the next it steps forward 4 bytes.
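A quick way to check that relationship, as a sketch: the element size is available as a.itemsize, and a non-contiguous view (a[::2] here, chosen only for illustration) has a stride that differs from it.

```python
import numpy as np

a = np.arange(1, 10)

# Contiguous 1-D array: the stride equals the element size in bytes
# (4 for int32, 8 for int64, depending on platform and dtype).
contiguous_ok = a.strides == (a.itemsize,)

# A sliced view skips every other element, so its stride is doubled.
every_other = a[::2]
sliced_stride = every_other.strides

print(a.dtype, a.strides, sliced_stride)
```

This is why reading a.strides is safer than hardcoding 4 or 8.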

What the OP tried:

In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]: 
array([[        1,       512,    196608],
       [      512,    196608,  67108864],
       [   196608,  67108864,         4],
       [ 67108864,         4,      1280],
       [        4,      1280,    393216],
       [     1280,    393216, 117440512],
       [   393216, 117440512,         7]])

This is stepping by 3 bytes, crossing int32 boundaries, and giving mostly unintelligible numbers. It might make more sense if the dtype had been bytes or uint8.
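To see where values such as 512 and 196608 come from, the same buffer can be inspected byte by byte. This is only a sketch: the dtype is forced to little-endian int32 so the numbers match the output above, and int32_at is a hypothetical helper, not a NumPy function.

```python
import numpy as np

# Force little-endian 4-byte integers so the result is platform-independent.
a = np.arange(1, 10, dtype="<i4")

# The same buffer viewed as 36 individual bytes: 1,0,0,0, 2,0,0,0, ...
raw = a.view(np.uint8)

def int32_at(byte_offset):
    """Decode the 4 bytes starting at byte_offset as a little-endian integer."""
    return int.from_bytes(raw[byte_offset:byte_offset + 4].tobytes(), "little")

# Offsets 0, 3, 6, 9 are the first entries a 3-byte stride visits.
print(int32_at(0), int32_at(3), int32_at(6), int32_at(9))
```

Offset 3 picks up the last byte of the 1 and the first three bytes of the 2, which decodes as 2 * 256 = 512.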

Using a.strides*2 (tuple replication) or (4,4) instead, we get the desired array:

In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]: 
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.

In [382]: _.strides
Out[382]: (4, 4)

Changing strides to (8,4) steps 2 elements for each window.

In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]: 
array([[          1,           2,           3],
       [          3,           4,           5],
       [          5,           6,           7],
       [          7,           8,           9],
       [          9,          25, -1316948568],
       [-1316948568,   184787224, -1420192452],
       [-1420192452,           0,           0]])

But the shape is now wrong: the last rows show bytes past the end of the original databuffer. That could be dangerous (we don't know whether those bytes belong to some other object or array). An array of this size only holds four complete 2-step windows, not seven.
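One way to stay inside the buffer is to derive the row count from the array length, window length and step. A minimal sketch, where window and step are illustrative names and the byte stride is read from a.strides instead of being hardcoded:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1, 10)
window, step = 3, 2

# Number of complete windows that fit inside the buffer.
rows = (len(a) - window) // step + 1

(s,) = a.strides  # bytes between consecutive elements of a
windows = as_strided(a, shape=(rows, window), strides=(step * s, s))
print(windows)
```

With 9 elements this gives 4 rows, and the last window ends exactly on the last element.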

Now step 3 elements for each row (3*4, 4):

In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)

This is the same shape and strides as a 3x3 reshape.

We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:

In [399]: np.broadcast_to(a, (2,9))
Out[399]: 
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
       [1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)

In [401]: a.reshape(3,3)[::-1,:]
Out[401]: 
array([[7, 8, 9],
       [4, 5, 6],
       [1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)

However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.
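One way to supply that starting element is the offset parameter of the np.ndarray constructor, which takes shape, strides and a byte offset into an existing buffer. The following is a sketch under the assumption that a is contiguous, so a.itemsize is also its stride; it builds the sliding windows in reverse row order.

```python
import numpy as np

a = np.arange(1, 10)
s = a.itemsize  # a is contiguous, so this is also its byte stride

# Start at element 6 (value 7) and step the rows backwards through memory.
reversed_windows = np.ndarray(
    shape=(7, 3),
    dtype=a.dtype,
    buffer=a,
    offset=6 * s,
    strides=(-s, s),
)
print(reversed_windows)
```

Unlike as_strided, the constructor rejects a shape/strides/offset combination that would reach outside the buffer.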

user2357112
hpaulj
  • "For a 1d array, strides is the size of the element" - only for a contiguous array. It's generally a bad idea to assume contiguity unless you know exactly how your input was produced or you checked contiguity yourself. – user2357112 Nov 25 '17 at 19:21
  • Awesome. Thank you so much. Really informative and helpful. From the examples provided, it is easier to discern a pattern here. – cs95 Nov 25 '17 at 19:21
  • As for negative strides, it might be easier to go directly through the [`ndarray` constructor](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html), so you can provide `offset` as well as `shape` and `strides`. – user2357112 Nov 25 '17 at 19:24

I have no idea why you think you need strides of 3. Each stride needs to be the distance in bytes between one element of a and the next, which you can get from a.strides:

as_strided(a, (len(a) - 2, 3), a.strides*2)
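If the installed NumPy is 1.20 or newer (an assumption about your environment), sliding_window_view from the same module builds this kind of view without any manual stride arithmetic. A short sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 10)

# Read-only view with one row per window of length 3.
windows = sliding_window_view(a, 3)

# Taking every second row gives windows that advance two elements at a time.
every_second = windows[::2]
print(windows.shape, every_second.shape)
```

Because it computes the shape itself, it cannot read past the end of the buffer the way a mistyped as_strided call can.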
user2357112
  • Thanks, that was simple. I didn't really know what went in `strides`, and I thought it was 3, because in the link, they used 4. I read the docs but didn't understand much honestly. – cs95 Nov 25 '17 at 07:33
  • It says `Tuple of bytes to step in each dimension when traversing an array.` @user2357112 - care to add some explanation? – Vivek Kalyanarangan Nov 25 '17 at 07:36
  • Out of curiosity, if I wanted to stride by two `([1, 2, 3], [3, 4, 5], ...)`, I'd need `a.strides * 3`? – cs95 Nov 25 '17 at 07:51
  • @cᴏʟᴅsᴘᴇᴇᴅ: I think you're misunderstanding what the strides of an array are. An array's strides tell you how many bytes you have to step in memory to move from one array element to the next in any dimension. See https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.strides.html#numpy.ndarray.strides and https://docs.scipy.org/doc/numpy-1.13.0/reference/internals.html#internal-organization-of-numpy-arrays – user2357112 Nov 25 '17 at 07:56
  • Thanks for the help. Figured it out: `as_strided(a, (len(a) - 5, 3), (a.strides[0] *2 , a.strides[0] ))` it feels a lot like strides in convolution, which I'm used to working with, so I was able to pick it up fairly quick. – cs95 Nov 25 '17 at 08:03
  • @cᴏʟᴅsᴘᴇᴇᴅ: `len(a) - 5` doesn't look right - the right expression should be something with a `// 2` in it. Other than that, it looks like you have things down. – user2357112 Nov 25 '17 at 08:06

I was trying to do a similar operation and ran into the same problem.

In your case, as stated in this comment, the problems were:

  1. You did not account for the size of each element in memory (4 bytes for int32, which can be checked using a.dtype.itemsize).
  2. You did not give the strides in bytes. Since each window advances by only one element, both strides should be one element's size, which here is also 4.

I made myself a function based on this answer, in which I compute the segmentation of a given array, using a window of n-elements and specifying the number of elements to overlap (given by window - number_of_elements_to_skip).

I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks work:

import numpy as np

def window_signal(signal, window, overlap):
    """ 
    Windowing function for data segmentation.

    Parameters:
    ------------
    signal: ndarray
            The signal to segment.
    window: int
            Window length, in samples.
    overlap: int
             Number of samples to overlap

    Returns: 
    --------
    ndarray
            A strided view of the signal array (not a copy) with shape
            (rows, window), where rows = (N-window)//(window-overlap) + 1
    """
    N = signal.reshape(-1).shape[0] 
    if (window == overlap):
        rows = N//window
        overlap = 0
    else:
        rows = (N-window)//(window-overlap) + 1
        miss = (N-window)%(window-overlap)
        if(miss != 0):
            print('Windowing led to the loss of ', miss, ' samples.')
    item_size = signal.dtype.itemsize 
    strides = (window - overlap) * item_size
    return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
                                           strides=(strides, item_size))

Following your code, the solution for this case is as_strided(a, (len(a) - 2, 3), (4, 4)), assuming 4-byte integers.

Alternatively, using the function window_signal:

window_signal(a, 3, 2)

Both return as output the following array:

array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
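A variant of window_signal can address two points raised in the comments above: it reads the byte stride from the input instead of assuming contiguity, and it returns a read-only view so that overlapping windows cannot be written to by accident. This is only a sketch. The name sliding_windows and the step parameter (equal to window - overlap) are hypothetical and not part of NumPy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_windows(signal, window, step=1):
    """Return a read-only (rows, window) view of a 1-D array."""
    signal = np.asarray(signal)
    if signal.ndim != 1:
        raise ValueError("expected a 1-D array")
    if not (1 <= window <= signal.size) or step < 1:
        raise ValueError("need 1 <= window <= len(signal) and step >= 1")
    # Only complete windows are returned; trailing samples are dropped.
    rows = (signal.size - window) // step + 1
    (s,) = signal.strides  # actual byte stride, also valid for sliced input
    return as_strided(signal, shape=(rows, window),
                      strides=(step * s, s), writeable=False)

a = np.arange(1, 10)
print(sliding_windows(a, 3))
print(sliding_windows(a, 3, step=2))
```

Here sliding_windows(a, 3) corresponds to window_signal(a, 3, 2), since an overlap of 2 with a window of 3 means a step of 1.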
cs95
Tesla