
Consider the array `a`:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
a = np.random.randint(100, size=10)
print(a)

[11 98 74 90 15 55 13 11 13 26]

I'm using `as_strided`, imported with `from numpy.lib.stride_tricks import as_strided`.

When I use it to build a rolling window as follows:

as_strided(a, shape=(len(a), 5), strides=(8, -8))

[[11  0  0  0  0]
 [98 11  0  0  0]
 [74 98 11  0  0]
 [90 74 98 11  0]
 [15 90 74 98 11]
 [55 15 90 74 98]
 [13 55 15 90 74]
 [11 13 55 15 90]
 [13 11 13 55 15]
 [26 13 11 13 55]]

This is almost perfect. I want np.nan in that upper triangle instead of zeros.

I want this:

[[ 11.  nan  nan  nan  nan]
 [ 98.  11.  nan  nan  nan]
 [ 74.  98.  11.  nan  nan]
 [ 90.  74.  98.  11.  nan]
 [ 15.  90.  74.  98.  11.]
 [ 55.  15.  90.  74.  98.]
 [ 13.  55.  15.  90.  74.]
 [ 11.  13.  55.  15.  90.]
 [ 13.  11.  13.  55.  15.]
 [ 26.  13.  11.  13.  55.]]

Is there a convenient way to tell as_strided to fill those in with np.nan instead?

Divakar
piRSquared
  • Those zeros you are getting are outside of `a`'s memory space and as such could be anything. It just so happens that they are zeros in your case. Also relevant to this one: http://stackoverflow.com/questions/40683601 – Divakar Jan 07 '17 at 11:04
  • @Divakar scary. good thing I asked. :-) – piRSquared Jan 07 '17 at 11:04
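The comment above can be made concrete. A minimal sketch, assuming only NumPy and the question's seed, that works out which element index each cell of the question's view reads. The stride is taken from `a.strides` instead of being hardcoded as 8, because the default integer type is 4 bytes wide on some platforms, in which case `strides=(8, -8)` would skip elements.

```python
import numpy as np

np.random.seed([3, 1415])
a = np.random.randint(100, size=10)

W = 5
# Bytes per step; for a contiguous 1-D array this equals a.itemsize.
# It may be 4 instead of 8, so hardcoding 8 is not portable.
s = a.strides[0]

# With shape=(len(a), W) and strides=(s, -s), cell [i, j] sits at byte
# offset i*s - j*s from the start of a, i.e. at element index i - j.
idx = np.subtract.outer(np.arange(len(a)), np.arange(W))

# Negative indices point before the start of a's buffer. That memory does
# not belong to a, so whatever is read there (zeros or otherwise) is undefined.
out_of_bounds = idx < 0
print(out_of_bounds.sum())  # size of the upper triangle, W*(W-1)//2
```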

1 Answer


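The trick is to prepend `W-1` NaNs and then stride over the extended array. Since NaN is a float, the concatenation also upcasts the data to float64, which is what makes a NaN fill possible for an integer input. There are two ways to stride, forward and backward, chosen through the sign of the strides. Given how the desired output is laid out (newest element first in each row), we need to stride backward along each row. Alternatively, we could stride forward, get the 2D output and then flip its columns, though that is a bit slower. So the forward method uses the usual positive stride along each row, while the backward method uses a negative one. In both cases every element that is read lies inside the padded array, so no memory outside it is touched.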
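The index bookkeeping above can be checked without strides at all. The sketch below (the name `nan_window_reference` is hypothetical) builds the same output with broadcasting and fancy indexing, which copies the data instead of returning a view. Cell `[i, j]` reads `a_ext[(W-1) + i - j]`: that is what the backward-strided view addresses, and also what the forward view (row `i` is `a_ext[i:i+W]`) addresses once its columns are flipped.

```python
import numpy as np

def nan_window_reference(a, W):
    """Hypothetical reference: same layout as the strided versions, but a copy."""
    # Prepend W-1 NaNs; concatenating with a float NaN array upcasts to float64.
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    rows = np.arange(a.size)[:, None]  # shape (N, 1)
    cols = np.arange(W)[None, :]       # shape (1, W)
    # Row i, column j reads a_ext[(W-1) + i - j], which is a[i - j] when
    # i >= j and one of the padded NaNs otherwise. Indices stay in range:
    # the smallest is 0 (i=0, j=W-1), the largest is N+W-2.
    return a_ext[(W - 1) + rows - cols]

a = np.array([11, 98, 74, 90, 15, 55, 13, 11, 13, 26])
out = nan_window_reference(a, 5)
```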

Thus, the two approaches would be:

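import numpy as np
from numpy.lib.stride_tricks import as_strided as strided

def strided_nan_filled(a, W):
    # Prepend W-1 NaNs; this also upcasts the data to float64
    a_ext = np.concatenate((np.full(W-1, np.nan), a))
    n = a_ext.strides[0]  # bytes per element of a_ext
    # Forward striding: row i is a_ext[i:i+W]; then flip the columns
    return strided(a_ext, shape=(a.size, W), strides=(n, n))[:, ::-1]

def strided_nan_filled_v2(a, W):
    a_ext = np.concatenate((np.full(W-1, np.nan), a))
    n = a_ext.strides[0]
    # Backward striding: start at a[0] (index W-1 of a_ext) and step back
    # one element per column, so row i is a[i], a[i-1], ..., a[i-W+1]
    return strided(a_ext[W-1:], shape=(a.size, W), strides=(n, -n))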
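On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` is a safer alternative to calling `as_strided` directly: it derives the shape and strides itself and returns a read-only view by default. A sketch of the forward-and-flip approach using it; note this swaps in a different NumPy function rather than reproducing the answer's code, and the name `nan_window_swv` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def nan_window_swv(a, W):
    # Same padding as above: W-1 leading NaNs (forces float64).
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    # Forward windows: row i is a_ext[i:i+W]; there are
    # len(a_ext) - W + 1 == a.size of them.
    windows = sliding_window_view(a_ext, W)
    # Flip the columns so the newest element comes first in each row.
    return windows[:, ::-1]

a = np.array([11, 98, 74, 90, 15, 55, 13, 11, 13, 26])
out = nan_window_swv(a, 5)
```

The result is still a view, so it costs no extra memory, but attempts to write into it raise an error instead of silently aliasing.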

Sample run:

In [42]: a
Out[42]: array([11, 98, 74, 90, 15, 55, 13, 11, 13, 26])

In [43]: strided_nan_filled(a, 5)
Out[43]: 
array([[ 11.,  nan,  nan,  nan,  nan],
       [ 98.,  11.,  nan,  nan,  nan],
       [ 74.,  98.,  11.,  nan,  nan],
       [ 90.,  74.,  98.,  11.,  nan],
       [ 15.,  90.,  74.,  98.,  11.],
       [ 55.,  15.,  90.,  74.,  98.],
       [ 13.,  55.,  15.,  90.,  74.],
       [ 11.,  13.,  55.,  15.,  90.],
       [ 13.,  11.,  13.,  55.,  15.],
       [ 26.,  13.,  11.,  13.,  55.]])
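Both functions return views into the padded array rather than copies, and every cell on the same diagonal (same `i - j`) maps to the same element of `a_ext`. A minimal sketch showing the consequence: writing to one cell changes the whole diagonal. Calling `.copy()` before modifying the result, or passing `writeable=False` to `as_strided` (available since NumPy 1.12), avoids this.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_nan_filled_v2(a, W):
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    n = a_ext.strides[0]
    return as_strided(a_ext[W - 1:], shape=(a.size, W), strides=(n, -n))

a = np.array([11, 98, 74, 90, 15, 55, 13, 11, 13, 26])
view = strided_nan_filled_v2(a, 5)

safe = view.copy()  # independent array, safe to modify
view[0, 0] = -1     # writes a_ext[4], shared by [1, 1], [2, 2], [3, 3], [4, 4]
print(view[4, 4])   # -1.0
print(safe[4, 4])   # 11.0, the copy is unaffected
print(a[0])         # 11, a itself is untouched because a_ext is a new array
```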

Runtime test:

In [74]: a = np.random.randint(0,9,(1000))

In [75]: %timeit strided_nan_filled(a, 5)
10000 loops, best of 3: 30.1 µs per loop

In [76]: %timeit strided_nan_filled_v2(a, 5)
10000 loops, best of 3: 28.7 µs per loop
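To repeat the timing outside IPython, the standard-library `timeit` module can be used. A sketch, assuming the two functions from the answer; the figures it prints depend on the machine and NumPy version, so they will not necessarily match the ones above.

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_nan_filled(a, W):
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    n = a_ext.strides[0]
    return as_strided(a_ext, shape=(a.size, W), strides=(n, n))[:, ::-1]

def strided_nan_filled_v2(a, W):
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    n = a_ext.strides[0]
    return as_strided(a_ext[W - 1:], shape=(a.size, W), strides=(n, -n))

a = np.random.randint(0, 9, 1000)
number = 10000
for f in (strided_nan_filled, strided_nan_filled_v2):
    # Best of 3 repeats of `number` calls each, similar to %timeit's report.
    best = min(timeit.repeat(lambda: f(a, 5), number=number, repeat=3))
    print(f"{f.__name__}: {best / number * 1e6:.1f} µs per loop")
```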
Divakar