I want to create a time-lagged pandas DataFrame from a pandas Series.

Given a pandas Series:

X = pd.Series(range(5))

Expected output (for shift=4, step=1):

    0   1   2
0   0   1   2.0
1   1   2   3.0
2   2   3   4.0
3   3   4   0.0

I have implemented the following function (with a step size), but it takes a long time on large datasets.

def creat_time_lagged(x, shift, step):
    df = pd.DataFrame()
    for i in range(0, len(x), step):
        if i + shift - 1 < len(x):
            df['{}'.format(i)] = x.iloc[i : i + shift].values
        else:
            df['{}'.format(i)] = np.append(x.iloc[i:].values, np.zeros(shift - len(x.iloc[i:].values)))
            break
    return df

How can I improve it?

  • There's the pandas `DataFrame.shift` function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html That should do what you're after. – Andrew Dec 02 '18 at 15:26
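
For illustration, the `shift` suggestion in the comment above could be sketched roughly as follows. The helper name `time_lagged_with_shift` is hypothetical, the sketch assumes `shift <= len(x)` and a pandas version whose `Series.shift` accepts `fill_value`, and it is not necessarily faster than the loop, since every column shifts the whole series.

```python
import pandas as pd

def time_lagged_with_shift(x, shift, step):
    """Hypothetical helper: the column for start i holds x[i:i+shift], zero-padded."""
    # Start positions as in the question's loop: every multiple of `step`,
    # up to and including the first window that runs past the end of x.
    starts = range(0, min(len(x), len(x) - shift + step + 1), step)
    # shift(-i) moves element i to position 0; fill_value=0 supplies the padding.
    cols = {str(i): x.shift(-i, fill_value=0).iloc[:shift].to_numpy() for i in starts}
    return pd.DataFrame(cols)

X = pd.Series(range(5))
lagged = time_lagged_with_shift(X, shift=4, step=1)
lagged_step2 = time_lagged_with_shift(X, shift=4, step=2)
```

The column labels follow the question's function (the start index as a string), so `lagged` has columns '0', '1', '2'.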

1 Answer

Approach #1

We can use scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get sliding windows.

import numpy as np
import pandas as pd
from skimage.util.shape import view_as_windows

def create_time_lagged_viewaswindows(X, shift, step):  
    a_ext = np.r_[X.values,np.zeros(shift-1,dtype=X.dtype)]
    windows_ar = view_as_windows(a_ext,shift)[:len(X)-shift+step+1:step].T
    return pd.DataFrame(windows_ar)

A bit of explanation: the basic idea is to pad the trailing side with zeros and then create sliding windows over the padded array. The windows can be created with either np.lib.stride_tricks.as_strided or skimage.util.view_as_windows.

Sample runs -

In [166]: X = pd.Series(range(5))

In [167]: create_time_lagged_viewaswindows(X, shift=4, step=1)
Out[167]: 
   0  1  2
0  0  1  2
1  1  2  3
2  2  3  4
3  3  4  0

In [168]: create_time_lagged_viewaswindows(X, shift=4, step=2)
Out[168]: 
   0  1
0  0  2
1  1  3
2  2  4
3  3  0
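
If scikit-image is not available, the same pad-then-window idea can probably be sketched with NumPy's own `sliding_window_view`. This is an assumption-laden variant, not part of the approach above: it needs NumPy 1.20 or later, and the function name `create_time_lagged_swv` is made up for this example.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def create_time_lagged_swv(X, shift, step):
    # Pad the trailing side with zeros, as in Approach #1.
    a_ext = np.r_[X.values, np.zeros(shift - 1, dtype=X.dtype)]
    # One window per start position: shape (len(X), shift), read-only view.
    windows = sliding_window_view(a_ext, shift)
    # Keep every `step`-th window up to the first zero-padded one, then transpose
    # so that each window becomes a column.
    windows_ar = windows[:len(X) - shift + step + 1:step].T
    # Copy so the DataFrame owns writable data rather than a read-only view.
    return pd.DataFrame(windows_ar.copy())

X = pd.Series(range(5))
out_step1 = create_time_lagged_swv(X, shift=4, step=1)
out_step2 = create_time_lagged_swv(X, shift=4, step=2)
```

The slicing is the same as in `create_time_lagged_viewaswindows`; only the function that builds the windows differs.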

Approach #2

We can also call np.lib.stride_tricks.as_strided directly. This requires setting up the shape and strides arguments by hand, but it avoids the transpose used in the earlier approach, which may give a small performance boost. The implementation would look something along these lines -

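def create_time_lagged_asstrided(X, shift, step):
    # Pad the trailing side with zeros so the last window can run past the data.
    a_ext = np.r_[X.values, np.zeros(shift - 1, dtype=X.dtype)]
    strided = np.lib.stride_tricks.as_strided
    s = a_ext.strides[0]
    # Cap the column count at the number of valid start positions so that
    # the strided view never reads beyond the end of a_ext.
    ncols = min((len(X) - shift + 2*step)//step, (len(X) + step - 1)//step)
    windows_ar = strided(a_ext, shape=(shift, ncols), strides=(s, step*s))
    return pd.DataFrame(windows_ar)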
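
To make the shape and strides arguments less opaque, here is a small sketch (with hand-picked values for `shift`, `step` and `ncols`, chosen only for this example) showing that element `[r, c]` of the strided view is `a_ext[r + c*step]`. Because `as_strided` does no bounds checking, the largest index `(shift - 1) + (ncols - 1)*step` has to stay inside the padded array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.arange(5)
shift, step, ncols = 4, 2, 2  # example values, not derived from a formula here
a_ext = np.r_[data, np.zeros(shift - 1, dtype=data.dtype)]

s = a_ext.strides[0]  # bytes between consecutive elements
# Moving down one row advances by one element; moving right one column
# advances by `step` elements. writeable=False guards against accidental writes.
view = as_strided(a_ext, shape=(shift, ncols), strides=(s, step * s), writeable=False)

# The same elements gathered with explicit (bounds-checked) indexing.
rows = np.arange(shift)[:, None]
cols = np.arange(ncols)[None, :]
explicit = a_ext[rows + cols * step]

# Largest flat index the view touches; it must be smaller than a_ext.size.
max_index = (shift - 1) + (ncols - 1) * step
```

The explicit-indexing version copies the data, so it is slower, but it is a convenient way to check a strided layout before relying on it.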

Timings on a large array -

In [215]: X = pd.Series(range(10000))

# Original solution
In [216]: %timeit creat_time_lagged(X, shift=10, step=5)
1 loop, best of 3: 608 ms per loop

# Approach #1
In [217]: %timeit create_time_lagged_viewaswindows(X, shift=10, step=5)
10000 loops, best of 3: 146 µs per loop

# Approach #2
In [218]: %timeit create_time_lagged_asstrided(X, shift=10, step=5)
10000 loops, best of 3: 104 µs per loop
Divakar