Fast method for generating subsequences of the initial data

Question

I have a DataFrame and I'd like to build the sliding-window subsequences of its data:

import pandas as pd

d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})

x = []
window = 3
# Slice out every run of `window` consecutive rows
for i in range(0, len(d) - window + 1):
    x.append(d[i: i + window].t.values)

pd.DataFrame(x, columns=['t1', 't2', 't3'])

The result looks like this:

    t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6

It works, but it is very slow for a large DataFrame. Is it possible to make the procedure faster?

How big a DataFrame are we talking about? I want to benchmark the solutions I can come up with, so I need to know what size to test them on. – Tugay, Nov 15 '21 at 22:04
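A minimal timing sketch for that kind of comparison, assuming a single integer column `t` and a window of 3 as in the question. The helper names `loop_windows` and `fast_windows` are hypothetical, and the 5,000-row size is an arbitrary choice, not one given in the thread. It times the question's loop against `sliding_window_view` (NumPy 1.20 or newer) and checks that both return the same frame.

```python
import time

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def loop_windows(d, window=3):
    # The approach from the question: one DataFrame slice per window
    x = []
    for i in range(0, len(d) - window + 1):
        x.append(d[i: i + window].t.values)
    return pd.DataFrame(x, columns=[f't{j+1}' for j in range(window)])


def fast_windows(d, window=3):
    # Vectorised approach: a strided view with one row per window
    view = sliding_window_view(d['t'].to_numpy(), window)
    return pd.DataFrame(view, columns=[f't{j+1}' for j in range(window)])


d = pd.DataFrame({'t': np.arange(5_000)})  # arbitrary test size
for fn in (loop_windows, fast_windows):
    start = time.perf_counter()
    fn(d)
    print(f'{fn.__name__}: {time.perf_counter() - start:.4f} s')

# Both approaches should give identical frames
pd.testing.assert_frame_equal(loop_windows(d), fast_windows(d))
```

Timings depend on the machine, so scale the row count up to whatever size you actually need.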

score 3 · Answer 1 · ALollz · 2021-11-16T00:36:20.973

You can use NumPy's `sliding_window_view`, as long as your NumPy version is 1.20 or newer:

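import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})
W = 3
# sliding_window_view builds a strided view with one row per length-W window
pd.DataFrame(sliding_window_view(d['t'].to_numpy(), W),
             columns=[f't{i+1}' for i in range(W)])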

#   t1  t2  t3
#0   1   2   3
#1   2   3   4
#2   3   4   5
#3   4   5   6
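The snippet above can be wrapped in a small reusable helper. This is a sketch; `windows_frame` and its `prefix` parameter are hypothetical names. `sliding_window_view` returns a read-only view by default, so the helper copies it once, on the assumption that you want a DataFrame that owns writable memory (whether pandas copies on its own can vary between versions).

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def windows_frame(s, window, prefix='t'):
    """Return one row per length-`window` subsequence of Series `s`."""
    # Read-only strided view over the underlying values
    view = sliding_window_view(s.to_numpy(), window)
    # Copy so the resulting DataFrame can be modified safely
    return pd.DataFrame(view.copy(),
                        columns=[f'{prefix}{i+1}' for i in range(window)])


d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})
print(windows_frame(d['t'], 4))
```

With a window of 4, the six input values yield three rows, `t1` through `t4`.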

answered Nov 15 '21 at 22:26 by ALollz · edited Nov 16 '21 at 00:36


This is very fast!! – Riccardo Bucco Nov 15 '21 at 22:34

I don't think it can get faster than this! But this requires `numpy` version 1.20 or newer. For older versions, one can use the `rolling_window` function defined [here](https://stackoverflow.com/a/6811241/17120692) – Rodalm Nov 15 '21 at 22:43
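For NumPy versions older than 1.20, another option is plain fancy indexing. This is a different technique from the linked `rolling_window` function, swapped in here because it needs no stride tricks: it builds an index matrix by broadcasting, so row `i` holds positions `i` through `i + window - 1`. Unlike `sliding_window_view`, it materialises a full copy of the windows, which the DataFrame ends up holding anyway. `windows_by_indexing` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def windows_by_indexing(values, window):
    values = np.asarray(values)
    n_windows = len(values) - window + 1
    # Row i of idx is [i, i+1, ..., i+window-1], built by broadcasting
    idx = np.arange(n_windows)[:, None] + np.arange(window)[None, :]
    # Fancy indexing returns a new (n_windows, window) array
    return values[idx]


d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})
W = 3
result = pd.DataFrame(windows_by_indexing(d['t'], W),
                      columns=[f't{i+1}' for i in range(W)])
print(result)
```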

score 1 · Answer 2 · answered Nov 15 '21 at 22:30

You can use this trick with Pandas:

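import pandas as pd
d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})
lst = []  # filled as a side effect: one list per full window
d.rolling(3).apply(lambda x: lst.append(x.apply(int).tolist()) or 0)  # `or 0` returns a number
result = pd.DataFrame.from_records(lst, columns=['t1', 't2', 't3'])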

Here is the result:

   t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6
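A variant of the same side-effect trick, assuming the `raw=True` option of pandas' `Rolling.apply`, which passes each window as a plain NumPy array instead of a Series. Rolling windows hold float values, hence the `astype(int)`. This avoids constructing a Series per window, but the lambda still runs once per row in Python, so it should not be expected to match the vectorised `sliding_window_view` answer.

```python
import pandas as pd

d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})
lst = []
# raw=True: x is an ndarray; record its values, then return 0 so apply() gets a number
d.rolling(3).apply(lambda x: lst.append(x.astype(int).tolist()) or 0, raw=True)
result = pd.DataFrame.from_records(lst, columns=['t1', 't2', 't3'])
print(result)
```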
