
I have a DataFrame and I'd like to build the sliding subsequences (windows) of its data:

import pandas as pd

d = pd.DataFrame({'t': [1, 2, 3, 4, 5, 6]})

# Collect every window of length `window` with a Python loop
x = []
window = 3
for i in range(len(d) - window + 1):
    x.append(d[i: i + window].t.values)

pd.DataFrame(x, columns=['t1', 't2', 't3'])

I get this result:

    t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6

It works, but it is very slow for a large DataFrame. Is it possible to make this faster?

Roman Kazmin

2 Answers


You can use NumPy, as long as your version is >= 1.20 (`sliding_window_view` was added in 1.20.0):

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

W = 3
pd.DataFrame(sliding_window_view(d['t'], W), 
             columns=[f't{i+1}' for i in range(W)])

#   t1  t2  t3
#0   1   2   3
#1   2   3   4
#2   3   4   5
#3   4   5   6
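
To make the idea above easy to check, here is a minimal, self-contained sketch that wraps it in a helper and compares the output with the loop from the question. The helper name `window_frame` is hypothetical (not part of pandas or NumPy), and the sketch assumes NumPy >= 1.20.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def window_frame(series, window):
    """Hypothetical helper: one row per consecutive window of `series`."""
    values = series.to_numpy()
    # sliding_window_view returns a read-only view; no window data is copied here
    windows = sliding_window_view(values, window)
    columns = [f"t{i + 1}" for i in range(window)]
    return pd.DataFrame(windows, columns=columns)


d = pd.DataFrame({"t": [1, 2, 3, 4, 5, 6]})
W = 3
fast = window_frame(d["t"], W)

# Reference result built with the loop from the question
slow = pd.DataFrame(
    [d["t"].to_numpy()[i:i + W] for i in range(len(d) - W + 1)],
    columns=[f"t{i + 1}" for i in range(W)],
)

# Both approaches should produce the same values
assert np.array_equal(fast.to_numpy(), slow.to_numpy())
```

Since the windows are a view onto the original values, it may be safer to call `.copy()` on the resulting DataFrame before modifying it.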
ALollz
  • This is very fast!! – Riccardo Bucco Nov 15 '21 at 22:34
  • I doubt anything can be faster than this! Note that it requires `numpy` version `>= 1.20`. For older versions, one can use the `rolling_window` function defined [here](https://stackoverflow.com/a/6811241/17120692) – Rodalm Nov 15 '21 at 22:43
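
For older NumPy versions, a fallback along the lines mentioned in the comment above could look roughly like the sketch below. It is not copied from the linked answer: the function name `rolling_window_strided` is hypothetical, and it assumes a 1-D input and a NumPy recent enough to support the `writeable` argument of `as_strided`.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def rolling_window_strided(values, window):
    """Hypothetical fallback: 2-D read-only view of windows over a 1-D array."""
    values = np.ascontiguousarray(values)
    if values.ndim != 1 or not 1 <= window <= values.shape[0]:
        raise ValueError("expected a 1-D array with at least `window` elements")
    n_windows = values.shape[0] - window + 1
    step = values.strides[0]  # bytes between consecutive elements
    # Each row starts one element after the previous one and spans `window` elements
    return as_strided(
        values,
        shape=(n_windows, window),
        strides=(step, step),
        writeable=False,  # guard against writes through overlapping windows
    )


d = pd.DataFrame({"t": [1, 2, 3, 4, 5, 6]})
W = 3
result = pd.DataFrame(
    rolling_window_strided(d["t"].to_numpy(), W),
    columns=[f"t{i + 1}" for i in range(W)],
)
```

`as_strided` does no bounds checking, so the explicit shape validation matters here; a wrong shape or stride can read memory outside the array.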

You can use this trick with Pandas:

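lst = []
# The lambda records each full window as a side effect; `or 0` returns the scalar that rolling.apply expects
d.rolling(3).apply(lambda x: lst.append(x.apply(int).tolist()) or 0)
result = pd.DataFrame.from_records(lst, columns=['t1','t2','t3'])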

Here is the result:

   t1  t2  t3
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6
Riccardo Bucco