I have a DataFrame representing a time series of feature values at one-second intervals:
>>> df
   FeatA  FeatB  FeatC
0      1      6     11
1      2      7     12
2      3      8     13
3      4      9     14
4      5     10     15
I want to use this to construct a training data set for a scikit-learn model. To each row I want to add the feature values from the previous 15 minutes (900 rows), in the following format:
>>> df
   FeatA  FeatB  FeatC  FeatA_T1  FeatB_T1  FeatC_T1  FeatA_T2  FeatB_T2  ...
0      1      6     11       NaN       NaN       NaN       NaN       NaN
1      2      7     12         1         6        11       NaN       NaN
2      3      8     13         2         7        12         1         6
3      4      9     14         3         8        13         2         7
4      5     10     15         4         9        14         3         8
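Currently the code I use is essentially:
# One new column per (lag, feature) pair; range(1, 901) covers lags 1 to 900
for i in range(1, 901):
    for feature in ["FeatA", "FeatB", "FeatC"]:
        # shift(i) moves values down i rows, leaving NaN in the first i rows
        df[f"{feature}_T{i}"] = df[feature].shift(i)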
There are 23,400 rows and 137 features in the original DataFrame, so this method, which adds more than 120,000 columns one at a time, is too slow to be usable. Since this is being fed to scikit-learn, the final data needs to be a NumPy array (shown below). I'm fairly certain it would be faster to do these manipulations in NumPy instead of pandas, but all the examples I've found use the pandas shift function.
How can I construct this dataset efficiently from the original DataFrame? Are NumPy array functions the way to go?
Expected Result
array([[ 1.,  6., 11., nan, nan, nan, nan, nan, nan],
       [ 2.,  7., 12.,  1.,  6., 11., nan, nan, nan],
       [ 3.,  8., 13.,  2.,  7., 12.,  1.,  6., 11.],
       [ 4.,  9., 14.,  3.,  8., 13.,  2.,  7., 12.],
       [ 5., 10., 15.,  4.,  9., 14.,  3.,  8., 13.]])
N.B. Ultimately I plan to slice off the first 900 and last 900 rows of the array so any result that doesn't include them would work.
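For reference, here is a minimal sketch of the kind of NumPy-only approach I have in mind. It is one possible direction, not a benchmarked solution. It assumes NumPy 1.20 or later for sliding_window_view, and the function name lagged_matrix is hypothetical, made up for this illustration. On the toy data with 2 lags it should reproduce the Expected Result array above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lagged_matrix(values, n_lags):
    """Hypothetical helper: append n_lags lagged copies of every feature.

    Row t of the result is [x[t], x[t-1], ..., x[t-n_lags]], each x being a
    full row of features, with NaN wherever the history does not exist.
    """
    values = np.asarray(values, dtype=float)
    n_rows, n_feats = values.shape
    # Pad the top with NaN rows so the first rows still get a full window.
    padded = np.vstack([np.full((n_lags, n_feats), np.nan), values])
    # Shape (n_rows, n_feats, n_lags + 1); the last axis runs oldest -> newest.
    windows = sliding_window_view(padded, n_lags + 1, axis=0)
    # Reverse the time axis (newest first), put it before the feature axis,
    # then flatten each row. The reshape copies the data into a new array.
    return windows[:, :, ::-1].transpose(0, 2, 1).reshape(n_rows, -1)


# Toy data from the question; with a real DataFrame this would be df.to_numpy().
toy = np.array([[1, 6, 11],
                [2, 7, 12],
                [3, 8, 13],
                [4, 9, 14],
                [5, 10, 15]])
result = lagged_matrix(toy, n_lags=2)
```

One caveat with any approach that materializes the full array: at the real size (23,400 rows, 137 features, 900 lags) a float64 result has 23,400 x 123,437 values, roughly 23 GB, so float32 or processing in chunks may be needed.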