I have a pandas DataFrame with more than 5 million rows. I am trying to slice it into windows (time series to supervised frame) using the function below. However, it consumes too much RAM and crashes for long sequences (large look_back or forecast_horizon). As far as I know, lists are more compact than arrays, so I start with empty lists and convert them to np.array at the end. Is there a more memory-efficient way to do this? Assume sequence.shape is (5000000, 20), X.shape is (~5000000, look_back, 20), and y.shape is (~5000000, forecast_horizon).
import numpy as np

def split_sequence(sequence, look_back, forecast_horizon):
    X, y = [], []
    for i in range(len(sequence)):
        lag_end = i + look_back
        forecast_end = lag_end + forecast_horizon
        # Stop once the forecast window would run past the end of the data
        if forecast_end > len(sequence):
            break
        # Input window: all columns; target window: the first column only
        seq_x = sequence[i:lag_end]
        seq_y = sequence['first column'][lag_end:forecast_end]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
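
For scale: if the values are float64, a fully materialized X needs about 5,000,000 × look_back × 20 × 8 bytes, roughly 0.8 GB per unit of look_back, whatever container is used while building it. One direction that might avoid this is to build the windows as strided views instead of copies. Below is a minimal sketch, assuming NumPy 1.20 or newer (for `sliding_window_view`) and that the data has first been converted to arrays, for example with `sequence.to_numpy()` and `sequence['first column'].to_numpy()`. The function name `split_sequence_views` and its `values`/`target` parameters are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_sequence_views(values, target, look_back, forecast_horizon):
    """Return (X, y) as read-only strided views; no window data is copied.

    values: 2-D array of shape (n_rows, n_features)   (assumed input)
    target: 1-D array of shape (n_rows,)              (assumed input)
    """
    # Windows along axis 0: shape (n_rows - look_back + 1, n_features, look_back)
    X = sliding_window_view(values, look_back, axis=0)
    # Reorder axes so X is (n_windows, look_back, n_features)
    X = X.transpose(0, 2, 1)
    # Each target window starts where the matching lag window ends
    y = sliding_window_view(target[look_back:], forecast_horizon)
    # Keep only the lag windows that have a complete forecast window
    return X[: len(y)], y


# Small demo on synthetic data: 10 rows, 4 features
seq = np.arange(40, dtype=np.float64).reshape(10, 4)
X, y = split_sequence_views(seq, seq[:, 0], look_back=3, forecast_horizon=2)
print(X.shape, y.shape)  # (6, 3, 4) (6, 2)
```

The views share memory with the input array, so the saving only holds as long as nothing downstream copies them in full. Calling `np.array(X)` or handing the whole of X to a library that makes its own copy would allocate the full array again, so slicing out one batch at a time (for example `X[start:stop]`) is probably the safer pattern.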