
I recently got a dataset with continuous data from day 1 to day k. I'm trying to use Pandas to build a new dataset from it by rolling a time window of size n+1 from the top of the dataset to the bottom: each window takes its first n rows as 'x' and its (n+1)th row as 'y'.

I tried the apply() function, but it only sees one row at a time, and for my problem each step also needs the following n rows as input. So I wrote a for loop instead, but it is slow.

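n = 3  # number of rows per input window

x = []
y = []
# devices is a DataFrame; positions are used so the index labels don't matter.
# Stop at len(devices) - n so that row idx + n (the target row) always exists.
for idx in range(len(devices) - n):
    # n consecutive rows of the feature columns (positions 4..48)
    x.append(devices.iloc[idx:idx + n, 4:49].values)
    # 'ALARM' value of the (n+1)th row; .iloc cannot take a column label
    y.append(devices['ALARM'].iloc[idx + n])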

Is there a vectorized alternative that replaces the for loop? Thanks for any help!

Kyle Sun
  • See the answer [here](https://stackoverflow.com/questions/49838315/python-pandas-apply-a-function-to-dataframe-rolling) – roganjosh Jun 07 '19 at 16:09
  • @roganjosh IIUC, this question is different. OP asks to *sample* rolling `n` rows, not applying a function on those. – Quang Hoang Jun 07 '19 at 16:15
  • @QuangHoang it looks like it's just a case of 2 rolling windows? I only linked because I thought it would be a starting point; I would have duped if I thought it solved the whole problem :) – roganjosh Jun 07 '19 at 16:18
  • @roganjosh rolling only works if you want the output to be a number, not the data itself. – Quang Hoang Jun 07 '19 at 16:20
  • related [this question](https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize/40085052#40085052) – Quang Hoang Jun 07 '19 at 16:21
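
For reference, the strided-subarray idea from the related question could be sketched roughly as below. This is only a sketch under assumptions, not a confirmed answer: `make_windows` is a hypothetical helper name, the toy columns `f1` and `f2` stand in for the real feature columns, and it relies on `numpy.lib.stride_tricks.sliding_window_view`, which needs NumPy 1.20 or newer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(df, feature_cols, target_col, n):
    """Hypothetical helper: x[i] holds n consecutive feature rows,
    y[i] holds the target value of the row right after them."""
    features = df[feature_cols].to_numpy()
    target = df[target_col].to_numpy()
    # Windows along axis 0 have shape (rows - n + 1, n_features, n).
    windows = sliding_window_view(features, window_shape=n, axis=0)
    # The last window has no following row, so drop it, then reorder
    # the axes to (window, row_in_window, feature).
    x = windows[:-1].transpose(0, 2, 1)
    y = target[n:]
    return x, y


# Toy data standing in for `devices` (column names f1, f2 are made up).
devices = pd.DataFrame({
    "f1": np.arange(6),
    "f2": np.arange(6) * 10,
    "ALARM": [0, 0, 1, 0, 1, 1],
})

x, y = make_windows(devices, ["f1", "f2"], "ALARM", n=3)
print(x.shape, y.shape)
```

With the real data, `feature_cols` would presumably be `devices.columns[4:49]`. Note that `x` is a read-only view onto the feature array, so call `x.copy()` before modifying it, and that the frame needs at least n rows or `sliding_window_view` raises a `ValueError`.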
