PyTorch's Dataset class requires a __getitem__() method that supports indexing, so that dataset[i] returns the ith sample.

Say I have a time-series ser:

2017-12-29 14:44:00  69.90
2017-12-29 14:45:00  69.91
2017-12-29 14:46:00  69.87
2017-12-29 14:47:00  69.85
2017-12-29 14:48:00  69.86
2017-12-29 14:49:00  69.92
2017-12-29 14:50:00  69.90
2017-12-29 14:51:00  70.00
2017-12-29 14:52:00  69.97
2017-12-29 14:53:00  69.99
2017-12-29 14:54:00  69.99
2017-12-29 14:55:00  69.85 

Since I need to index into the rolling windows, I generate all windows of length 3 from the time series using:

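l3_list = []
def t(x):
  l3_list.append(x.copy())
  return 0.0  # rolling.apply requires a numeric return value
ser.rolling(3).apply(t, raw=True)  # raw=True passes each window as a NumPy array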

l3_list becomes:

[array([69.9 , 69.91, 69.87]),
 array([69.91, 69.87, 69.85]),
 array([69.87, 69.85, 69.86]),
 array([69.85, 69.86, 69.92]),
 array([69.86, 69.92, 69.9 ]),
 array([69.92, 69.9 , 70.  ]),
 array([69.9 , 70.  , 69.97]),
 array([70.  , 69.97, 69.99]),
 array([69.97, 69.99, 69.99]),
 array([69.99, 69.99, 69.85])]

Now I can index into l3_list: l3_list[i] is the ith sliding window. Is there a more memory-efficient way to do this?

Mark Rotteveel
spacegoing
  • I have limited knowledge of pandas, but why not `return ser[i:i+3].copy()` in `__getitem__`? (for why `copy` see [this answer](https://stackoverflow.com/questions/53652015/unexpected-increase-in-validation-error-in-mnist-pytorch/)) – Jatentaki Dec 15 '18 at 17:46
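
The suggestion in the comment above, computing each window on demand inside `__getitem__` rather than storing every window, could be sketched roughly as follows. This is only an illustration under assumptions: the class name `SlidingWindowDataset` and its `values` and `window` parameters are hypothetical, and a plain Python list stands in for the Series so the sketch runs without pandas or PyTorch.

```python
class SlidingWindowDataset:
    """Map-style dataset that builds window i on demand instead of storing all windows.

    Hypothetical helper, not from the thread. With PyTorch installed it could
    subclass torch.utils.data.Dataset without other changes.
    """

    def __init__(self, values, window):
        if window < 1 or window > len(values):
            raise ValueError("window must be between 1 and len(values)")
        self.values = values
        self.window = window

    def __len__(self):
        # Number of complete windows of length `window`.
        return len(self.values) - self.window + 1

    def __getitem__(self, i):
        if i < 0:
            i += len(self)  # support negative indices like a list
        if not 0 <= i < len(self):
            raise IndexError("window index out of range")
        # Slicing a list copies the three values. For a pandas Series one
        # would use ser.iloc[i:i + window].to_numpy().copy() here instead.
        return self.values[i:i + self.window]


prices = [69.90, 69.91, 69.87, 69.85, 69.86, 69.92,
          69.90, 70.00, 69.97, 69.99, 69.99, 69.85]
dataset = SlidingWindowDataset(prices, window=3)
first, last = dataset[0], dataset[-1]
```

Only the original values are kept in memory; each window is a small temporary copy made when it is requested, which is the trade-off the comment is pointing at.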

2 Answers

You could add a column of windows, much as was done in "Pandas rolling window to return an array":

from io import StringIO

import numpy as np
import pandas as pd

data = """
2017-12-29 14:44:00  69.90
2017-12-29 14:45:00  69.91
2017-12-29 14:46:00  69.87
2017-12-29 14:47:00  69.85
2017-12-29 14:48:00  69.86
2017-12-29 14:49:00  69.92
2017-12-29 14:50:00  69.90
2017-12-29 14:51:00  70.00
2017-12-29 14:52:00  69.97
2017-12-29 14:53:00  69.99
2017-12-29 14:54:00  69.99
2017-12-29 14:55:00  69.85 
"""

df = pd.read_csv(StringIO(data), sep=r'\s+', header=None)

# Only len(df) - 2 complete windows exist; requesting len(df) rows from
# as_strided would read past the end of the buffer and return garbage.
values = df[2].to_numpy()
n_windows = len(values) - 3 + 1
stride = np.lib.stride_tricks.as_strided
arr = stride(values, (n_windows, 3), values.strides * 2)
# The last two rows have no complete window, so they become NaN.
df['array'] = pd.Series(arr.tolist(), index=df.index[:n_windows])

             0         1      2                  array
0   2017-12-29  14:44:00  69.90   [69.9, 69.91, 69.87]
1   2017-12-29  14:45:00  69.91  [69.91, 69.87, 69.85]
2   2017-12-29  14:46:00  69.87  [69.87, 69.85, 69.86]
3   2017-12-29  14:47:00  69.85  [69.85, 69.86, 69.92]
4   2017-12-29  14:48:00  69.86   [69.86, 69.92, 69.9]
5   2017-12-29  14:49:00  69.92    [69.92, 69.9, 70.0]
6   2017-12-29  14:50:00  69.90    [69.9, 70.0, 69.97]
7   2017-12-29  14:51:00  70.00   [70.0, 69.97, 69.99]
8   2017-12-29  14:52:00  69.97  [69.97, 69.99, 69.99]
9   2017-12-29  14:53:00  69.99  [69.99, 69.99, 69.85]
10  2017-12-29  14:54:00  69.99                    NaN
11  2017-12-29  14:55:00  69.85                    NaN
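
As a possible alternative to calling as_strided by hand, NumPy 1.20 and later provide `sliding_window_view`, which works out the number of valid windows itself and returns a read-only view over the original buffer. The sketch below assumes the prices are already in a one-dimensional float array; the names `prices` and `windows` are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

prices = np.array([69.90, 69.91, 69.87, 69.85, 69.86, 69.92,
                   69.90, 70.00, 69.97, 69.99, 69.99, 69.85])

# One row per complete window of length 3. No data is copied:
# the rows are overlapping views into `prices`.
windows = sliding_window_view(prices, 3)

# windows[i] is the ith sliding window. Copy it before modifying,
# because the view is read-only and shares memory with `prices`.
third = windows[2].copy()
```

Because the result is a view, memory use stays close to that of the original array regardless of the window length.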
Zanshin

Here is another trick for getting the sliding windows:

Setup:

import pandas as pd

d = {pd.Timestamp('2017-12-29 14:44:00'): 69.9,
 pd.Timestamp('2017-12-29 14:45:00'): 69.91,
 pd.Timestamp('2017-12-29 14:46:00'): 69.87,
 pd.Timestamp('2017-12-29 14:47:00'): 69.85,
 pd.Timestamp('2017-12-29 14:48:00'): 69.86,
 pd.Timestamp('2017-12-29 14:49:00'): 69.92,
 pd.Timestamp('2017-12-29 14:50:00'): 69.9,
 pd.Timestamp('2017-12-29 14:51:00'): 70.0,
 pd.Timestamp('2017-12-29 14:52:00'): 69.97,
 pd.Timestamp('2017-12-29 14:53:00'): 69.99,
 pd.Timestamp('2017-12-29 14:54:00'): 69.99,
 pd.Timestamp('2017-12-29 14:55:00'): 69.85}

ser = pd.Series(d)

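Start with an empty list, then use rolling with apply to append each window (list.append returns None, so `or 0` supplies the numeric value that apply expects):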

lol = []
ser.rolling(3).apply((lambda x: lol.append(x.values) or 0), raw=False)
lol

Output:

[array([69.9 , 69.91, 69.87]),
 array([69.91, 69.87, 69.85]),
 array([69.87, 69.85, 69.86]),
 array([69.85, 69.86, 69.92]),
 array([69.86, 69.92, 69.9 ]),
 array([69.92, 69.9 , 70.  ]),
 array([69.9 , 70.  , 69.97]),
 array([70.  , 69.97, 69.99]),
 array([69.97, 69.99, 69.99]),
 array([69.99, 69.99, 69.85])]
Scott Boston