I have a DataFrame with a DatetimeIndex and multiple columns, as shown:
                         data_1  data_2
    time
    2020-01-01 00:23:40  330.98     NaN
    2020-01-01 00:23:50  734.52     NaN
    2020-01-03 00:00:00  388.06    23.9
    2020-01-03 00:00:10  341.60    25.1
    2020-01-03 00:00:20  395.14    24.9
    ...
    2020-01-03 00:01:10  341.60    25.1
    2020-01-03 00:01:20  395.14    24.9
I want to apply a function over a rolling window and collect some features. The window has to be time-based rather than a fixed number of rows, because some data may be missing. The features depend on multiple columns. I wrote my own class:
    class FeatureCollector:
        def __init__(self):
            self.feature_dicts = []

        def collect(self, window):
            self.feature_dicts.append(extract_features(window))
            return 1

    def extract_features(window):
        ans = {}
        # do_smth_on_window and calculate ans
        return ans
I run the rolling computation as follows:

    from datetime import timedelta

    collector = FeatureCollector()
    my_df.rolling(timedelta(seconds=100), min_periods=10).apply(collector.collect)
    features = collector.feature_dicts
The problem is that, as far as I understand, extract_features only ever receives a Series. The columns data_1 and data_2 are passed to it one after the other, as if the DataFrame looked like this:
                           data
    time
    2020-01-01 00:23:40  330.98
    2020-01-01 00:23:50  734.52
    2020-01-03 00:00:00  388.06
    2020-01-03 00:00:10  341.60
    2020-01-03 00:00:20  395.14
    ...
    2020-01-03 00:01:10  341.60
    2020-01-03 00:01:20  395.14
    2020-01-01 00:23:40     NaN
    2020-01-01 00:23:50     NaN
    2020-01-03 00:00:00    23.9
    2020-01-03 00:00:10    25.1
    2020-01-03 00:00:20    24.9
    ...
    2020-01-03 00:01:10    25.1
    2020-01-03 00:01:20    24.9
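The per-column behaviour can be checked with a small sketch. It assumes a three-row sample taken from the data above; `probe` and `seen` are hypothetical names used only for this illustration, and `min_periods=1` is chosen so that every window reaches the function.

```python
import pandas as pd

# Small sample frame built from three rows of the data shown above.
df = pd.DataFrame(
    {"data_1": [388.06, 341.60, 395.14], "data_2": [23.9, 25.1, 24.9]},
    index=pd.to_datetime([
        "2020-01-03 00:00:00",
        "2020-01-03 00:00:10",
        "2020-01-03 00:00:20",
    ]),
)

seen = []  # (type name, number of dimensions) of every window received


def probe(window):
    # Hypothetical helper: record what kind of object rolling.apply passes in.
    seen.append((type(window).__name__, window.ndim))
    return 0.0  # rolling.apply expects a scalar back


df.rolling(pd.Timedelta(seconds=100), min_periods=1).apply(probe, raw=False)
print(set(seen))
```

Every recorded window is a one-dimensional Series, never a two-column DataFrame, which matches the behaviour described above.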
How can I organize this so that each window passed to extract_features is a DataFrame with both columns?
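A minimal sketch of one possible direction, not a definitive solution: skip `rolling(...).apply` and slice the windows by hand, so that each window stays a DataFrame. It assumes the index is a sorted DatetimeIndex. The helper `iter_time_windows` and its `min_rows` parameter are hypothetical names; `min_rows` counts rows, which is only an approximation of `min_periods` (that counts non-NaN values per column). The body of `extract_features` is a placeholder.

```python
import pandas as pd


def iter_time_windows(df, window, min_rows=1):
    """Yield (end_time, sub-DataFrame) for each trailing time window.

    Each window covers the half-open interval (t - window, t], which is
    the default for time-based rolling. Assumes a sorted DatetimeIndex.
    """
    index = df.index
    for stop, t in enumerate(index, start=1):
        # First position whose timestamp is strictly later than t - window.
        start = index.searchsorted(t - window, side="right")
        if stop - start >= min_rows:
            yield t, df.iloc[start:stop]


def extract_features(window):
    # Placeholder features that need both columns at once.
    return {
        "data_1_mean": window["data_1"].mean(),
        "data_2_max": window["data_2"].max(),
        "n_rows": len(window),
    }


# Sample rows taken from the data in the question.
my_df = pd.DataFrame(
    {"data_1": [388.06, 341.60, 395.14, 341.60],
     "data_2": [23.9, 25.1, 24.9, 25.1]},
    index=pd.to_datetime([
        "2020-01-03 00:00:00", "2020-01-03 00:00:10",
        "2020-01-03 00:00:20", "2020-01-03 00:01:10",
    ]),
)
my_df.index.name = "time"

feature_dicts = {
    t: extract_features(w)
    for t, w in iter_time_windows(my_df, pd.Timedelta(seconds=30), min_rows=2)
}
features = pd.DataFrame.from_dict(feature_dicts, orient="index")
print(features)
```

With the 30-second window and `min_rows=2` used here, only the windows ending at 00:00:10 and 00:00:20 contain enough rows, so `features` has two rows. The loop runs in Python, so it will likely be slower than a built-in rolling aggregation on large frames.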