I have a pandas dataframe df:
Date Activity Vector
0 2017-03-01T15:20:00 [0.0366666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
1 2017-03-01T15:25:00 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2 2017-03-01T15:45:00 [0.163333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
3 2017-03-01T15:50:00 [0.316666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
4 2017-03-01T15:55:00 [0.0666666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
5 2017-03-01T16:00:00 [0.123333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
6 2017-03-01T16:05:00 [0.0333333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
7 2017-03-01T16:10:00 [0.356666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
8 2017-03-01T16:15:00 [0.476666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
9 2017-03-01T16:20:00 [0.113333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
10 2017-03-01T16:50:00 [0.0733333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
This data is a time series with some missing values (note that the Date column has type str).
I would like to reindex this dataframe and fill the missing entries with a numpy vector of zeros, np.zeros(15).
I've tried the following:
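import pandas as pd

# take a copy so that adding a column does not raise SettingWithCopyWarning
df = data.clean_df[['Date', 'Activity Vector']].copy()
df['timestamp'] = pd.to_datetime(df['Date'])
# print(df.dtypes)
# resample onto a regular 5-minute (300 second) grid, forward-filling the gaps
df = df.set_index('timestamp').resample('300S').ffill()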
which gives me the following:
timestamp Date Activity Vector
0 2017-03-01 15:20:00 2017-03-01T15:20:00 [0.0366666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
1 2017-03-01 15:25:00 2017-03-01T15:25:00 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2 2017-03-01 15:30:00 2017-03-01T15:25:00 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
3 2017-03-01 15:35:00 2017-03-01T15:25:00 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
4 2017-03-01 15:40:00 2017-03-01T15:25:00 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
5 2017-03-01 15:45:00 2017-03-01T15:45:00 [0.163333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
6 2017-03-01 15:50:00 2017-03-01T15:50:00 [0.316666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
7 2017-03-01 15:55:00 2017-03-01T15:55:00 [0.0666666666667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
8 2017-03-01 16:00:00 2017-03-01T16:00:00 [0.123333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
9 2017-03-01 16:05:00 2017-03-01T16:05:00 [0.0333333333333, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
10 2017-03-01 16:10:00 2017-03-01T16:10:00 [0.356666666667, 0.0, 0.0, 0.0,
However, this fills the missing samples with the previous entry via ffill. How can I instead fill the new rows with custom entries, for example with Date being anything (it doesn't matter, as it will be dropped later) but Activity Vector being filled with a numpy vector of zeros, np.zeros(15)?
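
To make the desired result concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes a small made-up frame with 3-element vectors standing in for the real data (the frame contents and the name n are hypothetical), and it uses '5min', which is the same interval as 300 seconds. The idea is to let asfreq() leave the new rows as NaN and then swap each missing cell for a fresh zero vector:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame above (3-element vectors for brevity).
df = pd.DataFrame({
    "Date": ["2017-03-01T15:20:00", "2017-03-01T15:25:00", "2017-03-01T15:45:00"],
    "Activity Vector": [
        np.array([0.04, 0.0, 0.0]),
        np.zeros(3),
        np.array([0.16, 0.0, 0.0]),
    ],
})
n = 3  # vector length; this would be 15 for the real data

df["timestamp"] = pd.to_datetime(df["Date"])

# asfreq() inserts the missing 5-minute slots as NaN instead of forward-filling.
out = df.set_index("timestamp").resample("5min").asfreq()

# Replace each missing cell with its own zero vector; existing
# vectors (lists or arrays) are passed through unchanged.
out["Activity Vector"] = out["Activity Vector"].apply(
    lambda v: v if isinstance(v, (list, np.ndarray)) else np.zeros(n)
)

print(out["Activity Vector"])
```

The Date column is left as NaN in the inserted rows, which should be harmless if it is dropped afterwards.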