New to pandas here. I'm trying to compute a rolling mean with a fixed window size, but my data comes as two lists: one of (start, end) timestamp tuples and one of values. I'd like the duration of each timestamp interval to be used as the weight for the corresponding value. I'd also like the gaps in the data to stay identifiable, since the timestamps are not necessarily contiguous.
Example lists:
ts = [(1415969999, 1415970014), (1415970014, 1415970030), (1415970030, 1415970045), (1415970045, 1415970060), (1415970060, 1415970075), (1415970075, 1415970090), (1415970090, 1415970105), (1415970105, 1415970120), (1415970120, 1415970135), (1415970135, 1415970150), (1415970150, 1415970165), (1415970165, 1415970181), (1415970181, 1415970286), (1415970286, 1415970301), (1415970301, 1415970316)...]
values = [8.0, 13.0, 11.75, 7.0, 8.5, 16.0, 16.0, 6.5, 4.0, 8.25, 5.5, 1.0, 0.0, 0.5, 0.5, 0.0, 0.25, 0.0, 0.25, 0.0, 0.5, 0.0, 2.25, 0.0, 0.25, 0.0, 0.25, 0.0, 1.0, 0.25, 0.25, 0.0, 0.25, 0.0, 0.5, 0.25, 0.0, 1.0, 0.0, 0.5...]
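A minimal sketch of the weighting itself, assuming the i-th value belongs to the i-th (start, end) tuple (only the first four of each list are used here). Each interval's weight is end - start, and a gap shows up wherever an interval starts after the previous one ended:

```python
import pandas as pd

# First four (start, end) pairs and values from the lists above.
# Assumption: values[i] belongs to ts[i].
ts = [(1415969999, 1415970014), (1415970014, 1415970030),
      (1415970030, 1415970045), (1415970045, 1415970060)]
values = [8.0, 13.0, 11.75, 7.0]

df = pd.DataFrame(ts, columns=["start", "end"])
df["value"] = values
# Weight = length of each interval in seconds.
df["duration"] = df["end"] - df["start"]
# True where an interval starts after the previous one ended (comparison
# against the NaN produced by shift() is False for the first row).
df["gap_before"] = df["start"] > df["end"].shift()

# Duration-weighted mean over all rows.
weighted_mean = (df["value"] * df["duration"]).sum() / df["duration"].sum()
print(df)
print(weighted_mean)
```

These four intervals are contiguous, so `gap_before` is False everywhere; the 16-second interval simply counts slightly more than the 15-second ones.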
What I'm using right now is:
pandas_series = pd.Series(values)
window_averages = pandas_series.rolling(window=90).mean()  # 90 rows, not 90 seconds: the index is a plain integer range (pd.rolling_mean is gone in current pandas)
But that doesn't take into account the weights. I have had a look here and here but can't quite piece this together.
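One way to get a duration-weighted rolling mean without expanding to seconds is a time-based window: sum value * duration and duration over the same 90-second window, then divide. This is only a sketch using the same three triples as the `data` list in the edit below. It labels each interval by its end time and counts an interval fully once its end falls inside the window, so intervals straddling the window edge are not prorated:

```python
import pandas as pd

# (start, end, value) triples in epoch seconds; end is exclusive.
data = [(1415970014, 1415970030, 13.0),
        (1415970033, 1415970048, 11.75),
        (1415970048, 1415970053, 3.2)]
df = pd.DataFrame(data, columns=["start", "end", "value"])
df["duration"] = df["end"] - df["start"]  # weight = seconds covered

# Label each row by its end time so an offset window can be used.
df.index = pd.to_datetime(df["end"].to_numpy(), unit="s")

# Offset window "90s": rows whose end time lies in (t - 90s, t].
num = (df["value"] * df["duration"]).rolling("90s").sum()
den = df["duration"].rolling("90s").sum()
weighted_mean = num / den
print(weighted_mean)
```

The 3-second hole between 1415970030 and 1415970033 just contributes no weight, so it does not bias the mean; it stays visible as `df["start"] > df["end"].shift()`.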
EDIT
I sort of managed to get what I wanted, but I don't think the solution is optimal. The output at the bottom is pretty much the input I need, and it includes the gaps in the data (represented with -1 for now).
import pandas as pd

data = [(1415970014, 1415970030, 13.0), (1415970033, 1415970048, 11.75), (1415970048, 1415970053, 3.2)]
start_range = data[0][0]
end_range = data[-1][1] - 1  # last covered second (end timestamps are exclusive)
previous_end_time = start_range
values = []
for start_ts, end_ts, value in data:
    # Fill every uncovered second before this interval with the -1 placeholder
    while start_ts > previous_end_time:
        values.append(-1)
        previous_end_time += 1
    # Repeat the value once for each second of the interval
    window_length = end_ts - start_ts
    values += [value] * window_length
    previous_end_time = end_ts
s_range_datetime_start = pd.to_datetime(start_range, unit='s')
s_range_datetime_end = pd.to_datetime(end_range, unit='s')
period_range = pd.period_range(s_range_datetime_start, s_range_datetime_end, freq='s')
series = pd.Series(values, period_range)
print(series)
This yields the following, which basically expands the data to a 1-second resolution:
2014-11-14 13:00:14 13.00
2014-11-14 13:00:15 13.00
2014-11-14 13:00:16 13.00
2014-11-14 13:00:17 13.00
2014-11-14 13:00:18 13.00
2014-11-14 13:00:19 13.00
2014-11-14 13:00:20 13.00
2014-11-14 13:00:21 13.00
2014-11-14 13:00:22 13.00
2014-11-14 13:00:23 13.00
2014-11-14 13:00:24 13.00
2014-11-14 13:00:25 13.00
2014-11-14 13:00:26 13.00
2014-11-14 13:00:27 13.00
2014-11-14 13:00:28 13.00
2014-11-14 13:00:29 13.00
2014-11-14 13:00:30 -1.00
2014-11-14 13:00:31 -1.00
2014-11-14 13:00:32 -1.00
2014-11-14 13:00:33 11.75
2014-11-14 13:00:34 11.75
2014-11-14 13:00:35 11.75
2014-11-14 13:00:36 11.75
2014-11-14 13:00:37 11.75
2014-11-14 13:00:38 11.75
2014-11-14 13:00:39 11.75
2014-11-14 13:00:40 11.75
2014-11-14 13:00:41 11.75
2014-11-14 13:00:42 11.75
2014-11-14 13:00:43 11.75
2014-11-14 13:00:44 11.75
2014-11-14 13:00:45 11.75
2014-11-14 13:00:46 11.75
2014-11-14 13:00:47 11.75
2014-11-14 13:00:48 3.20
2014-11-14 13:00:49 3.20
2014-11-14 13:00:50 3.20
2014-11-14 13:00:51 3.20
2014-11-14 13:00:52 3.20
My idea was then to apply the rolling mean to this per-second series.
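A vectorized sketch of that per-second approach, with one deliberate change: the -1 placeholder is swapped for NaN, on the assumption that gaps should be excluded from the average (pandas' rolling mean skips NaN, whereas -1 would be averaged in and drag the result down). Because every covered second is one sample, a 90-row window on the 1-second grid is a 90-second window, and each value is effectively weighted by its duration:

```python
import numpy as np
import pandas as pd

# (start, end, value) triples from the edit; end is exclusive.
data = [(1415970014, 1415970030, 13.0),
        (1415970033, 1415970048, 11.75),
        (1415970048, 1415970053, 3.2)]
df = pd.DataFrame(data, columns=["start", "end", "value"])
durations = (df["end"] - df["start"]).to_numpy()

# One timestamp per covered second, and one copy of the value per second.
seconds = np.concatenate([np.arange(s, e) for s, e in zip(df["start"], df["end"])])
per_second = pd.Series(np.repeat(df["value"].to_numpy(), durations),
                       index=pd.to_datetime(seconds, unit="s"))

# Reindex onto a gap-free 1-second grid; uncovered seconds become NaN.
full_index = pd.date_range(per_second.index[0], per_second.index[-1], freq="s")
per_second = per_second.reindex(full_index)

# 90 rows = 90 seconds on this grid; NaN seconds are ignored by mean().
rolling = per_second.rolling(90, min_periods=1).mean()
print(rolling.tail())
```

The gaps stay identifiable via `per_second.isna()`. `min_periods=1` lets the first windows report a mean before 90 seconds have accumulated; raising it would turn those partial windows into NaN instead.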