I'm trying to bin (downsample) a time series based on its timestamps. For instance:
import numpy as np
import pandas as pd
timestamps = np.linspace(0, 1000, 10000)
values = np.random.random(10000)
I usually convert it to a DataFrame and use pd.cut (or pd.qcut) to create the bins:
timeseries_df = pd.DataFrame({"Timestamps": timestamps, "Values": values})
timeseries_df["Bins"] = pd.cut(timeseries_df["Timestamps"],100) #downsampling by two orders of magnitude
ds_timestamps = timeseries_df.groupby("Bins").max()["Timestamps"]
ds_values = timeseries_df.groupby("Bins").mean()["Values"]
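As an aside, the two groupby calls can probably be combined into a single agg (named aggregation, which I believe needs pandas 0.25 or later), but it's still pandas:
binned = timeseries_df.groupby("Bins").agg(
    Timestamps=("Timestamps", "max"),
    Values=("Values", "mean"),
)
ds_timestamps = binned["Timestamps"]
ds_values = binned["Values"]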
This works, but I'm writing reusable functions and I'd like to avoid pandas if possible. I've tried implementing a version of what's been suggested here:
ds_timestamps = np.linspace(timestamps.min(), timestamps.max(), 100)  # bin edges
digitized_timestamps = np.digitize(timestamps, ds_timestamps)  # bin index for each sample
ds_values = [values[digitized_timestamps == i + 1].mean() for i in range(len(ds_timestamps))]  # per-bin means
This also works, but it is extremely slow: the list comprehension rescans the full array once per bin, so the cost grows with (number of samples) × (number of bins). Is there another way of doing this?
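For what it's worth, the kind of fully vectorized approach I have in mind is something like the sketch below, using np.bincount to get per-bin sums and counts, but I'm not sure whether it's correct or the idiomatic way to do this:
n_bins = 100
bin_edges = np.linspace(timestamps.min(), timestamps.max(), n_bins + 1)
# digitize returns indices in 1..n_bins+1; clip so the maximum timestamp
# falls into the last bin, then shift to 0-based bin indices
bin_idx = np.clip(np.digitize(timestamps, bin_edges), 1, n_bins) - 1
bin_sums = np.bincount(bin_idx, weights=values, minlength=n_bins)
bin_counts = np.bincount(bin_idx, minlength=n_bins)
ds_values = bin_sums / bin_counts  # per-bin means (empty bins would give nan)
ds_timestamps = bin_edges[1:]  # right edge of each bin, analogous to the groupby max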