
I have time series data in a DataFrame. The time series capture trajectories of the same traversed path, i.e. acceleration and rotation in the x, y and z directions, plus a label (str). The timestamp indicates the point in time at which the values were observed. The problem now is that the timestamps only have whole-second resolution:

```
timestamp ...
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
2.0
2.0
2.0
2.0
2.0
...
```

Each recorded time series has around 10000-20000 such rows. I now need them all to have the same dimensions, i.e. the same number of rows in every time series. So I thought about resampling: every time series should be brought to the average row count, in this case 15000. Each time series with fewer than 15000 rows should be upsampled, whereas each time series with more than 15000 rows should be downsampled. In general there are around 15-20 observations per second, i.e. a timestamp is repeated over 15-20 rows. How would I achieve this without losing too much information? Do you have any ideas? This step is needed to get the right input format for training a specific network.
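For illustration, here is a minimal sketch of the kind of resampling I have in mind, using linear interpolation so that upsampling and downsampling are the same operation (the column names `acc_x` ... `rot_z` and `label` are placeholders for my actual columns):

```python
import numpy as np
import pandas as pd

TARGET_LEN = 15000  # common row count for all series

def resample_to_length(df: pd.DataFrame, target_len: int = TARGET_LEN) -> pd.DataFrame:
    """Interpolate every sensor channel onto target_len equally spaced points."""
    value_cols = ["acc_x", "acc_y", "acc_z", "rot_x", "rot_y", "rot_z"]
    old_pos = np.linspace(0.0, 1.0, num=len(df))     # original sample positions
    new_pos = np.linspace(0.0, 1.0, num=target_len)  # target sample positions
    out = pd.DataFrame(
        {col: np.interp(new_pos, old_pos, df[col].to_numpy()) for col in value_cols}
    )
    out["label"] = df["label"].iloc[0]  # one label per series
    return out
```

This treats the samples as equally spaced, but plain linear interpolation applies no anti-aliasing when downsampling, so I am not sure it is the best choice.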

Unistack
  • The solution really depends on the context of your problem. An easy solution is to take the average values in each time series using the `groupby` and `agg` methods, grouping on the `timestamp` column (see the sketch after these comments). If the series span unequal numbers of timestamps, for example seconds 1-100 in one and seconds 1-120 in another, you can truncate every series to the timestamp range they all share (1-100 in this example). Again, the easy solution might not be appropriate for your context. – Shabbir Khan Feb 26 '23 at 19:51
  • @ShabbirKhan The time series capture trajectories of the same path traversed, i.e. acceleration and rotation in x, y and z direction. Thank you for your input! – Unistack Feb 26 '23 at 19:59
  • If you have ten samples with timestamp 1.0, does that mean that you have about one sample every 0.1 seconds, or does it mean that you could have nine samples in the first 0.3 seconds, and one sample 0.7 seconds later? How valid is the assumption that the samples are equally spaced within a second? – Nick ODell Feb 26 '23 at 20:15
  • @NickODell Yes, you can for now assume that the samples are equally spaced within a second. – Unistack Feb 26 '23 at 20:34
  • I'm not sure, but I think this [answer](https://stackoverflow.com/questions/15222754/groupby-pandas-dataframe-and-select-most-common-value) using *mode* could be a good starting point. If averaging over the sequential time series is what you need: [post1](https://stackoverflow.com/questions/30328646/group-by-in-group-by-and-average) or [post2](https://stackoverflow.com/questions/41040132/pandas-groupby-count-and-mean-combined) – Mario Feb 26 '23 at 22:29
  • The [resample_poly](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample_poly.html#scipy.signal.resample_poly) function might be useful to you (see the sketch below). In general, I would encourage you to research how signals of different lengths can be resampled to the same length. – Shabbir Khan Feb 27 '23 at 05:48
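A minimal sketch of the `groupby`/averaging idea from the first comment, assuming one DataFrame per recorded series with the repeated integer-second `timestamp` column from the question (the truncation bound is hypothetical):

```python
import pandas as pd

# `df` is one recorded series; collapse each second into a single averaged row.
# The string `label` column is dropped by numeric_only and can be re-attached.
per_second = df.groupby("timestamp", as_index=False).mean(numeric_only=True)

# Align series of unequal duration by truncating each one to the timestamp
# range that all series share (hypothetical upper bound shown).
common_end = 100.0
per_second = per_second[per_second["timestamp"] <= common_end]
```

Note that this collapses the 15-20 rows per second into one row each, so it discards all intra-second variation.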
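And a minimal sketch of the `resample_poly` suggestion, applied to one channel of one series; `target_len` and the column name are assumptions:

```python
import numpy as np
from scipy.signal import resample_poly

target_len = 15000
x = df["acc_x"].to_numpy()  # one sensor channel of one series

# resample_poly rescales the length by the rational factor up/down and applies
# an anti-aliasing FIR filter, unlike plain decimation or linear interpolation.
up, down = target_len, len(x)
g = int(np.gcd(up, down))   # reduce the ratio to keep the polyphase filter short
x_fixed = resample_poly(x, up // g, down // g)

assert len(x_fixed) == target_len
```

For awkward length ratios the reduced factors can still be large; the FFT-based `scipy.signal.resample` is an alternative in that case.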

0 Answers