
I have some large datasets of sensor values, each consisting of a single sensor value sampled at one-minute intervals, like a waveform. The total dataset spans a few years.

I wish to (using Python) select an arbitrary stretch of sensor data (for instance 600 values, i.e. 10 hours' worth of data) and find all time stamps in these datasets where roughly the same shape occurred.

The matches should be made by shape (relative differences), not by actual values, because different sensors are used with different biases and in different environments. I also wish to retrieve multiple matches within a single dataset for further analysis.

I’ve been looking into pandas, but I’m stuck at the moment... any gurus here?

JvdBosch
There are definitely capable people here who can help you, but you need to provide some useful data we can work with. Check out [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Danail Petrov Jan 03 '21 at 11:07

1 Answer


I don't know much about the functionality available in pandas.

I think you first need to decide the typical time span T over which the correlation is supposed to occur. What I would do is split all your time series into (possibly overlapping) segments of duration T using NumPy (see here for instance). This gives a long list of segments. I would then compute the correlation between all pairs of segments using e.g. `numpy.corrcoef`. Because the Pearson correlation is unaffected by a constant offset or a change of scale, this compares shapes rather than absolute values, which suits sensors with different biases. You get a large correlation matrix in which you can spot pairs of similar segments by applying a threshold to the absolute value of the correlation. You can estimate a suitable threshold by running the same procedure on a data set where you don't expect any correlation, or on randomized data.
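A minimal sketch of this idea, assuming NumPy 1.20 or later (for `sliding_window_view`) and one 1-D array of values per dataset. Since the question has a single selected query rather than needing every pair, it computes only the row of the correlation matrix that involves the query; a few years of one-minute data gives over a million segments, so the full all-pairs matrix would not fit in memory. The function names `segment_correlations` and `find_matches`, the `chunk` size, the default threshold of 0.9, and the greedy step that keeps only the best of a group of overlapping windows are hypothetical, illustrative choices, not a pandas or NumPy API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_correlations(series, query, step=1, chunk=10_000):
    """Pearson r between `query` and every segment of `series` of the same length.

    Segments start every `step` samples, so with step < len(query) they overlap.
    The result is the row of the all-pairs matrix (cf. np.corrcoef) for the query.
    Hypothetical helper for illustration.
    """
    series = np.asarray(series, dtype=float)
    query = np.asarray(query, dtype=float)
    n = len(query)
    # z-normalise the query: removes its offset (bias) and scale
    q = (query - query.mean()) / query.std()
    # read-only view of the overlapping windows; nothing is copied here
    windows = sliding_window_view(series, n)[::step]
    r = np.empty(len(windows))
    # process windows in chunks so the normalised copies stay small in memory
    for start in range(0, len(windows), chunk):
        w = windows[start:start + chunk]
        w = w - w.mean(axis=1, keepdims=True)
        sd = w.std(axis=1, keepdims=True)
        sd[sd <= 1e-12] = np.inf  # (near-)flat segments get r = 0 instead of NaN
        r[start:start + chunk] = (w / sd) @ q / n
    return r


def find_matches(series, query, threshold=0.9, step=1):
    """Start indices of non-overlapping segments with |r| >= threshold, best first.

    Hypothetical helper; the threshold should be tuned as described above.
    """
    r = segment_correlations(series, query, step)
    # absolute value, as in the answer: inverted shapes (r near -1) also match
    candidates = np.flatnonzero(np.abs(r) >= threshold)
    candidates = candidates[np.argsort(-np.abs(r[candidates]), kind="stable")]
    n = len(query)
    picked = []
    for c in candidates:
        start = int(c) * step
        # neighbouring windows of one event all score high; keep only the best one
        if all(abs(start - p) >= n for p in picked):
            picked.append(start)
    return picked, r
```

For example, with `query = series_a[t0:t0 + 600]` taken from one dataset, `find_matches(series_b, query)` returns start positions of similar stretches in another dataset, which you can map back to time stamps through your index. If the query is taken from the same series, its own position shows up as a match with r = 1. Drop `np.abs` if you only want same-direction matches.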

Dharman