
I hope you can assist me with the following question on Python data post-processing, with a taste of statistics.

Background:

  • I have over 600 .csv files containing temperature-vs.-time data, all measured at one location.
  • 300 of the files are Sensor Supplier A (red), and 300 of the files are Sensor Supplier B (blue).
  • Sensor A (red) is my reference, and Sensor B (blue) is a new sensor I need to assess.
  • I would like to answer the question, "How similar are Sensor B's temperature readings to Sensor A's?"
  • The temperature data follows a pattern similar to the one shown below, but the datasets are not perfectly aligned to a common starting/stopping time for the temperature rise. The key data for me to compare is the "flat" area of the graphs; the sides are just the starting and stopping temperatures, which are mostly noise.
  • Normally, I have only a few datasets to analyze, so I can do the averaging manually ... but the number of datasets I have here makes this a bit more challenging!
  • Note: I already have a script written that allows me to read in all of my .csv files and write the data onto a new .csv file with all of the datapoints.

Where I'm stuck:

  • Stats Question: I don't really know what kind of statistical method to use on this time-series data in order to compare the similarity between Sensor A and Sensor B. (Note: I'm using Python to analyze my data.)
  • I was thinking of trying an *ANOVA with Repeated Measures*?
    • I'm not sure if this is the best option because, if I understand correctly, I'd need to take a mean, since I have more than a single observation for each "subject".
  • Python Question: If I'm using statsmodels' AnovaRM, I need an aggregate_func (like taking the mean), but I think this would be difficult to apply consistently to every one of my datasets, because the stable data sometimes starts after 100 seconds, sometimes after 50, etc...
  • So it's challenging for me to apply a blanket time period to start/stop the average. (A rough sketch of what I've been trying follows below.)
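
To make the problem concrete, here is the kind of thing I've been sketching to find the flat region of each file automatically (the column name, folder layout, rolling window, and threshold are placeholders I'd still need to tune):

```python
import glob

import pandas as pd


def plateau_mean(csv_path, window=25, std_threshold=0.1,
                 temp_col="temperature"):
    """Mean temperature over the 'flat' region of one file.

    The flat region is detected automatically: compute a centred rolling
    standard deviation and keep only the samples where it falls below
    `std_threshold`, so no fixed start/stop time is needed.
    """
    df = pd.read_csv(csv_path)
    rolling_std = df[temp_col].rolling(window, center=True).std()
    stable = df.loc[rolling_std < std_threshold]
    return stable[temp_col].mean()


# One plateau mean per file, per sensor (folder layout is illustrative)
means_a = [plateau_mean(f) for f in sorted(glob.glob("sensor_A/*.csv"))]
means_b = [plateau_mean(f) for f in sorted(glob.glob("sensor_B/*.csv"))]
```

That would give me one plateau mean per file (sidestepping the aggregate_func issue), but I don't know whether comparing those two lists of means is the statistically sound way to answer my question.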

Thanks in advance for any advice this community can offer!

[Figure: temperature vs. time for a typical dataset: a noisy rise, a flat plateau, and a noisy fall]

Gary
  • When you have two signals which are (allegedly) the same but shifted in time, one typically applies cross-correlation to determine the time shift (or delay) between the two signals. This other SO post describes what and how: https://stackoverflow.com/questions/41492882/ . You may use the result (the time shift) to better align your empirical data in time... – Wololo Dec 07 '20 at 12:55
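
For reference, a minimal sketch of the alignment step described in that comment (the data here is a synthetic stand-in for two traces of the same event):

```python
import numpy as np
from scipy import signal

# Synthetic stand-ins: trace B is roughly trace A delayed by 30 samples
rng = np.random.default_rng(0)
t = np.arange(500)
temps_a = np.tanh((t - 150) / 30) + 0.05 * rng.standard_normal(t.size)
temps_b = np.tanh((t - 180) / 30) + 0.05 * rng.standard_normal(t.size)

# Subtract the means so the correlation peak reflects shape, not DC offset
a = temps_a - temps_a.mean()
b = temps_b - temps_b.mean()

corr = signal.correlate(a, b, mode="full")
lags = np.arange(-(len(b) - 1), len(a))  # lag axis matching mode="full"
shift = lags[np.argmax(corr)]            # ~ -30 here: roll b by this to match a

# np.roll wraps around at the edges; trimming is cleaner on real data
temps_b_aligned = np.roll(temps_b, shift)
```
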
  • ... as for your other question: in statistics, the "Kolmogorov–Smirnov test" is a method that can be used to determine whether two sets of empirical data are drawn from the same distribution or not. Here is a nice MIT lecture on the subject: http://www.mit.edu/~6.s085/notes/lecture5.pdf .. and behold! There is actually an implementation of the method in scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html – Wololo Dec 07 '20 at 12:56
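
And a minimal sketch of the two-sample form of that test, scipy.stats.ks_2samp, applied to plateau temperatures pooled per sensor (synthetic stand-in data again):

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins: plateau temperatures pooled across all files
rng = np.random.default_rng(1)
plateau_a = rng.normal(85.00, 0.20, 5000)  # Sensor A (reference)
plateau_b = rng.normal(85.05, 0.22, 5000)  # Sensor B (under assessment)

# Two-sample K-S test: are both samples drawn from the same distribution?
statistic, p_value = stats.ks_2samp(plateau_a, plateau_b)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")
# A small p-value is evidence that the two sensors' plateau
# temperatures are drawn from different distributions.
```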

0 Answers