Given a set of 24-hour signals with hourly data points representing energy consumption patterns, how can I give each signal a similarity score? The peaks vary in height, width, and placement within the signal. The aim is to rank the signals so that signals with closer scores are more similar than signals whose scores are further apart (much like age or income). I.e., the gap between the scores of two similar signals should be smaller than the gap between the scores of two dissimilar ones.
Correlating each signal with a base case was not adequate: if one signal had a high peak in the morning and a low one in the afternoon, a signal with the opposite pattern would be classified as similar to the first. Correlating the returns was also unsuitable, and using the RMSE between each signal and a base case produced the same issue.
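To make the base-case problem concrete, here is a small Java sketch with made-up profiles (the class and the helpers `pearson`, `rmse`, and `twoPeaks` are hypothetical). The base case has equal peaks at hours 8 and 15; signal `a` is high in the morning and `b` is high in the afternoon. Swapping hours 8 and 15 leaves the base case unchanged and turns `a` into `b`, so both signals get the same correlation and the same RMSE against the base, and a ranking by either score puts them side by side even though they are opposites.

```java
import java.util.Arrays;

// Sketch with made-up profiles: scoring each signal against one base case can give
// two opposite signals exactly the same score.
class BaseCaseComparison {

    // Pearson correlation coefficient of two equal-length arrays.
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Root-mean-square error between two equal-length arrays.
    static double rmse(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(sum / x.length);
    }

    // 24 hourly values at 1.0, plus the given peak heights at hour 8 and hour 15.
    static double[] twoPeaks(double morning, double afternoon) {
        double[] s = new double[24];
        Arrays.fill(s, 1.0);
        s[8] += morning;
        s[15] += afternoon;
        return s;
    }

    public static void main(String[] args) {
        double[] base = twoPeaks(2, 2); // equal morning and afternoon peaks
        double[] a = twoPeaks(3, 1);    // high morning, low afternoon
        double[] b = twoPeaks(1, 3);    // low morning, high afternoon
        System.out.printf("corr: a=%.4f b=%.4f%n", pearson(a, base), pearson(b, base));
        System.out.printf("rmse: a=%.4f b=%.4f%n", rmse(a, base), rmse(b, base));
        System.out.printf("rmse(a, b) = %.4f%n", rmse(a, b)); // yet a and b clearly differ
    }
}
```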
After some thought, I attempted to find the peaks of a signal and then score each peak as follows:
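```java
class Peak {
    int start, max, end;                       // hours at which the peak starts, peaks, and ends
    double startHeight, maxHeight, endHeight;  // signal values at those hours

    public double score() {
        int b1 = max - start;                  // width of the rising side
        int b2 = end - max;                    // width of the falling side
        double h1 = maxHeight - startHeight;   // rise from start to maximum
        double h2 = maxHeight - endHeight;     // fall from maximum to end
        double a1 = 0.5 * h1 * b1;             // triangle area of the rising side (unused below)
        double a2 = 0.5 * h2 * b2;             // triangle area of the falling side (unused below)
        return Math.sqrt(Math.pow(h1, 2) + Math.pow(h2, 2));
    }
}
```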
Here start, max, and end are the hours at which the peak starts, reaches its maximum, and ends, and startHeight, maxHeight, and endHeight are the signal values at those hours.
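Written out, the score computed above is as follows, where t is the hour and y the signal value (these symbols are introduced here only for illustration):

```latex
\begin{aligned}
b_1 &= t_{\max} - t_{\mathrm{start}}, & b_2 &= t_{\mathrm{end}} - t_{\max},\\
h_1 &= y_{\max} - y_{\mathrm{start}}, & h_2 &= y_{\max} - y_{\mathrm{end}},\\
a_1 &= \tfrac{1}{2} h_1 b_1, & a_2 &= \tfrac{1}{2} h_2 b_2,\\
\mathrm{score} &= \sqrt{h_1^2 + h_2^2}.
\end{aligned}
```

For a hypothetical peak starting at hour 6 (value 1.0), peaking at hour 8 (value 4.0), and ending at hour 11 (value 2.0): b1 = 2, b2 = 3, h1 = 3, h2 = 2, a1 = a2 = 3, and score = √13 ≈ 3.61. As written, the widths b and areas a do not enter the returned value, so two peaks with the same rise and fall but different widths or placements receive the same score.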
I think this could be a working method; however, I'm having difficulty finding the peaks themselves. All the methods I've tried have some flaws.
I have tried the method in this post: Peak signal detection in realtime timeseries data. Some peaks were detected as starting too early. Since some peaks can persist for several hours, I tried making the lag longer. However, if the lag was too long, peaks beginning before time = lag were missed.
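Assuming the method meant is the widely cited answer to that post, usually called the smoothed z-score algorithm (parameters lag, threshold, and influence), here is a minimal Java sketch of it. It is written for illustration, not copied from the post; the class and method names are hypothetical, and it uses the population standard deviation. It also shows why early peaks are missed: the first `lag` points only seed the moving mean and standard deviation, so they can never be flagged.

```java
import java.util.Arrays;

// Minimal sketch of the smoothed z-score peak detector (lag, threshold, influence).
class SmoothedZScore {

    // Returns +1 (above the band), -1 (below the band) or 0 for every point of y.
    static int[] detect(double[] y, int lag, double threshold, double influence) {
        int[] signals = new int[y.length];
        double[] filtered = Arrays.copyOf(y, y.length);
        double[] avg = new double[y.length];
        double[] std = new double[y.length];

        // The first `lag` points only seed the statistics, so they are never flagged.
        avg[lag - 1] = mean(y, 0, lag);
        std[lag - 1] = stdev(y, 0, lag);

        for (int i = lag; i < y.length; i++) {
            if (Math.abs(y[i] - avg[i - 1]) > threshold * std[i - 1]) {
                signals[i] = y[i] > avg[i - 1] ? 1 : -1;
                // Flagged points enter the moving statistics only with weight `influence`.
                filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1];
            } else {
                filtered[i] = y[i];
            }
            avg[i] = mean(filtered, i - lag + 1, i + 1);
            std[i] = stdev(filtered, i - lag + 1, i + 1);
        }
        return signals;
    }

    // Mean of a[from..to-1].
    static double mean(double[] a, int from, int to) {
        double s = 0;
        for (int i = from; i < to; i++) s += a[i];
        return s / (to - from);
    }

    // Population standard deviation of a[from..to-1].
    static double stdev(double[] a, int from, int to) {
        double m = mean(a, from, to);
        double s = 0;
        for (int i = from; i < to; i++) s += (a[i] - m) * (a[i] - m);
        return Math.sqrt(s / (to - from));
    }

    public static void main(String[] args) {
        // Made-up hourly profiles alternating between 1.0 and 1.2, each with one spike.
        double[] late = new double[24];
        double[] early = new double[24];
        for (int i = 0; i < 24; i++) late[i] = early[i] = (i % 2 == 0) ? 1.0 : 1.2;
        late[12] = 5.0;  // spike after the first `lag` hours: flagged
        early[2] = 5.0;  // spike inside the first `lag` hours: cannot be flagged
        System.out.println(Arrays.toString(detect(late, 5, 3.0, 0.0)));
        System.out.println(Arrays.toString(detect(early, 5, 3.0, 0.0)));
    }
}
```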
I also tried using the standard deviation of the gradient to signal that a peak was beginning, i.e., if the gradient at a given point exceeds factor*stdev(all gradients), a peak is beginning. I used factor = 0.6.
This failed when certain signals had one very steep peak in the evening and a shallower one in the morning (or vice versa): the stdev of the gradient was too high, and the algorithm missed the shallower peak. If I made the factor low enough to pick up the shallow peak as well, false peaks were detected.
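The global-threshold idea and its failure mode can be sketched in Java as below. The class and method names are hypothetical; a peak start is assumed to be the first hour of each run whose forward difference y[i+1] - y[i] exceeds factor * stdev(all gradients), using the population standard deviation. The made-up profile in `main` has a shallow morning bump and a steep evening peak: with factor 0.6 only the evening peak (hour 17) is found, and lowering the factor to 0.2 also finds the morning one (hour 6).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: a peak is taken to begin wherever the hour-to-hour gradient first rises
// above factor * stdev(all gradients of the signal).
class GlobalGradientThreshold {

    static List<Integer> peakStarts(double[] y, double factor) {
        double[] g = new double[y.length - 1];
        for (int i = 0; i < g.length; i++) g[i] = y[i + 1] - y[i];

        double limit = factor * stdev(g);
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < g.length; i++) {
            // Report only the first hour of each run of above-limit gradients.
            boolean above = g[i] > limit;
            boolean prevAbove = i > 0 && g[i - 1] > limit;
            if (above && !prevAbove) starts.add(i);
        }
        return starts;
    }

    // Population standard deviation.
    static double stdev(double[] a) {
        double mean = Arrays.stream(a).average().orElse(0);
        double s = 0;
        for (double v : a) s += (v - mean) * (v - mean);
        return Math.sqrt(s / a.length);
    }

    public static void main(String[] args) {
        // Made-up profile: flat at 1.0, a shallow morning bump (hours 6-10)
        // and a steep evening peak (hours 17-21).
        double[] y = new double[24];
        Arrays.fill(y, 1.0);
        y[7] = 1.3; y[8] = 1.6; y[9] = 1.3;
        y[18] = 4.0; y[19] = 7.0; y[20] = 4.0;
        System.out.println(peakStarts(y, 0.6)); // [17]: the shallow morning peak is missed
        System.out.println(peakStarts(y, 0.2)); // [6, 17]
    }
}
```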
Inspired by the method in the post above, I tried using a moving stdev of the gradient. However, this algorithm still misses some peaks.
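A moving version of the same threshold, again as a hypothetical sketch: the limit at hour i is assumed to be factor * stdev of the `window` gradients immediately before i (how the window was placed in the original attempt isn't stated). On the same made-up profile, with window = 5 and factor = 0.6, it finds both peak starts, hours 6 and 17.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: the limit at hour i is factor * stdev of the `window` gradients just before i.
class MovingGradientThreshold {

    static List<Integer> peakStarts(double[] y, int window, double factor) {
        double[] g = new double[y.length - 1];
        for (int i = 0; i < g.length; i++) g[i] = y[i + 1] - y[i];

        boolean[] rising = new boolean[g.length];
        List<Integer> starts = new ArrayList<>();
        // Hours before `window` have no full history, so they are never flagged.
        for (int i = window; i < g.length; i++) {
            double limit = factor * stdev(g, i - window, i);
            rising[i] = g[i] > limit;
            // Report only the first hour of each run of above-limit gradients.
            if (rising[i] && !rising[i - 1]) starts.add(i);
        }
        return starts;
    }

    // Population standard deviation of a[from..to-1].
    static double stdev(double[] a, int from, int to) {
        double mean = 0;
        for (int i = from; i < to; i++) mean += a[i];
        mean /= (to - from);
        double s = 0;
        for (int i = from; i < to; i++) s += (a[i] - mean) * (a[i] - mean);
        return Math.sqrt(s / (to - from));
    }

    public static void main(String[] args) {
        // Same made-up profile: shallow morning bump, steep evening peak.
        double[] y = new double[24];
        Arrays.fill(y, 1.0);
        y[7] = 1.3; y[8] = 1.6; y[9] = 1.3;
        y[18] = 4.0; y[19] = 7.0; y[20] = 4.0;
        System.out.println(peakStarts(y, 5, 0.6)); // [6, 17]
    }
}
```

Two properties of this sketch stand out: hours before `window` can never be flagged, which mirrors the lag problem above, and after a perfectly flat stretch the limit drops to zero, so any rise, however small, is flagged.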