
Given a small window, I'm trying to find the most similar window within a long sequence. I initially used SciPy's correlation filter, which was fast (under a second for a window of length 10k and a sequence of length 600k) but did not actually land on the most similar window.

Now I'm using a for-loop to find the window with the least MSE, but the code is painfully slow!

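import numpy as np

min_mse = np.inf
min_mse_idx = None
# +1 so the last full window (the one ending at the final row) is also checked
for i in range(len(training_data) - self.window_size + 1):
  real_window = training_data[i:i+self.window_size]
  # mean of the squared difference over all rows and channels
  mse = np.mean(np.square(window - real_window))
  if mse < min_mse:
    min_mse = mse
    min_mse_idx = i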

Does NumPy, SciPy, or any other Python library provide a more efficient way of solving this problem? The sequence is a NumPy array of shape (600000, 16) and the windows are usually (10000, 16).

  • Seems very relevant - [`Compute mean squared, absolute deviation and custom similarity measure - Python/NumPy`](https://stackoverflow.com/questions/41330517/compute-mean-squared-absolute-deviation-and-custom-similarity-measure-python). – Divakar Dec 22 '19 at 10:23
  • The solutions offered in that question work best for images and not long sequences. – Mohammed Farahmand Dec 22 '19 at 19:44
  • The only difference I see is that you won't be traversing along the width. So, the math stays the same, just the number of axes would be one less. – Divakar Dec 22 '19 at 19:59
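
The point made in the comments, that the math stays the same with one fewer axis, can be sketched roughly as follows. Expanding the squared error gives sum((w - x)^2) = sum(w^2) + sum(x^2) - 2 * sum(w * x). The last term is the cross-correlation. The middle term is the energy of each slice of the sequence, which a plain correlation score ignores, and that may be why the correlation filter alone did not land on the closest window. The function name `best_window_mse` is hypothetical, and the sketch assumes a SciPy version whose `fftconvolve` accepts the `axes` argument. FFT round-off can matter when two candidates have nearly equal MSE, so treat the result as approximate.

```python
import numpy as np
from scipy.signal import fftconvolve


def best_window_mse(sequence, window):
    """Return (start_index, mse) of the slice of `sequence` closest to `window`.

    Hypothetical helper: `sequence` has shape (n, channels) and `window`
    has shape (m, channels), with m <= n. 1-D inputs are also accepted.
    """
    sequence = np.asarray(sequence, dtype=np.float64)
    window = np.asarray(window, dtype=np.float64)
    if sequence.ndim == 1:
        # Treat 1-D data as a single channel.
        sequence = sequence[:, None]
        window = window[:, None]
    m = len(window)

    # Term 1: energy of the query window (a scalar).
    window_energy = np.sum(window ** 2)

    # Term 2: energy of every length-m slice, from a cumulative sum over rows.
    row_energy = np.sum(sequence ** 2, axis=1)
    csum = np.concatenate(([0.0], np.cumsum(row_energy)))
    slice_energy = csum[m:] - csum[:-m]  # shape (n - m + 1,)

    # Term 3: cross-correlation along the time axis only. Correlation is
    # convolution with the time-reversed window; channels are summed afterwards.
    cross = fftconvolve(sequence, window[::-1], mode="valid", axes=0).sum(axis=1)

    # Sum of squared errors per start index; clip tiny negatives from round-off.
    sse = np.maximum(window_energy + slice_energy - 2.0 * cross, 0.0)
    mse = sse / window.size

    idx = int(np.argmin(mse))
    return idx, float(mse[idx])
```

With the names from the question, usage would be something like `min_mse_idx, min_mse = best_window_mse(training_data, window)`. If exactness matters, one option is to take a handful of the lowest-scoring indices from this pass and recompute their MSE directly with the loop from the question.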
