
I recently read about the matrix profile method, and I am interested in running matrix profile segmentation (FLOSS) in Python.

In Mueen and Keogh's tutorial, the page I am referring to is 65.

Is there a package or method that implements FLOSS?

Emma
Alexia
  • Do you need the algorithm to be online or will a static evaluation with FLUSS (instead of FLOSS) work? – slaw Jun 20 '19 at 03:26
  • Thank you for the response! If I have understood correctly, FLUSS and FLOSS will give the same results (please correct me if I'm wrong). What is the difference between them? – Alexia Jun 20 '19 at 08:45
  • IIRC, FLUSS is offline (i.e., you have your data in hand already) whereas FLOSS is online (i.e., you constantly have new data streaming in and you are constantly updating your “corrected arc curve”). I recommend reading the original paper on this topic: https://www.researchgate.net/profile/Chin_Chia_Michael_Yeh/publication/321894569_Matrix_Profile_VIII_Domain_Agnostic_Online_Semantic_Segmentation_at_Superhuman_Performance_Levels/links/5a8d1511a6fdcc786eb01f61/Matrix-Profile-VIII-Domain-Agnostic-Online-Semantic-Segmentation-at-Superhuman-Performance-Levels.pdf?origin=publication_detail – slaw Jun 20 '19 at 11:02
  • Thanks for the clarification. To answer your question: I need the algorithm to be online (FLOSS). – Alexia Jun 20 '19 at 13:31
  • Is Python an option? – slaw Jun 26 '19 at 00:10
  • Yes. Python would be perfect. – Alexia Jun 26 '19 at 07:32
  • While it hasn't been implemented yet, I would keep an eye out for this: https://github.com/TDAmeritrade/stumpy/issues/44 Full disclosure, I am the core maintainer and developer of this package and I can tell you, as of July 11, 2019, this is about 90% complete and you should see the feature added in a few weeks once the unit tests are written. – slaw Jul 11 '19 at 15:25
  • Sorry for the delay. FLOSS was added in version 1.1.0 of STUMPY: https://stumpy.readthedocs.io/en/latest/Tutorial_3.html Please submit an issue on the GitHub repo if you have questions. – slaw Aug 05 '19 at 01:47
  • About FLOSS: if the values of the time series are not strictly every k minutes (for example) or there are some missing values is it a problem? Or should the time series first be resampled and then collect data every k minutes? I hope this makes sense. – Alexia Jan 22 '20 at 12:11
  • I think this question extends beyond FLOSS. Essentially, missing time series data is almost always a problem. It is best to have the data collected at every time point. If it is not possible then you need to decide how you'd want to impute those values (maybe a forward fill would work). Remember that, at the end of the day, a matrix profile is computed by comparing z-normalized Euclidean distances and it is technically impossible to compute this distance when data points are missing. – slaw Jan 22 '20 at 17:51
  • If you have more questions specifically about using FLOSS within STUMPY, then I recommend filing a GitHub issue so that it can be tracked and referenced by others who may have similar questions: https://github.com/TDAmeritrade/stumpy/issues – slaw Jan 22 '20 at 17:54

1 Answer


Both FLOSS and FLUSS are implemented in the open-source Python package STUMPY.

You can see example usage of it here and the documentation can be found here.

Here is an example of using FLUSS:

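import numpy as np
import stumpy

your_time_series = np.random.rand(10000)
window_size = 50  # Approximate number of data points in one pattern

# Column 0 of the result holds the matrix profile (distances);
# column 1 holds the matrix profile indices (nearest-neighbor positions)
matrix_profile = stumpy.stump(your_time_series, m=window_size)

subseq_len = 50  # Here set equal to the window size
# FLUSS works on the matrix profile indices, not on the distances
corrected_arc_curve, regime_locations = stumpy.fluss(
    matrix_profile[:, 1],
    L=subseq_len,
    n_regimes=2,
    excl_factor=1,
)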
slaw