I'm using Facebook Prophet for an anomaly detection task.

Tuning Prophet's general hyperparameters improves the predictions (yhat), but anomalies are flagged based on whether the observed value (y) falls outside the uncertainty interval, whose size is set by interval_width.

Questions:

  1. For anomaly detection, the interval_width parameter is extremely important, but how is it going to help me detect contextual anomalies, or anomalies based on seasonality, trends and shifts?
  2. When setting the MCMC samples parameter, should I use maximum a posteriori (MAP) estimation, or full Bayesian inference with a specified number of Markov chain Monte Carlo samples, to train and predict?

I'm also attaching a fragment of the plot I produced when detecting anomalies with Prophet.
Any help and guidance in this direction would be extremely helpful. Looking forward to a constructive discussion. Thanks.

genz_on_code

1 Answer

I used a similar approach for a task in the past, and Prophet worked fine. The idea was to build a model of my data with thresholds and treat everything outside those boundaries as an anomaly.
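
A minimal sketch of that thresholding step, assuming the forecast has already been produced and that the observed values and the yhat_lower / yhat_upper bounds are available as plain lists. The helper name `flag_anomalies` and the sample numbers are hypothetical, not part of Prophet:

```python
def flag_anomalies(y, yhat_lower, yhat_upper):
    """Return the indices of observations outside the uncertainty interval."""
    if not (len(y) == len(yhat_lower) == len(yhat_upper)):
        raise ValueError("all inputs must have the same length")
    return [
        i
        for i, (obs, lo, hi) in enumerate(zip(y, yhat_lower, yhat_upper))
        if obs < lo or obs > hi  # strictly outside the band counts as an anomaly
    ]


# Made-up values purely for illustration.
observed = [10.0, 10.5, 25.0, 9.8, 2.0]
lower = [8.0, 8.5, 9.0, 8.0, 8.0]
upper = [12.0, 12.5, 13.0, 12.0, 12.0]
print(flag_anomalies(observed, lower, upper))  # [2, 4]
```

Whether a point exactly on a boundary should count as an anomaly is a design choice; here it does not.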

As for methodology, I think it depends on what you want to capture. In your case, uncertainty in the trend may be enough (if that is what we see in the plot).

Answering your questions:

  1. The interval_width parameter relates to uncertainty in the trend and observation noise only. As mentioned in the docs: "we assume that the average frequency and magnitude of trend changes in the future will be the same as that which we observe in the history. We project these trend changes forward and by computing their distribution we obtain uncertainty intervals. One property of this way of measuring uncertainty is that allowing higher flexibility in the rate, by increasing changepoint_prior_scale, will increase the forecast uncertainty. (...) The width of the uncertainty intervals (by default 80%) can be set using the parameter interval_width." In short, going by the trend and with the default setting, about 80% of samples should fall between those boundaries (yhat_lower and yhat_upper).

  2. To see the uncertainty in seasonality, you must do full Bayesian sampling. mcmc.samples (mcmc_samples in the Python API) is 0 by default, meaning that maximum a posteriori (MAP) estimation is used.
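
As a rough sketch of how the two settings fit together in the Python API, assuming the `prophet` package is installed, default linear growth, no extra regressors, and an input DataFrame with `ds` and `y` columns. The helper names `prophet_settings` and `fit_and_flag`, and the values 0.95 and 300, are illustrative assumptions rather than recommendations:

```python
def prophet_settings(interval_width=0.80, full_bayes=False, n_samples=300):
    """Build keyword arguments for Prophet (hypothetical helper).

    mcmc_samples=0 keeps the default MAP fit; a positive value switches to
    full Bayesian sampling, which is slower but also yields uncertainty
    in seasonality.
    """
    if not 0.0 < interval_width < 1.0:
        raise ValueError("interval_width must be between 0 and 1")
    return {
        "interval_width": interval_width,
        "mcmc_samples": n_samples if full_bayes else 0,
    }


def fit_and_flag(df, **settings):
    """Fit Prophet on df (columns ds, y) and mark out-of-interval points."""
    from prophet import Prophet  # lazy import; assumes prophet is installed

    # Sort first so the rows line up with the forecast, which is ordered by ds.
    df = df.sort_values("ds").reset_index(drop=True)
    model = Prophet(**settings)
    model.fit(df)
    forecast = model.predict(df[["ds"]])  # in-sample forecast on the history

    result = df.copy()
    result["yhat_lower"] = forecast["yhat_lower"].to_numpy()
    result["yhat_upper"] = forecast["yhat_upper"].to_numpy()
    result["anomaly"] = (result["y"] < result["yhat_lower"]) | (
        result["y"] > result["yhat_upper"]
    )
    return result


print(prophet_settings(interval_width=0.95, full_bayes=True))
# {'interval_width': 0.95, 'mcmc_samples': 300}
```

A call such as `fit_and_flag(df, **prophet_settings(0.95, full_bayes=True))` would then return the history with an `anomaly` column. A wider interval flags fewer points, so interval_width effectively acts as the sensitivity knob.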

annabitel
  • Hey @annabednarska, thank you for the concise, end-to-end answer, very helpful. I was also exploring but didn't find any good resources on Bayesian sampling for the case above. Could you please share any resources you know of? – genz_on_code Feb 27 '22 at 17:25
  • Hey @genz_on_code, regarding Bayesian sampling, I was reading about the MCMC method [here](https://machinelearningmastery.com/markov-chain-monte-carlo-for-probability/), which has many additional links. An mcmc.samples value greater than 0 defines the number of MCMC samples used to train and test. Keep in mind that Bayesian inference will take a long time to run. – annabitel Feb 28 '22 at 16:32