
Can I use the y-hat variance, the bounds, and the point estimate from the forecast data frame to calculate the confidence level whose interval would contain a given value?

I've seen that I can change my interval level prior to fitting, but doing that programmatically feels like a LOT of expensive trial and error. Is there a way to estimate the confidence bound using only the information from the model parameters and the forecast data frame?

Something like:

for level in [.05, .1, .15, ..., .95]:
    if (yhat - Z[level] * yhat_variance / N) < value_in_question < (yhat + Z[level] * yhat_variance / N):
        print(f"value_in_question is inside the bound at level {level}")
# This is pseudocode, not meant to run in a console

EDIT: working Prophet example

# CSV from fbprophet's working examples: https://github.com/facebook/prophet/blob/master/examples/example_wp_log_peyton_manning.csv

import pandas as pd
import numpy as np
import scipy.stats as st
from fbprophet import Prophet

df = pd.read_csv('example_wp_log_peyton_manning.csv')
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)

# Goal: the smallest confidence level such that the confidence interval of the 30th prediction contains 9

## My current approach
def __probability_calculation(estimate, forecast, j=-1):
    # Prophet's default interval_width is 0.80, so yhat_lower is the 10th
    # percentile of the simulations: st.norm.ppf(0.10) ≈ -1.28
    sd_residuals = (forecast.yhat_lower.iloc[j] - forecast.yhat.iloc[j]) / (-1.28)
    for alpha in np.arange(.5, .95, .01):
        z_val = st.norm.ppf(alpha)
        # note: a +/- ppf(alpha) interval covers 2*alpha - 1 of a normal
        # distribution, so the returned alpha overstates the actual coverage
        if (forecast.yhat.iloc[j] - z_val * sd_residuals
                < estimate <
                forecast.yhat.iloc[j] + z_val * sd_residuals):
            return alpha

# j=-1: the last row of `forecast` is the 30th (final) forecasted date
prob = __probability_calculation(9, forecast)

1 Answer


fbprophet uses the numpy.percentile method to estimate the percentiles as you can see here in the source code: https://github.com/facebook/prophet/blob/0616bfb5daa6888e9665bba1f95d9d67e91fed66/python/prophet/forecaster.py#L1448
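For intuition, here is a minimal sketch (a simplification, not Prophet's actual code) of how equal-tailed bounds fall out of applying np.percentile to simulated samples:

import numpy as np

# stand-in for one row of Prophet's simulated yhat values
samples = np.random.normal(loc=8.0, scale=0.5, size=1000)

interval_width = 0.80  # Prophet's default
lower = np.percentile(samples, 100 * (1 - interval_width) / 2)  # 10th percentile
upper = np.percentile(samples, 100 * (1 + interval_width) / 2)  # 90th percentile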

How to inverse calculate percentiles for values is already answered here: Map each list value to its corresponding percentile
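In short: np.percentile maps a percentile to a value, and scipy.stats.percentileofscore maps a value back to its percentile. A toy illustration:

import numpy as np
import scipy.stats as st

data = np.arange(1, 101)          # toy data: the values 1..100
np.percentile(data, 90)           # -> 90.1, the value at the 90th percentile
st.percentileofscore(data, 90.1)  # -> 90.0, the percentile of that value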

Combining everything based on your code example:

import pandas as pd
import numpy as np
import scipy.stats as st
from fbprophet import Prophet

url = 'https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_log_peyton_manning.csv'
df = pd.read_csv(url)
   
# Put the number of uncertainty samples in a variable so we can reuse it later.
uncertainty_samples = 1000  # 1000 is the default
m = Prophet(uncertainty_samples=uncertainty_samples)
m.fit(df)
future = m.make_future_dataframe(periods=30)

# You need to replicate some of the preparation steps which are part of the predict() call internals
tmpdf = m.setup_dataframe(future)
tmpdf['trend'] = m.predict_trend(tmpdf)
sim_values = m.sample_posterior_predictive(tmpdf)

The sim_values object contains, for every data point, 1000 simulations (one per uncertainty sample); the confidence interval bounds are percentiles of these simulations.
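You can check this yourself; assuming the same Prophet version as linked above, sim_values is a dict of 2-D arrays:

sim_values['yhat'].shape
# (number of rows in future, uncertainty_samples)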

Now you can call the scipy.stats.percentileofscore method with any target value:

target_value = 8
# sim_values['yhat'] is 2-D (rows x samples). percentileofscore counts matches
# over all elements but divides by len(a), i.e. the number of rows, so the
# extra division by uncertainty_samples yields the percentile over all
# simulated values (across every date, not just one prediction)
st.percentileofscore(sim_values['yhat'], target_value, 'weak') / uncertainty_samples
# returns 44.26
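To tie this back to the original question (the smallest interval around the 30th prediction that contains a given value), you can score against the simulations of that single row and convert the percentile into an equal-tailed interval width. A sketch, assuming sim_values['yhat'] is a (rows x samples) array as above and that Prophet's bounds are the equal-tailed percentiles shown in the linked source:

value_in_question = 9
row_sims = sim_values['yhat'][-1]  # simulations for the last (30th forecasted) date

p = st.percentileofscore(row_sims, value_in_question, 'weak')  # 0..100

# an equal-tailed interval of width w spans the 50 - 50*w to 50 + 50*w
# percentiles, so the smallest w containing a value at percentile p is:
smallest_width = abs(2 * p - 100) / 100
print(f'smallest interval_width containing {value_in_question}: {smallest_width:.3f}')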

To prove this works both ways, you can take the output of the np.percentile method and feed it back into scipy.stats.percentileofscore. This round-trips to an accuracy of 4 decimals:

ACCURACY = 4
for test_percentile in np.arange(0, 100, 0.5):
    target_value = np.percentile(sim_values['yhat'], test_percentile)
    recovered = st.percentileofscore(sim_values['yhat'], target_value, 'weak') / uncertainty_samples
    if not np.round(recovered, ACCURACY) == np.round(test_percentile, ACCURACY):
        print(test_percentile)
        raise ValueError("This doesn't work")
  • Thanks for looking into this! I added a working example, and have checked out the links you added. – Travasaurus Apr 27 '21 at 18:56
  • Out of curiosity do you think my proposed solution holds any weight? Or is it flawed? – Travasaurus May 01 '21 at 00:14
  • Depends quite a bit on what you do with it. What kind of data are we talking about? What do you want to achieve? What will you do with the percentile that is calculated? – BramV May 02 '21 at 20:29
  • The data is a series of inter-arrival times between 'events', but for all intents and purposes, it can be treated as an IID predictor. I'm trying to produce a metric of 'trust'. Someone may take action based on the results of this model, and I want a metric that can give a hint of how seriously the user needs to take this prediction. For example, if the model predicts (with high likelihood) that the values will fall below some key threshold next week, then they need to act immediately. Otherwise, they can take further steps to investigate at their leisure. Also, I want to note that this will... – Travasaurus May 03 '21 at 17:01
  • not be the only metric. The operator will have a number of metrics to make their decision. I may even try to fit a classification model based on these metrics so that they can have a single holistic metric of 'trust' based on the probability output of that classification model. – Travasaurus May 03 '21 at 17:03
  • Thank you for sharing the further explanation. As there is a fixed threshold value, it may indeed make sense to check the probability of reaching that value. Choosing the probability threshold may still be a bit arbitrary though. Alternatively: you can calculate the lower bound at e.g. 90% and trigger an alert if the value reaches below a threshold. The result would be the same: if there is a realistic risk of a low value you will get alerted. So it's really more a preference on what kind of inputs to provide and how to interpret the output. – BramV May 05 '21 at 15:20
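
A minimal sketch of the alternative BramV describes in the last comment (THRESHOLD and the 90% width are illustrative choices, not values from the thread):

# refit with a 90% interval and alert when the lower bound crosses a threshold
m90 = Prophet(interval_width=0.90)
m90.fit(df)  # df from the question's example
fc = m90.predict(m90.make_future_dataframe(periods=30))

THRESHOLD = 9
future_rows = fc.tail(30)  # the 30 forecasted dates
at_risk = future_rows[future_rows['yhat_lower'] < THRESHOLD]
if not at_risk.empty:
    print(f'{len(at_risk)} forecasted dates risk falling below {THRESHOLD}')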