3

Is there a built in function that will provide the confidence intervals for parameter estimates in a python package or is this something I will need to implement by hand? I am looking for something similar to matlabs gevfit http://www.mathworks.com/help/stats/gevfit.html.

captain_M
  • 287
  • 2
  • 10

2 Answers2

2

The bootstrap can be used to estimate confidence intervals of any function (np.mean, st.genextreme.fit, etc.) of a sample, and there is a Python library: scikits.bootstrap.

Here for the data from the question author's related question:

import numpy as np, scipy.stats as st, scikits.bootstrap as boot
data = np.array([ 22.20379411,  22.99151292,  24.27032696,  24.82180626,
  25.23163221,  25.39987272,  25.54514567,  28.56710007,
  29.7575898 ,  30.15641696,  30.79168255,  30.88147532,
  31.0236419 ,  31.17380647,  31.61932755,  32.23452568,
  32.76262978,  33.39430032,  33.81080069,  33.90625861,
  33.99142006,  35.45748368,  37.0342621 ,  37.14768791,
  38.14350221,  42.72699534,  44.16449992,  48.77736737,
  49.80441736,  50.57488779])

st.genextreme.fit(data)   # just to check the parameters
boot.ci(data, st.genextreme.fit)

The results are

(-0.014387281261850815, 29.762126238637851, 5.8983127779873605)
array([[ -0.40002507,  26.93511496,   4.6677834 ],
       [  0.19743722,  32.41834882,   9.05026202]])

The bootstrap takes about three minutes on my machine; by default, boot.ci uses 10,000 bootstrap iterations (n_samples), see code or help(boot.ci), and st.genextreme.fit is not superfast.

The confidence intervals from boot.ci do not match the ones from MATLAB's gevfit exactly. E.g., MATLAB gives a symmetric one [-0.3032, 0.3320] for the first parameter (0.0144).

Community
  • 1
  • 1
Ulrich Stern
  • 10,761
  • 5
  • 55
  • 76
  • Thanks for this suggestion - however what if I want to specify a parameter for the distribution fitting, e.g. a location parameter, I can't get it to work, e.g. `boot.ci(data, genextreme.fit(data, loc=0))` - as it says a tuple object not callable. – dreab Nov 04 '16 at 10:28
  • @dreab, something like `boot.ci(data, lambda x: st.genextreme.fit(x, loc=29.7))` should work. – Ulrich Stern Nov 05 '16 at 09:21
1

Take a look at scipy and numpy in case you haven't already. If you have some familiarity with MATLAB, then the switch should be relatively easy. I've taken this quick snippet from this SO response:

import numpy as np
import scipy as sp
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    a = 1.0*np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * sp.stats.t.ppf((1+confidence)/2., n-1)
    return m, m-h, m+h

You should be able to customize the returns to your liking. Like the MATLAB gevfit function, it defaults to using 95% confidence bounds.

Community
  • 1
  • 1
GJStein
  • 648
  • 5
  • 13