
I have a dataset that I need to fit to a GEV distribution. The data is one-dimensional and is stored in a numpy array. Currently I am using scipy.stats.genextreme.fit(data), which runs without error but gives totally inaccurate results (obvious from plotting the pdf). After some investigation, it turns out that my data does not fit well in log space, which scipy uses in its MLE fitting algorithm, so I need to try something like GMM instead, which is only available in statsmodels. The problem is that I can't find anything that looks like scipy's fit function. All the examples I've found seem to deal with far more complicated data than I have. Also, statsmodels requires endog and exog parameters for everything, and I have no idea what these are.

This should be really simple, so I'm sure I'm missing something obvious. Has anyone used statsmodels in this way, and if so, any pointers as to how to do it?

aquavitae
  • http://statsmodels.sourceforge.net/devel/endog_exog.html – Fred Foo Mar 19 '14 at 16:27
  • It might be helpful if you posted a set of data. I don't think it is an issue with the MLE method (which `statsmodels` probably also uses). Maybe all you need is, instead of GEV, another GEV-related distribution: Gumbel, Gompertz, Weibull, etc. – CT Zhu Mar 19 '14 at 16:41
  • @CTZhu, see [this](http://stackoverflow.com/questions/22167975/weird-pdfs-from-fitted-data) question which I previously posted. I've been told by a long-time expert on the subject matter that this data is not well represented in log space, hence my exploration into alternatives. I've tried the other distributions but they don't give good results either. – aquavitae Mar 19 '14 at 16:47
  • @larsmans I read the docs, I just don't understand them. – aquavitae Mar 19 '14 at 16:48
  • `exog` is `x`, `endog` is `y`. (Those crazy econometricians and their ten dollar words... :) – Warren Weckesser Mar 19 '14 at 16:52
  • @WarrenWeckesser Yes, but in 1D data, where is `x` and `y`? – aquavitae Mar 19 '14 at 16:56
  • [This answer](http://stackoverflow.com/a/16651955/832621) gives some examples of how to try many statistical models using SciPy. – Saullo G. P. Castro Mar 19 '14 at 18:01
  • You could try different starting values or try to fix some parameters in the MLE fit with scipy.stats. In some cases I still got good results when I tried that. – Josef Mar 19 '14 at 19:08
  • In this case there is no x, or depending on the model it would be just ones. Using MLE in statsmodels wouldn't really be any different from using the MLE fit in scipy.stats, it would have the same problems. Generalized Method of Moments or Minimum Distance Estimation would work, but we would still need to specify the moment or distance conditions, and the current statsmodels GMM setup is designed for x, y cases, i.e. moment conditions for each observation. – Josef Mar 19 '14 at 19:13
  • @user333700 Is there anything in the Empirical Likelihood code for moment conditions that aren't for each observation? I had some EL and Generalized Maximum Entropy code for this at some point. – jseabold Mar 24 '14 at 17:44
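
The suggestion in the comments to try different starting values or to fix some parameters in the scipy.stats MLE fit can be sketched roughly as follows. This is only an illustration: the data is synthetic, and the starting values and the choice to fix the shape at zero are assumptions made for the example, not recommendations for the asker's data.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic stand-in for the real 1-D array (assumption for illustration only).
rng = np.random.default_rng(42)
data = genextreme.rvs(-0.2, loc=50.0, scale=5.0, size=2000, random_state=rng)

# 1. Default MLE fit: scipy chooses its own starting values.
c_def, loc_def, scale_def = genextreme.fit(data)

# 2. MLE fit with explicit starting values: the shape start is passed
#    positionally, the loc/scale starts as keywords.
c_start, loc_start, scale_start = genextreme.fit(
    data, -0.1, loc=np.median(data), scale=data.std()
)

# 3. MLE fit with the shape parameter held fixed (f0 fixes the first shape
#    parameter; c = 0 is the Gumbel case). floc= and fscale= fix the others.
c_fix, loc_fix, scale_fix = genextreme.fit(data, f0=0.0)

print("default:        ", c_def, loc_def, scale_def)
print("starting values:", c_start, loc_start, scale_start)
print("fixed shape:    ", c_fix, loc_fix, scale_fix)
```

Comparing the three fitted pdfs against a histogram of the data is a quick way to see whether the poor fit comes from the optimizer's starting point rather than from MLE itself.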

1 Answer


I'm guessing you want a Gaussian Mixture Model (GMM) and not the Generalized Method of Moments (GMM). The former GMM is available in scikit-learn here. The latter has code in statsmodels, but it's a work in progress.

EDIT: Actually, it's not clear to me that you want GMM. Maybe you just want a kernel density estimator (KDE). This is available in statsmodels here, with an example.
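
A minimal sketch of the KDE route, assuming statsmodels' `KDEUnivariate` class with its default settings. The data here is synthetic and only stands in for the real array; note that for a univariate density there is no `exog`, the sample itself is the only input.

```python
import numpy as np
from statsmodels.nonparametric.kde import KDEUnivariate

# Synthetic stand-in for the real 1-D array (assumption for illustration only).
rng = np.random.default_rng(0)
data = rng.gumbel(loc=10.0, scale=2.0, size=1000)

# Fit a kernel density estimate using the library defaults.
kde = KDEUnivariate(data)
kde.fit()

# After fitting, the estimate is available on an evenly spaced grid.
grid = kde.support
pdf = kde.density

# Rough check that the estimated density integrates to about one.
area = np.sum(pdf) * (grid[1] - grid[0])
print("approximate area under the KDE:", area)
```

A KDE gives a smooth, nonparametric estimate of the pdf, so it will follow the data closely but does not return GEV parameters.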

Hmm, if you do want to use (Generalized) Method of Moments to fit some kind of probability weighted GEV, then you need to specify the moment conditions, but I don't have a ready example for (G)MM in statsmodels for how you specify the moment conditions. You might be better off asking on the mailing list.
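
For what a moment-based fit could look like outside statsmodels, here is a hedged sketch using L-moments (linear combinations of probability weighted moments) with Hosking's closed-form approximation for the GEV shape. This is a swapped-in technique, not the statsmodels GMM machinery; the helper name `gev_lmoment_fit` is hypothetical and the data is synthetic. The shape uses the same sign convention as scipy's `genextreme`.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import genextreme


def gev_lmoment_fit(data):
    """Estimate GEV (shape, loc, scale) by matching the first three sample
    L-moments. Hypothetical helper; not part of scipy or statsmodels."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)

    # Unbiased sample probability weighted moments b0, b1, b2.
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n

    # Sample L-moments and L-skewness.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    t3 = l3 / l2

    # Hosking's approximation for the shape, then scale and location.
    # (Breaks down if the estimated shape is exactly zero, the Gumbel case.)
    z = 2 / (3 + t3) - np.log(2) / np.log(3)
    k = 7.8590 * z + 2.9554 * z**2
    scale = l2 * k / ((1 - 2.0 ** (-k)) * gamma(1 + k))
    loc = l1 - scale * (1 - gamma(1 + k)) / k
    return k, loc, scale


# Synthetic check: draw from a known GEV and see how close the estimates are.
rng = np.random.default_rng(0)
sample = genextreme.rvs(0.1, loc=10.0, scale=2.0, size=5000, random_state=rng)
c_hat, loc_hat, scale_hat = gev_lmoment_fit(sample)
print(c_hat, loc_hat, scale_hat)
```

The returned values can be passed straight to `genextreme.pdf(x, c_hat, loc=loc_hat, scale=scale_hat)` for plotting, or used as starting values for an MLE fit.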

jseabold