
I would like to find the Maximum Likelihood Estimator for some data that may be governed by a discrete distribution. But in scipy.stats only classes representing continuous distributions have a fit function to do that. What is the reason that the classes representing discrete distributions do not?

1 Answer


Short answer: because nobody wrote the code for it, or even tried, as far as I know.

Longer answer: I don't know how far we can get for the discrete models with a generic maximum likelihood method like the one that exists for the continuous distributions, and even that one works for many but not all of them.

Most discrete distributions have strong restrictions on their parameters, and most of them will probably need a fit method specific to the distribution:

>>> [(f, getattr(stats, f).shapes) for f in dir(stats) if isinstance(getattr(stats, f), stats.distributions.rv_discrete)]
[('bernoulli', 'pr'), ('binom', 'n, pr'), ('boltzmann', 'lamda, N'), 
 ('dlaplace', 'a'), ('geom', 'pr'), ('hypergeom', 'M, n, N'), 
 ('logser', 'pr'), ('nbinom', 'n, pr'), ('planck', 'lamda'), 
 ('poisson', 'mu'), ('randint', 'min, max'), ('skellam', 'mu1,mu2'), 
 ('zipf', 'a')]
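For some of these distributions the MLE has a closed form, so a distribution-specific fit can be very short. The following is only a sketch: `fit_poisson` and `fit_geom` are hypothetical helper names, not part of scipy, and the samples are simulated with NumPy just to have something to fit.

```python
import numpy as np

def fit_poisson(data):
    """Closed-form MLE for poisson: mu_hat is the sample mean."""
    data = np.asarray(data)
    return data.mean()

def fit_geom(data):
    """Closed-form MLE for geom with support 1, 2, ...: p_hat = 1 / sample mean."""
    data = np.asarray(data)
    return 1.0 / data.mean()

# Simulated data with known parameters (illustration only).
rng = np.random.default_rng(0)
sample_poisson = rng.poisson(3.5, size=5000)
sample_geom = rng.geometric(0.3, size=5000)  # same support as stats.geom

mu_hat = fit_poisson(sample_poisson)
p_hat = fit_geom(sample_geom)
print(mu_hat, p_hat)
```

Both estimates should land close to the true values 3.5 and 0.3 for a sample of this size.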

statsmodels provides a few discrete models in which the parameters can also depend on explanatory variables. Most of those, like generalized linear models, need a link function to restrict the parameter values to the valid range, for example the interval (0, 1) for probabilities, or values larger than zero for the parameters of count models.
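The link-function idea can be sketched without statsmodels: optimize over an unconstrained parameter and map it into the valid range inside the likelihood. The example below assumes a Poisson model without explanatory variables and a log link, `mu = exp(theta)`; the function name `negloglike` and the simulated data are illustrative only.

```python
import numpy as np
from scipy import optimize, stats

def negloglike(theta, data):
    # Log link: mu = exp(theta) is positive for any real theta,
    # so the optimizer can search without constraints.
    mu = np.exp(theta[0])
    return -stats.poisson.logpmf(data, mu).sum()

# Simulated data (illustration only).
rng = np.random.default_rng(1)
data = rng.poisson(4.0, size=2000)

result = optimize.minimize(negloglike, x0=[0.0], args=(data,),
                           method="Nelder-Mead")
mu_link = float(np.exp(result.x[0]))

# For the Poisson the MLE is the sample mean, which gives a check.
print(mu_link, data.mean())
```

A logit transform, `p = 1 / (1 + exp(-theta))`, would play the same role for a probability parameter restricted to (0, 1).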

The "n" parameter in the binomial and some of the other parameters are required to be integers, which makes it impossible to use the usual continuous minimizers from scipy.optimize.
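One possible workaround for an integer parameter, sketched here for the binomial, is to loop over candidate integer values of n and, for each one, plug in the closed-form MLE of the probability (sample mean divided by n). `fit_binom` and the default cap `n_max` are assumptions made for this sketch, not scipy API. Estimating n is known to be unstable: when the sample variance is close to or larger than the mean, the likelihood can keep increasing in n, so the result may simply be the cap.

```python
import numpy as np
from scipy import stats

def fit_binom(data, n_max=None):
    """Profile-likelihood grid search over integer n (hypothetical helper)."""
    data = np.asarray(data)
    n_min = max(int(data.max()), 1)   # n cannot be below the largest observation
    if n_max is None:
        n_max = 10 * n_min            # arbitrary cap, an assumption of this sketch
    best_ll, best_n, best_p = -np.inf, None, None
    for n in range(n_min, n_max + 1):
        p = data.mean() / n           # MLE of the probability for fixed n
        ll = stats.binom.logpmf(data, n, p).sum()
        if ll > best_ll:
            best_ll, best_n, best_p = ll, n, p
    return best_n, best_p

# Simulated data (illustration only).
rng = np.random.default_rng(3)
data = rng.binomial(10, 0.5, size=5000)
n_hat_binom, p_hat_binom = fit_binom(data)
print(n_hat_binom, p_hat_binom)
```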

A good solution would be for someone to add distribution-specific fit methods, so that at least the easier ones are available.
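As an illustration of what such a method might look like for a one-parameter case, here is a minimal sketch for zipf. It writes the log-likelihood directly, using the pmf k^(-a) / zeta(a) for k = 1, 2, ..., and minimizes over a with a bounded scalar optimizer. `fit_zipf` and the search bounds are assumptions of this sketch, and it has only been reasoned through on simulated data.

```python
import numpy as np
from scipy import optimize, special

def fit_zipf(data, bounds=(1.0 + 1e-6, 10.0)):
    """Numerical MLE of the zipf exponent a > 1 (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sum_log = np.log(data).sum()

    def negloglike(a):
        # -log L(a) = a * sum(log k_i) + n * log(zeta(a))
        return a * sum_log + n * np.log(special.zeta(a, 1))

    res = optimize.minimize_scalar(negloglike, bounds=bounds, method="bounded")
    return res.x

# Simulated data (illustration only).
rng = np.random.default_rng(2)
zipf_sample = rng.zipf(2.5, size=5000)
a_hat = fit_zipf(zipf_sample)
print(a_hat)
```

Because the exponent is the only parameter and it only needs to exceed 1, a bounded one-dimensional search is enough here; no link function or integer search is required.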

Josef
  • I see. Thanks for a useful answer. To begin with, my problem will move forward if I can reject or not reject Zipf as a candidate distribution for some data, so I might have to have a go at writing that myself. Interestingly, Mathematica does a good impression of being able to find the MLE for discrete distributions, but I believe that Mathematica functions tend to have a lot of special cases hard-coded into them. – Keith Braithwaite May 09 '13 at 22:00
  • Statsmodels has a generic maximum likelihood class that can be useful in some cases; see my answer here: https://groups.google.com/d/msg/pystatsmodels/GZ8kXoFitn0/9ve8GVOwl1kJ. MLE might work for Zipf (I never looked at it); see http://stats.stackexchange.com/questions/6780/how-to-calculate-zipfs-law-coefficient-from-a-set-of-top-frequencies – Josef May 09 '13 at 23:50