I am writing a script to find the best-fitting distribution for a dataset using scipy.stats. I start with a list of distribution names and iterate over it:

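import pandas as pd
import scipy.stats
from scipy.stats import kstest

dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
errors = pd.DataFrame(index=dists, columns=['D-Value', 'P-Value', 'Params'], dtype=object)
for d in dists:
    dist = getattr(scipy.stats, d)
    ps = dist.fit(selected_data)  # tuple: shape parameters, then loc and scale
    errors.loc[d, ['D-Value', 'P-Value']] = kstest(selected_data, d, args=ps)
    errors.at[d, 'Params'] = ps  # .at stores the whole tuple in one cell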

After this loop, I select the minimum D-Value to get the best-fitting distribution. Each distribution returns its own set of parameters in ps, and each parameter has its own name (for instance, for 'alpha' it would be alpha, whereas for 'norm' they would be mean and std).

Is there a way to get the names of the estimated parameters in scipy.stats?

Thank you in advance

user1695639
  • IMHO, do it explicitly, since you know which distribution you used: make a wrapper around your ps and handle them there with an easy-to-understand method that tells you which distribution it was and which params to expect. – user3012759 May 26 '15 at 14:11
  • FWIW, each distribution in scipy.stats has an attribute `shapes`. – ev-br May 30 '15 at 19:07
  • Thanks for the replies; in the end I had to do it manually, since the `shapes` parameter still didn't give me a name that could be valid for a publication. – user1695639 Jun 09 '15 at 16:49

2 Answers

Warren Weckesser and I have developed a more robust solution:

import scipy.stats

def list_parameters(distribution):
    """List parameters for a scipy.stats distribution.
    # Arguments
        distribution: a string or scipy.stats distribution object.
    # Returns
        A list of distribution parameter strings.
    """
    if isinstance(distribution, str):
        distribution = getattr(scipy.stats, distribution)
    # Shape parameter names come first, in the order used by fit().
    if distribution.shapes:
        parameters = [name.strip() for name in distribution.shapes.split(',')]
    else:
        parameters = []
    # Discrete distributions add 'loc'; continuous ones add 'loc' and 'scale'.
    if distribution.name in scipy.stats._discrete_distns._distn_names:
        parameters += ['loc']
    elif distribution.name in scipy.stats._continuous_distns._distn_names:
        parameters += ['loc', 'scale']
    else:
        raise ValueError("Distribution name not found in discrete or continuous lists.")
    return parameters

The discussion can be found here.
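
As a usage sketch, the parameter names can be paired with the tuple returned by fit(). The variant below is an assumption on my part, not the code from the linked discussion: it checks against the public base classes scipy.stats.rv_continuous and scipy.stats.rv_discrete instead of the private _distn_names lists. The helper names param_names and named_fit are hypothetical, and the sample data is synthetic.

```python
import numpy as np
import scipy.stats


def param_names(dist):
    """Return shape names followed by 'loc' (and 'scale' if continuous)."""
    if isinstance(dist, str):
        dist = getattr(scipy.stats, dist)
    names = [s.strip() for s in dist.shapes.split(',')] if dist.shapes else []
    if isinstance(dist, scipy.stats.rv_continuous):
        return names + ['loc', 'scale']
    if isinstance(dist, scipy.stats.rv_discrete):
        return names + ['loc']
    raise ValueError("not a scipy.stats distribution")


def named_fit(dist_name, data):
    """Fit a continuous distribution and label each estimated parameter."""
    dist = getattr(scipy.stats, dist_name)
    return dict(zip(param_names(dist), dist.fit(data)))


# Synthetic data, only for illustration.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=500)

fitted = named_fit('norm', sample)
print(fitted)
```

Because fit() returns the shape parameters first, then loc and scale, zipping against the name list keeps each estimate attached to the right label.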

Adam Erickson

This code demonstrates the information that ev-br gave in his comment, in case anyone else lands here.

>>> import scipy.stats
>>> dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
>>> for d in dists:
...     dist = getattr(scipy.stats, d)
...     dist.name, dist.shapes
... 
('alpha', 'a')
('anglit', None)
('arcsine', None)
('beta', 'a, b')
('betaprime', 'a, b')
('bradford', 'c')
('norm', None)

I would point out that the shapes attribute is None for distributions such as the normal, which are parameterised only by location and scale.
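
A quick way to see this is to compare the number of shape parameters with the length of the tuple that fit() returns. This is only a sketch on synthetic data; the counts dictionary is a hypothetical name used for illustration, and numargs is the attribute that holds the number of shape parameters.

```python
import numpy as np
import scipy.stats

# Synthetic positive data so that both distributions can be fitted.
rng = np.random.default_rng(1)
sample = rng.gamma(2.0, size=200)

counts = {}
for name in ['norm', 'gamma']:
    dist = getattr(scipy.stats, name)
    params = dist.fit(sample)
    # Continuous fits return the shape parameters, then loc and scale.
    counts[name] = (dist.numargs, len(params))
    print(name, dist.shapes, dist.numargs, len(params))
```

For a continuous distribution, the fitted tuple has two more entries than there are shape parameters, so a distribution whose shapes is None still returns loc and scale.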

Bill Bell