I have the following results, generated from the iris dataset with scipy.stats using this code:

import scipy.stats as st
def get_best_distribution(data):
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "pareto", "genextreme"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        param = dist.fit(data)

        params[dist_name] = param
        # Applying the Kolmogorov-Smirnov test
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for "+dist_name+" = "+str(p))
        dist_results.append((dist_name, p))

    # select the best fitted distribution
    best_dist, best_p = (max(dist_results, key=lambda item: item[1]))
    # store the name of the best fit and its p value

    print("Best fitting distribution: "+str(best_dist))
    print("Best p value: "+ str(best_p))
    print("Parameters for the best fit: "+ str(params[best_dist]))

    return best_dist, best_p, params[best_dist]

The code is from How to find probability distribution and parameters for real data? (Python 3). The output is:

Best fitting distribution: invgauss
Best p value: 0.8268700800511397
Parameters for the best fit: (0.016421213754032188, 1.5064355144322001, 309.4166651914064)

best_result = {"virginica": {"distribution": "invgauss", "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064]}}

I would now like to obtain the mean and standard deviation (or variance) from best_result. I looked at a similar question, Distribution mean and standard deviation using scipy.stats, but am unable to figure out how to do that with SciPy.

Any insight would be deeply appreciated!

eyllanesc
Stoner

1 Answer

Instead of saving the name of the distribution, save the distribution object. To do that, change

        dist_results.append((dist_name, p))

to

        dist_results.append((dist, p))

Then change the three print statements and the return statement in the function to

    print("Best fitting distribution:", best_dist.name)
    print("Best p value: "+ str(best_p))
    print("Parameters for the best fit:", params[best_dist.name])

    return best_dist, best_p, params[best_dist.name]

Then you can do this:

dist, p, par = get_best_distribution(data)

print("mean:", dist.mean(*par))
print("std: ", dist.std(*par))
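If only the stored name and parameters are available, as in the question's best_result dictionary, the distribution object can be looked up again by name with getattr. The following is a minimal sketch under that assumption; the helper name summarize and the variable frozen are made up for illustration, and it uses a "frozen" distribution (the distribution called with its parameters) as an alternative to passing *par to each method.

```python
import scipy.stats as st

# Values copied from the question's best_result dictionary.
best_result = {
    "virginica": {
        "distribution": "invgauss",
        "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064],
    }
}


def summarize(entry):
    """Return (mean, std, variance) for one best_result entry.

    `entry` is assumed to hold a scipy.stats distribution name and the
    parameter tuple returned by that distribution's fit() method
    (shape parameters first, then loc and scale).
    """
    # Look up the distribution object by name, e.g. st.invgauss.
    dist = getattr(st, entry["distribution"])
    # Calling the distribution with its parameters "freezes" them.
    frozen = dist(*entry["parameters"])
    return frozen.mean(), frozen.std(), frozen.var()


mean, std, var = summarize(best_result["virginica"])
print("mean:", mean)
print("std: ", std)
print("var: ", var)
```

The parameter order matters here: fit() returns the shape parameters followed by loc and scale, which is the same order the distribution expects as positional arguments.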
Warren Weckesser
  • Thanks for the valuable advice/suggestion! The calling of the distribution object directly totally slipped my mind :p – Stoner Jul 16 '19 at 02:31
  • I'm able to obtain the mean and standard deviation as you have suggested as well. – Stoner Jul 16 '19 at 02:32