I am checking whether my data are normally distributed. When I run the Anderson-Darling test on them, the statistic comes out as infinity. How can I interpret this result, and how can I transform this kind of distribution to a normal distribution?

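from scipy import stats

# mydata is a 1-D array of observations, defined elsewhere
check_list = ["norm", "logistic"]
for dist_name in check_list:
    print(dist_name)
    # Anderson-Darling test of mydata against the named distribution
    print(stats.anderson(mydata, dist=dist_name))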

I am getting the following output:

norm

AndersonResult(statistic=inf,
critical_values=array([ 0.576,  0.656, 0.787,  0.918,  1.092]),
significance_level=array([ 15. ,  10. ,   5. ,   2.5,   1. ]))

logistic

AndersonResult(statistic=2504915.1041950081,
critical_values=array([ 0.426,  0.563,  0.66 ,  0.769,  0.906,  1.01 ]),
significance_level=array([ 25. ,  10. ,   5. ,   2.5,   1. ,   0.5]))
Bonifacio2
  • would this help you? http://stackoverflow.com/questions/21030391/how-to-normalize-array-numpy – renno Oct 06 '16 at 18:11

1 Answer

I have run into similar problems in the past. SciPy evaluates the CDF in double-precision floating point, so its precision is limited: for points several sigmas away from the center of the normal distribution, the CDF comes out as exactly 1 (or 0). The logarithms in the Anderson-Darling statistic then evaluate to infinity.
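To make the saturation concrete, here is a minimal sketch. The value z = 10 is a made-up standardized observation chosen for illustration, not taken from the question's data, and the direct `log(1 - cdf)` evaluation mirrors the textbook formula rather than any particular SciPy version's internals.

```python
import numpy as np
from scipy.stats import norm

z = 10.0  # hypothetical observation, 10 standard deviations above the mean

# The true CDF differs from 1 by about 1e-23, far below double-precision
# resolution near 1, so the result rounds to exactly 1.0.
cdf_value = norm.cdf(z)

# The Anderson-Darling sum contains ln(1 - F(z)); with F(z) == 1.0 this is ln(0).
with np.errstate(divide="ignore"):
    naive_log_term = np.log(1.0 - cdf_value)

# Working directly with the log of the survival function avoids the problem.
stable_log_term = norm.logsf(z)

print(cdf_value, naive_log_term, stable_log_term)
```

A single such term is enough to make the whole statistic infinite, which would be consistent with the `statistic=inf` shown in the question.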

A possible solution, if you are only interested in a few specific distributions, is to use closed-form or high-precision CDF representations together with a high-precision math library for Python (e.g. mpmath). For the normal distribution in particular, the error functions (erf/erfc) should give values that stay accurate far into the tails. A custom implementation of the Anderson-Darling test then takes only a few lines of code.
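As a rough sketch of such a custom implementation for the normal case: instead of mpmath, this version swaps in `scipy.special.log_ndtr`, which evaluates the logarithm of the normal CDF directly and so stays finite in the tails. The function name `anderson_darling_normal` and the synthetic data are my own placeholders, and the mean and standard deviation are assumed to be estimated from the sample.

```python
import numpy as np
from scipy.special import log_ndtr


def anderson_darling_normal(x):
    """Anderson-Darling statistic A^2 against a normal distribution whose
    mean and standard deviation are estimated from the sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)   # standardized, ascending order
    i = np.arange(1, n + 1)
    log_cdf = log_ndtr(z)                # ln F(z_i)
    log_sf = log_ndtr(-z[::-1])          # ln(1 - F(z_{n+1-i})), by symmetry
    return -n - np.sum((2 * i - 1) * (log_cdf + log_sf)) / n


# Synthetic example: normal data plus one extreme outlier (placeholder data)
rng = np.random.default_rng(0)
clean = rng.normal(size=1000)
with_outlier = np.concatenate([clean, [1e6]])

print(anderson_darling_normal(clean))
print(anderson_darling_normal(with_outlier))  # very large, but finite
```

The statistic can then be compared with the critical values that `scipy.stats.anderson` reports. A large finite value still means the normality hypothesis is rejected; it just no longer hides behind `inf`.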

Alternatively, if your application allows it, use the Cramér-von Mises test, which does not involve logarithms.
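A minimal sketch of that alternative, assuming a SciPy version that ships `scipy.stats.cramervonmises` (added in 1.6, so not available when this question was asked). The sample is synthetic placeholder data. Because the normal parameters are estimated from the same sample, the reported p-value should be treated as approximate.

```python
import numpy as np
from scipy import stats

# Placeholder data: normal sample plus one extreme outlier
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(loc=5.0, scale=2.0, size=500), [1e6]])

# Test against a normal distribution with parameters estimated from the sample.
# The statistic is built from squared CDF differences, so it stays finite.
result = stats.cramervonmises(
    sample, "norm", args=(sample.mean(), sample.std(ddof=1))
)

print(result.statistic, result.pvalue)
```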

vls