I am trying to find the best-fitting distribution for my data. The fitting is finished, as shown in the figure below, but I need a measure for choosing the best model. I compared goodness of fit using a chi-squared value, and tested for a significant difference between the observed and fitted distributions with a Kolmogorov-Smirnov (KS) test. I looked at some potential solutions (1, 2, 3) but did not find my answer. From the results in the figure below:

  1. If the p-value is higher than the K-statistic, does it mean we can accept the hypothesis, i.e. that the data fit the distribution well?

  2. Alternatively, is it OK to compare the level of significance (a=0.005) with the p-value to decide whether to accept or reject the hypothesis? If the p-value is lower than a, then it is very probable that the two distributions are different.

  3. For the Kolmogorov-Smirnov test, is it essential to standardise the data (-1,1)?

  4. Judging from the KS statistic and p-values, exponnorm fits the data best. Is that correct?

enter image description here

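I calculated the p-value in the following way:

import numpy as np
import scipy.stats

# dist_names (scipy.stats distribution names) and y_std (the data) are defined earlier
p_values = []
for distribution in dist_names:
    # Set up distribution and get fitted distribution parameters
    dist = getattr(scipy.stats, distribution)
    param = dist.fit(y_std)
    # kstest returns (statistic, p-value); keep only the p-value
    p = scipy.stats.kstest(y_std, distribution, args=param)[1]
    p = np.around(p, 5)
    p_values.append(p)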
Case Msee

1 Answer

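  1. No. You can either compare the K-statistic to the critical value from a KS-test critical value table, or compare the p-value to the level of significance, which is 0.005 in your case. The p-value and the K-statistic are on different scales, so comparing them to each other is not meaningful.
  2. Right. If the p-value is smaller than the level of significance, we reject the null hypothesis (that the data follow the fitted distribution) in favour of the alternative.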
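  3. No, it is not essential. Standardizing is a shift and rescale, so it changes the location and scale of the data but not the shape of their distribution; data from a geometric distribution, for example, do not become normal after standardization (it is the standardized sample mean, not the standardized data, that converges in distribution to normal (0,1) as the number of samples goes to infinity). If you standardize before fitting, the fitted parameters describe the standardized data rather than the raw data.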
  4. Yes. Because p-value > a in this case, we fail to reject the null hypothesis, so the input data are consistent with exponnorm. Strictly speaking, this does not prove that the data follow exponnorm, only that the test found no significant evidence against it.
    By the way, this question belongs on Cross Validated, since it is mostly about statistics. Hope this answer helps you.
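
The two decision rules from point 1 can be sketched as follows. This is only an illustration under stated assumptions: the sample `y` is synthetic (569 normal draws standing in for the asker's data), `norm` is used as the candidate distribution, and the critical value uses the usual large-sample approximation c(alpha)/sqrt(n) with c(alpha) = sqrt(-ln(alpha/2)/2), which gives the familiar 1.36 at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical sample standing in for the real data (n = 569, as in the comments)
y = rng.normal(loc=10.0, scale=2.0, size=569)

alpha = 0.05  # the coefficient 1.36 belongs to this level; alpha = 0.005 gives about 1.73

# Fit the candidate distribution, then run the one-sample KS test against it
params = stats.norm.fit(y)
d_stat, p_value = stats.kstest(y, "norm", args=params)

# Large-sample critical value: c(alpha) / sqrt(n)
c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
d_crit = c_alpha / np.sqrt(len(y))

# Rule A: compare the statistic with the critical value
reject_by_stat = d_stat > d_crit
# Rule B: compare the p-value with the level of significance
reject_by_p = p_value < alpha

print(f"D = {d_stat:.4f}, D_crit = {d_crit:.4f}, reject (rule A): {reject_by_stat}")
print(f"p = {p_value:.4f}, alpha = {alpha}, reject (rule B): {reject_by_p}")
```

Either rule compares like with like: a statistic against a critical statistic, or a probability against a probability. The statistic is never compared with the p-value directly.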
Newcomer
  • 1) Regarding the comparison of the `K-statistic` with the `critical value in K-test critical table`: if the `k-statistic value=0.0385`, as shown in the results figure, would the critical value be `D_crit=1.36/sqrt(n) => 0.057`, where `n` is the number of samples in the data? Is that right? The `total samples=569`. – Case Msee Sep 29 '20 at 04:19
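  • @CaseMsee Exactly. Since the sample size `n` is greater than 50, use the large-sample approximation `D_crit=1.36/sqrt(n)` rather than reading a value for a specific `n` from the critical value table. Note that the coefficient 1.36 corresponds to a level of significance of 0.05. – Newcomer Sep 29 '20 at 18:31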
  • Thank you for the great answer. Based on your feedback, I have done some analysis: https://ibb.co/GnvT1x2. Can you please check? – Case Msee Sep 30 '20 at 03:07
  • In some of my datasets, I am getting `P_value=0`, as shown in https://ibb.co/XzN4qhK. I am not sure whether this is normal or a bad sign, since I am still getting `K-statistic values`. – Case Msee Sep 30 '20 at 03:33
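
One possible explanation for `P_value=0`, assuming the p-values were produced by the code in the question: `np.around(p, 5)` turns any p-value below 0.000005 into 0, even though the KS statistic itself is still reported. The sketch below uses a synthetic, clearly non-normal sample (an assumption, not the asker's data) to show the effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical skewed sample that a normal distribution fits poorly
y = rng.exponential(scale=1.0, size=569)

params = stats.norm.fit(y)
d_stat, p_value = stats.kstest(y, "norm", args=params)

# Rounding to 5 decimals, as in the question, hides very small p-values
rounded = np.around(p_value, 5)

print(f"D = {d_stat:.4f}")
print(f"p rounded to 5 decimals: {rounded}")
print(f"p in scientific notation: {p_value:.3e}")
```

A displayed p-value of 0 in that situation means "smaller than the rounding precision", which points to a poor fit for that distribution. Printing in scientific notation keeps the magnitude visible.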