Comparing generated data to measured data

Question

We have measured data and have determined that it follows a Gamma distribution with parameters (A, B).

We then generated n = 10000 samples from the same distribution, with the same parameters and truncated to the same range (between 18.5 and 59), using a for loop:

for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end

Then we tried to fit the generated data using:

h1=histfit(W);

After this we tried to plot the Gamma curve, to compare the two curves on the same figure, using:

hold on
h2=histfit(W,[],'Gamma');
h2(1).Visible='off';

The problem is that the two curves are shifted relative to each other, as in the following figure (Figure 1 is the generated data from the previous code; Figure 2 is without truncating the generated data):

[figure not included]

Does anyone know why?

Thanks in advance

@pjs We tried the same code without truncating, and the two curves were also shifted. – Eman Nabil, Jul 15 '17 at 17:06

Leander Moesinger · Accepted Answer · 2017-07-15T17:55:25.790


By default histfit fits a normal probability density function (PDF) on the histogram. I'm not sure what you were actually trying to do, but what you did is:

% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');

% fit a gamma PDF
h2=histfit(W,[],'Gamma');

Naturally these give different fits, because a normal PDF is not a gamma PDF. All the figure shows is that the gamma PDF fits the histogram better, which is expected because you sampled the data from that distribution.
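To make it clear which curve is which, the two fits can be drawn on one histogram with a legend. This is only a sketch built from the code in the question: the seed and the line color are arbitrary choices, and it assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Sketch: overlay the default (normal) fit and the gamma fit on one histogram.
rng(0);                                   % arbitrary seed, only for repeatability
tot = makedist('Gamma','A',11.8919,'B',2.9927);
tot = truncate(tot,18.5,59);
W = random(tot,10000,1);                  % 10000 samples from the truncated gamma

figure
h1 = histfit(W);                          % bars + normal fit (the default)
hold on
h2 = histfit(W,[],'gamma');               % bars + gamma fit on the same data
h2(1).Visible = 'off';                    % hide the duplicate histogram bars
h2(2).Color = 'b';                        % distinguish the gamma curve
legend([h1(2) h2(2)],{'normal fit','gamma fit'});
hold off
```

The two curves are expected to differ: one is the best normal curve for the data and the other the best gamma curve, so the offset between them is not an error in the sampling.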

If you want to check whether the data follows a certain distribution you can also use a KS-test. In your case

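% check if the data follows the distribution specified in tot
[h, p] = kstest(W,'CDF',tot)

If the test does not reject the hypothesis that the data follow the given distribution at the default 5% significance level, then h = 0 and p > 0.05; otherwise h = 1 and p < 0.05. Note that h = 0 only means the data are consistent with that distribution, not that the distribution has been proven.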
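A self-contained version of this check might look like the following sketch. It reuses the distribution from the question; the seed and the comparison against a normal distribution are additions made here for illustration, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Sketch: KS test of the samples against two candidate distributions.
rng(1);                                   % arbitrary seed, only for repeatability
tot = makedist('Gamma','A',11.8919,'B',2.9927);
tot = truncate(tot,18.5,59);
W = random(tot,10000,1);

% Test against the distribution the samples were actually drawn from.
[h,p] = kstest(W,'CDF',tot);
fprintf('truncated gamma: h = %d, p = %.3g\n',h,p);

% For contrast, test against a normal with the sample mean and std.
% Caveat: estimating parameters from the same data makes this KS test
% only approximate, so treat the result as a rough indication.
nrm = makedist('Normal','mu',mean(W),'sigma',std(W));
[hN,pN] = kstest(W,'CDF',nrm);
fprintf('normal:          h = %d, p = %.3g\n',hN,pN);
```

Keep in mind that even for data drawn from the tested distribution, the test rejects in about 5% of runs at the default significance level, so a single h = 1 is not conclusive on its own.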

Now some general comments on your code: please look up memory preallocation; it speeds up loops considerably. For example:

W = zeros(10000,1);
for i=1:1:10000
    tot=makedist('Gamma','A',11.8919,'B',2.9927);
    tot= truncate(tot,18.5,59);
    W(i,:) =random(tot,1,1);
end

Also,

tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);

does not depend on the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as a loop variable, since i also denotes the imaginary unit in MATLAB.

But you can actually skip the whole loop, because random() can return multiple samples at once:

tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);

edited Jul 15 '17 at 17:55

answered Jul 15 '17 at 17:07

Leander Moesinger


First of all, thanks for your useful comments; they will speed up the program. My question is how to compare the generated data with a specific distribution, to confirm that it fits the distribution type it was generated from. – Eman Nabil Jul 15 '17 at 17:21
Why do you want to check that? If you sample directly from a specific distribution it is already given that it follows that distribution. – Leander Moesinger Jul 15 '17 at 17:37
But if you want to check whether some data follows a distribution, you can a) visually check if the `histfit` curve fits nicely or b) use a KS test (a bit more complicated) https://ch.mathworks.com/help/stats/kstest.html – Leander Moesinger Jul 15 '17 at 17:39
We used the File Exchange function "allfitdist.m" written by Mike Sheppard. It reported that the best fit to our measured data was Gamma, but when we ran it again on the generated data it reported a different distribution with different parameters. Why is that? – Eman Nabil Jul 15 '17 at 17:58
For the measured data it found a Gamma distribution with parameters A = 11.8919 and B = 2.9927. For the generated data, the same allfitdist.m ranked a generalized extreme value distribution (k = -0.1659, sigma = 8.1304, mu = 31.7997) first and a Gamma distribution (A = 15.7, B = 2.25) second. – Eman Nabil Jul 15 '17 at 18:40

The thing is that many distributions are similar to each other given the right parameters. If you plot the PDFs of both the GEV and the gamma dist. with the given parameters, you will see that the shapes look almost identical. Therefore even slight changes in the data can swap which dist. matches best. This means that *both* dists. fit the data well and you can pick either of the two; it won't make a noticeable difference. But I'd take the gamma dist. because it has fewer parameters (Occam's razor). – Leander Moesinger Jul 15 '17 at 18:55
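The comparison suggested in this comment can be sketched as follows, using the parameter values reported by allfitdist.m in the comments above. The plotting range (the truncation interval from the question) and the variable names are choices made here, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Sketch: plot the two fitted PDFs reported by allfitdist.m side by side.
gev = makedist('GeneralizedExtremeValue','k',-0.1659,'sigma',8.1304,'mu',31.7997);
gam = makedist('Gamma','a',15.7,'b',2.25);

x = linspace(18.5,59,500);                % range of the generated data
yGev = pdf(gev,x);
yGam = pdf(gam,x);

figure
plot(x,yGev,'-',x,yGam,'--');
legend('GEV fit','Gamma fit');
xlabel('x'); ylabel('PDF');

% Largest pointwise gap between the two curves over this range.
maxDiff = max(abs(yGev - yGam));
fprintf('max |GEV - Gamma| PDF difference: %.4g\n',maxDiff);
```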
