2

I'm trying to compute the Goodness of Fit of a bi-modal Gaussian distribution. To do this, Mathematica seems to require a symbolic distribution function to which to compare. Because such a bi-modal distribution is not a stock distribution, I'm trying to define one. The obvious use of

MixtureDistribution[{fs,(1-fs),{NormalDistribution[\[mu]S,\[sigma]S],NormalDistribution[\[mu]L,\[sigma]L]}]

generates a distribution that can be plotted, but the analysis used by DistributionFitTest[] fails.

This topic has been addressed in previous questions in discussions between @Sasha and @Jagra:

DistributionFitTest[] for custom distributions in Mathematica

Minimizing NExpectation for a custom distribution in Mathematica

but I was unable to find a resolution that enabled the use of

DistributionFitTest[data,dist,"HypothesisTestData"]

when dist is not a built-in distribution type.

Because the distribution I'm modeling is composed of simple pieces, describing the properties of the distribution is not too difficult, and I have attempted to describe as many features as I know in order to create a well defined distribution that Mathematica 8 would recognize as one of its own. My attempt to define every parameter I can think of follows:

modelDist /: 
PDF[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], x_] :=
PDF[MixtureDistribution[{fS, 1 - fS}, {NormalDistribution[\[Mu]S, \[Sigma]S], NormalDistribution[\[Mu]L, \[Sigma]L]}], x];

modelDist /: 
CDF[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], x_] :=
CDF[MixtureDistribution[{fS, 1 - fS}, {NormalDistribution[\[Mu]S, \[Sigma]S], NormalDistribution[\[Mu]L, \[Sigma]L]}], x];

modelDist /: 
DistributionDomain[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_]] := 
Interval[{-Infinity, Infinity}];

modelDist /: 
Random`DistributionVector[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], n_, prec_] := 
RandomVariate[MixtureDistribution[{fS, 1 - fS}, {NormalDistribution[\[Mu]S, \[Sigma]S], NormalDistribution[\[Mu]L, \[Sigma]L]}], n, WorkingPrecision -> prec];

modelDist /: 
DistributionParameterQ[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_]] := 
!TrueQ[Not[Element[{fS, \[Mu]S, \[Sigma]S, \[Mu]L, \[Sigma]L}, Reals] && fS > 0 && fS < 1 && \[Sigma]S > 0 && \[Sigma]L > 0]];

modelDist /: 
DistributionParameterAssumptions[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_]] := 
Element[{fS, \[Mu]S, \[Sigma]S, \[Mu]L, \[Sigma]L}, Reals] && fS > 0 && fS < 1 && \[Sigma]S > 0 && \[Sigma]L > 0;

modelDist /: 
MomentGeneratingFunction[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], t_] := 
fS E^(t \[Mu]S + (t^2 \[Sigma]S^2)/2) + (1 - fS) E^(t \[Mu]L + (t^2 \[Sigma]L^2)/2);

modelDist /: 
CharacteristicFunction[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], t_] := 
fS E^(I t \[Mu]S + (t^2 \[Sigma]S^2)/2) + (1 - fS) E^(I t \[Mu]L + (t^2 \[Sigma]L^2)/2)

modelDist /: 
Moment[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_], n_] := 
Piecewise[{{fS*\[Sigma]S^n*(-1 + n)!!*Hypergeometric1F1[-(n/2), 1/2, -(\[Mu]S^2/(2*\[Sigma]S^2))] + (1 - fS) * \[Sigma]L^n*(-1 + n)!! * Hypergeometric1F1[-(n/2), 1/2, -(\[Mu]L^2/(2*\[Sigma]L^2))], Mod[n, 2] == 0}}, \[Mu]S*\[Sigma]S^(-1 + n)*n!!* Hypergeometric1F1[(1 - n)/2, 3/2, -(\[Mu]S^2/(2*\[Sigma]S^2))] + (1 - fS) * \[Mu]L*\[Sigma]L^(-1 + n)*n!! * Hypergeometric1F1[(1 - n)/2, 3/2, -(\[Mu]L^2/(2*\[Sigma]L^2))]];

modelDist /: 
Mean[modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_]] := 
fS \[Mu]S + (1 - fS) \[Mu]L

modelDist /: 
Expectation[expr_, x_ \[Distributed] modelDist[fS_, \[Mu]S_, \[Sigma]S_, \[Mu]L_, \[Sigma]L_]] := 
fS*Expectation[expr, x \[Distributed] NormalDistribution[\[Mu]S, \[Sigma]S]] + (1 - fS)*Expectation[expr, x \[Distributed] NormalDistribution[\[Mu]L, \[Sigma]L]]

Everything seems to work up through the definition of Expectation, which throws

TagSetDelayed::tagpos: Tag modelDist in Expectation[expr_,x_\[Distributed]modelDist[fS_,\[Mu]S_,\[Sigma]S_,\[Mu]L_,\[Sigma]L_]] is too deep for an assigned rule to be found.

I don't know that having a definition for the expectation will magically make everything work, but it's the next step to to try, as having the Expectation allows computation of the Variance, and for all I know, that is the last tag that I need to define. Is there a syntax that will properly define this Expectation[] and pass the expression straight from my modelDist[] to its constituent NormalDistribution[]s?

(And if this entirely the wrong way to go about this, some advice to that effect would be appreciated.)

Community
  • 1
  • 1
KDN
  • 1,349
  • 2
  • 14
  • 17
  • Do you have values for fs, mu, sigma, L, and S? When I pick them, I get results using just the `MixtureDistribution` that you specified. When I test `RandomVariate` of the data with increasing size, I'd expect to get p values that approach 1. My values jump around randomly. They do that even if I use `NormalDistribution` instead of yours though. What do you mean when you say the analysis fails? – 0xFE Dec 29 '12 at 05:11
  • If I run DistributionFitTest[] on a model set of data with the distribution I specified, it never finishes, it just runs and runs and runs... If I try to fit the same data with an ordinary gaussian (not bi-modal), then it works just fine. Many other distribution-related functions do work, but DistributionFitTest doesn't... since the method is rather opaque, I'm not sure what aspect(s) of my custom distribution are missing... – KDN Dec 29 '12 at 13:19

0 Answers0