I've started doing more and more work with the new Mathematica statistics and data analysis features.
I attended the "Statistics & Data Analysis with Mathematica" online seminar on Tuesday (great presentation, I highly recommend it) but I've run into some problems that I hope someone on this forum might have a few moments to consider.
I've created a rather extensive notebook to streamline my data analysis, call it "AnalysisNotebook". It outputs an extensive series of charts and data including: histograms, PDF
and CDF
plots, Q-Q plots, plots to study tail fit, hypothesis test data, etc.
This works great as long as I stay with Mathematica's off-the-shelf distributions and probably works fine for simple MixtureDistribution
s and even ParameterMixtureDistribution
s as for these Mathematica can likely figure out the moments and PDF
and CDF
, FindDistributionParameters
, etc by breaking the mixtures down into pieces.
I run into trouble when I attempt to define and use even a simple TransformedDistribution
i.e.,
LogNormalNormalDistribution[Gamma_, Sigma_, Delta_] :=
TransformedDistribution[ u*v + Delta,
{Distributed[ u, LogNormalDistribution[ Log[Gamma], Sigma] ],
Distributed[ v, NormalDistribution[0, Sqrt[2]]}
];
I'd like to do a lot of things along the lines of such Transformed Distributions. I appreciate the challenge something like this presents (some of which I've learned on this forum - thank you all):
- They may not have closed forms;
PDF
andCDF
calculation may need interpolation, work-arounds, or custom approaches;FindDistributionParameters
andDistributionFitTest
won't know how to deal with this kind of thing.
Basically the standard things one would want to use really don't/can't work and one can't fairly expect them to do so.
One can write custom code to do these sorts of things (again this forum has helped me a lot), but then incorporating all of the complexity of custom alternatives into my AnalysisNotebook, just seems silly. The AnalysisNotebook would grow with each new custom function.
It would help me immensely in this effort if I could write my custom versions of PDF
, CDF
, FindDistributionParameters
, DistributionFitTest
and anything else I might need to standards that the more general built in versions would simply call seamlessly. This way, something like my AnalysisNotebook could remain simple and uncluttered, a standard component in my tool box. I could spend my time working on the math rather than plumbing, if you take my meaning.
To clarify what I mean by this, similar to how one can define versions of a function to do different things (use different numbers of arguments or other kinds of situational awareness), Mathematica must do something similar for the functions that use distributions as arguments to know which solution to use for a particular built-in distribution. I want the ability to add or extend the functionality of PDF[]
, CDF[]
, FindDistributionParameters[]
, DistributionFitTest[]
and related functions at that level -- to add functionality for custom distributions and their required supporting code, which the built in functions would/could call seamlessly.
Perhaps just a dream, but if anyone knows of any way I could approach this, I'd very much appreciate your feedback.
EDIT- The kind of problems I've encountered:
The following code never completes execution
r1 = RandomVariate[LogNormalNormalDistribution[0.01, 0.4, 0.0003], 1000];
FindDistributionParameters[r1, LogNormalNormalDistribution[gamma, sigma, delta]]
To work around this I wrote the following function
myLNNFit[data_] := Module[{costFunction, moments},
moments = Moment[EmpiricalDistribution[data], #] & /@ Range[5];
costFunction[gamma_, sigma_, delta_] =
Sqrt@Total[((Moment[LogNormalNormalDistribution[gamma, sigma, delta],#]&/@Range[5]) - moments)^2];
NMinimize[{costFunction[gamma, sigma, delta], gamma > 0, sigma > 0}, {gamma, sigma, delta}] ]
This works fine by itself, but doesn't play well with everything else.