3

I've started doing more and more work with the new Mathematica statistics and data analysis features.

I attended the "Statistics & Data Analysis with Mathematica" online seminar on Tuesday (great presentation, I highly recommend it) but I've run into some problems that I hope someone on this forum might have a few moments to consider.

I've created a rather extensive notebook to streamline my data analysis, call it "AnalysisNotebook". It outputs an extensive series of charts and data including: histograms, PDF and CDF plots, Q-Q plots, plots to study tail fit, hypothesis test data, etc.

This works great as long as I stay with Mathematica's off-the-shelf distributions and probably works fine for simple MixtureDistributions and even ParameterMixtureDistributions as for these Mathematica can likely figure out the moments and PDF and CDF, FindDistributionParameters, etc by breaking the mixtures down into pieces.

I run into trouble when I attempt to define and use even a simple TransformedDistribution i.e.,

LogNormalNormalDistribution[Gamma_, Sigma_, Delta_] := 
  TransformedDistribution[ u*v + Delta, 
   {Distributed[ u, LogNormalDistribution[ Log[Gamma], Sigma] ],  
    Distributed[ v, NormalDistribution[0, Sqrt[2]]}
   ];

I'd like to do a lot of things along the lines of such Transformed Distributions. I appreciate the challenge something like this presents (some of which I've learned on this forum - thank you all):

  • They may not have closed forms;
  • PDF and CDF calculation may need interpolation, work-arounds, or custom approaches;
  • FindDistributionParameters and DistributionFitTest won't know how to deal with this kind of thing.

Basically the standard things one would want to use really don't/can't work and one can't fairly expect them to do so.

One can write custom code to do these sorts of things (again this forum has helped me a lot), but then incorporating all of the complexity of custom alternatives into my AnalysisNotebook, just seems silly. The AnalysisNotebook would grow with each new custom function.

It would help me immensely in this effort if I could write my custom versions of PDF, CDF, FindDistributionParameters, DistributionFitTest and anything else I might need to standards that the more general built in versions would simply call seamlessly. This way, something like my AnalysisNotebook could remain simple and uncluttered, a standard component in my tool box. I could spend my time working on the math rather than plumbing, if you take my meaning.

To clarify what I mean by this, similar to how one can define versions of a function to do different things (use different numbers of arguments or other kinds of situational awareness), Mathematica must do something similar for the functions that use distributions as arguments to know which solution to use for a particular built-in distribution. I want the ability to add or extend the functionality of PDF[], CDF[], FindDistributionParameters[], DistributionFitTest[] and related functions at that level -- to add functionality for custom distributions and their required supporting code, which the built in functions would/could call seamlessly.

Perhaps just a dream, but if anyone knows of any way I could approach this, I'd very much appreciate your feedback.

EDIT- The kind of problems I've encountered:

The following code never completes execution

r1 = RandomVariate[LogNormalNormalDistribution[0.01, 0.4, 0.0003], 1000];
FindDistributionParameters[r1, LogNormalNormalDistribution[gamma, sigma, delta]]

To work around this I wrote the following function

myLNNFit[data_] := Module[{costFunction, moments}, 
    moments = Moment[EmpiricalDistribution[data], #] & /@ Range[5]; 
    costFunction[gamma_, sigma_, delta_] = 
    Sqrt@Total[((Moment[LogNormalNormalDistribution[gamma, sigma, delta],#]&/@Range[5]) - moments)^2]; 
    NMinimize[{costFunction[gamma, sigma, delta], gamma > 0, sigma > 0}, {gamma, sigma, delta}] ]

This works fine by itself, but doesn't play well with everything else.

Jagra
  • 3,149
  • 1
  • 18
  • 19
  • what sort of problems do you run into when you use your `TransformedDistribution`? (I found your previous questions, maybe reference them with a brief summary ...) Also, any special symbol used in a notebook, like `\[Gamma]`, shows up as I wrote it, so to make it easier to read you may wish to remove the markup from any copied code just before you hit post. – rcollyer Apr 07 '11 at 15:37
  • Thanks for cleaning up the code. As for the kind of problems I've encountered: While creating random variates works fine: r1 = RandomVariate[LogNormalNormalDistribution[0.01, 0.4, 0.0003], 1000]; ListLinePlot[%, PlotRange -> All] When one then attempts to FindDistributionParameters[r1, LogNormalNormalDistribution[gamma, sigma, delta]] the code never completes execution. – Jagra Apr 07 '11 at 15:44
  • Continuing... to do this I wrote (with a bit of help) myLNNFit[data_] := Module[{costFunction, moments}, moments = Moment[EmpiricalDistribution[data], #] & /@ Range[5]; costFunction[gamma_, sigma_, delta_] = Sqrt@Total[((Moment[ LogNormalNormalDistribution[gamma, sigma, delta], #] & /@ Range[5]) - moments)^2]; NMinimize[{costFunction[gamma, sigma, delta], gamma > 0, sigma > 0}, {gamma, sigma, delta}] ] This works fine by itself, but doesn't play well with everything else. – Jagra Apr 07 '11 at 15:49
  • why don't you update your question with the info? It formats better. – rcollyer Apr 07 '11 at 18:39

2 Answers2

8

You can use TagSet to specify the symbol to which you want to associate a definition. This lets you define the PDF of a distribution even though PDF is Protected. Here's a trivial example. Note that TriangleWave is a built-in symbol, and TriangleDistribution is something I just made up. This fails:

PDF[TriangleDistribution[x_]] := TriangleWave[x]

This works:

TriangleDistribution /: PDF[TriangleDistribution[x_]] := TriangleWave[x]

Now you can do:

Plot[PDF[TriangleDistribution[x]], {x, 0, 1}]
joebolte
  • 406
  • 2
  • 5
  • This looks very interesting and useful and may give me all that I need. I'll get back to recognizing this as an answer, as soon as I've had a chance to try it. Thanks! – Jagra Apr 07 '11 at 15:57
3

Dear Jarga, the following tutorial in Mathematica documentation describes now you would enable random number generation for your distribution, look near the bottom of this document for a section 'Defining Distributional Generators'.

It is quite similar to what Joe suggested. You would need to define

In[1]:= Random`DistributionVector[
  LogNormalNormalDistribution[gamma_, sigma_, delta_], len_, prec_] ^:=
  RandomVariate[LogNormalDistribution[Log[gamma], sigma], len, 
    WorkingPrecision -> prec]*
   RandomVariate[NormalDistribution[0, Sqrt[2]], len, 
    WorkingPrecision -> prec] + delta

In[2]:= RandomVariate[
 LogNormalNormalDistribution[0.01, 0.4, 0.0003], 5]

Out[2]= {-0.0013684, 0.00400979, 0.00960139, 0.00524952, 0.012049}

I am not aware of any documented way to insert a new distribution into the estimation framework. The hypothesis testing should work if CDF is defined for your distribution and works correctly.

Sasha
  • 5,935
  • 1
  • 25
  • 33