2

I am brand new to Java, second day! I want generate samples with normal distribution. I am using inverse transformation.

Basically, I want to find the inverse normal cumulative distribution, then find its inverse. And generate samples.

My questions is: Is there a built-in function for inverse normal cdf? Or do I have to hand code?

I have seen people refer to this on apache commons. Is this a built-in? Or do I have to download it?

If I have to do it myself, can you give me some tips? If I download, doesn't my prof also have to have the "package" or special file installed?

Thanks in advance!

Edit:Just found I can't use libraries, also heard there is simpler way converting normal using radian.

AHungerArtist
  • 9,332
  • 17
  • 73
  • 109
user1061210
  • 181
  • 1
  • 2
  • 9
  • As a rule of thumb, packages other than `java.*` or `javax.*` are not included in the Java runtime. – Viruzzo Feb 11 '12 at 18:47
  • got it, we need write our code to simulate samples from normal distribution with mean=10, variance=2. So, any download or packages would be bad I guess? – user1061210 Feb 11 '12 at 18:53
  • @AdelBoutros: The answers below are good, but the correct answer to use inverse transformation for normal cdf is first draw from uniform(0,1), then use the box-muller transformation formula, which are functions of sine and cosine. No package or coding needed. – user1061210 Aug 21 '12 at 22:11

4 Answers4

2

I had had the same problem and find its solution, the following code will give results for cumulative distribution function just like excel do:

 private static double erf(double x)
{
    //A&S formula 7.1.26
    double a1 = 0.254829592;
    double a2 = -0.284496736;
    double a3 = 1.421413741;
    double a4 = -1.453152027;
    double a5 = 1.061405429;
    double p = 0.3275911;
    x = Math.abs(x);
    double t = 1 / (1 + p * x);
    //Direct calculation using formula 7.1.26 is absolutely correct
    //But calculation of nth order polynomial takes O(n^2) operations
    //return 1 - (a1 * t + a2 * t * t + a3 * t * t * t + a4 * t * t * t * t + a5 * t * t * t * t * t) * Math.Exp(-1 * x * x);

    //Horner's method, takes O(n) operations for nth order polynomial
    return 1 - ((((((a5 * t + a4) * t) + a3) * t + a2) * t) + a1) * t * Math.exp(-1 * x * x);
}
public static double NORMSDIST(double z)
{
    double sign = 1;
    if (z < 0) sign = -1;

    double result=0.5 * (1.0 + sign * erf(Math.abs(z)/Math.sqrt(2)));
    return result;
}
2

As it is mentioned here:

Apache Commons - Math has what you are looking for.

More specifically, check out the NormalDistrubitionImpl class.

And no your professor doesn't need to download stuff if you provide him with all the needed libraries.

UPDATE :

If you want to hand code it (I don't know the actual formula), you can check the following link: http://home.online.no/~pjacklam/notes/invnorm/

There are 2 people who implemented it in java: http://home.online.no/~pjacklam/notes/invnorm/#Java

Community
  • 1
  • 1
Adel Boutros
  • 10,205
  • 7
  • 55
  • 89
  • So, I still have to email him my libraries? Is it possible to submit just one file for my hw? – user1061210 Feb 11 '12 at 18:51
  • Yes, if you build a JAR file and setup the manifest. You can include the referenced libraries. Unless, of course, he really wants you to write the methods yourself as part of the assigment – fishtoprecords Feb 11 '12 at 19:15
  • @fishtoprecord, thanks. I guess Apache Commons is much easier? I am having trouble importing it on my Macbook, the tar.gz file kept on going under Referenced Library. Any help? – user1061210 Feb 11 '12 at 20:49
  • @fishtoprecords I found out I can't use built in libraries. :( So hand coding? How do I get the erf function in inverse normal formula? – user1061210 Feb 11 '12 at 21:14
  • @AdelBoutros, I heard there is a when you can change normal distribution into radian, that would be easier to calculate? I haven't figured out, but that was the "hint". – user1061210 Feb 13 '12 at 04:55
0

Mathematically, this is a hard problem, and there are a few solutions you might consider.

Dislcaimer: Mathematical jargon ahead.

As you probably already know, the normalcdf function is used to calculate probabilities of normal random variables. Because a normal distribution is continuous, the corresponding probability density function (normalpdf) does not itself give probabilities, (in contrast to discrete distributions like binomial or geometric distributions). Instead, the area under the curve gives the probability that the normal random variable falls within a range of values. So, the normalcdf function you seek is the area under a section of the normalpdf function.

Mathematically, finding the area under a continuous curve is a fundamental problem of calculus. The solution to this type of problem is called an integral and integrating a function over a range of numbers means finding the area under the curve and between the lowest value in the range to the highest.

In most circumstances, we could just integrate the pdf function to get the cdf function, then evaluate it wherever we want. The heart of the problem, and the reason that an algorithm in Java is not as simple as one might think, is that normalpdf function does not have a closed form integral- it's value cannot be calculated in any finite number of steps. So, values of the normalcdf function are particularly elusive.

There are two main classes of solutions for the problem.

1. Numerical Integration Techniques

Numerical integration techniques solve the problem by approximating the area under the curve geometrically. The area is divided into rectangles or other shapes of equal or varying widths, with the height of each being given by the pdf function. The sum of the areas of the rectangle is an approximation of the area under the curve, which is the corresponding probability. These technique can be used to compute values to arbitrary precision, but is more computationally expensive than class 2. Using better approximations (e.g. Simpson's rule) improves computation. A simple numeric integration method follows.

public static double normCDF(double z)
{   double LeftEndpoint = -100;
int nRectangles = 100000;
    double runningSum = 0;
    double x;
    for(int n = 0; n < nRectangles; n++){
    x = LeftEndpoint + n*(z-LeftEndpoint)/nRectangles;
        runningSum += Math.pow(Math.sqrt(2*Math.PI),-1)*Math.exp(-Math.pow(x,2)/2)*(z-LeftEndpoint)/nRectangles;
    }
    System.out.println(runningSum);
    return runningSum;
}

2. Analytic Techniques

Analytic techniques take advantage of the fact that while the normalpdf does not have a closed-form integral, the pdf can be "converted" to a sum called a Taylor series, then integrated term-by-term. Basically, it turns the pdf into a sum of infinitely many simple functions, then integrates each one analytically, then adds together all of the integrals. Since this is an analytic procedure, a programmer need only include the integral series in the program after computing the coefficients. The precision of the result just depends on how many terms of the sum you include in the calculation, and tends to approach accurate values much sooner than numerical integration techniques. For example, the solution by Mohammad Aldefrawy computes just five coefficients. Below is a method that includes the computation of coefficients, so you one could compute values to arbitrary precision (Actually, the normalcdf series isn't computed directly. Instead, the coefficients of the related error function are computed then converted by a linear transformation). However, since computation of the coefficients involves the factorial function, one experiences memory issues for substantially large numbers of coefficients. Thankfully, this method returns values with much higher precision in a fraction of the iterations required by methods in the previous class of solutions to yield similar results.

public static double normalCDF(double x){
    System.out.println(0.5*(1+erf(x/Math.sqrt(2))));
    return 0.5*(1+erf(x/Math.sqrt(2)));
}

public static double erf(double z)
{
    int nTerms = 315;
    double runningSum = 0;
    for(int n = 0; n < nTerms; n++){
        runningSum += Math.pow(-1,n)*Math.pow(z,2*n+1)/(factorial(n)*(2*n+1));
    }
    return (2/Math.sqrt(Math.PI))*runningSum;
}

static double factorial(int n){
    if(n == 0) return 1;
    if(n == 1) return 1; 
    return n*factorial(n-1);
}

Other functions

For the inverse function, since we used the error function in the normalCDF method, we can use the inverse error function in a similar way. Again, we obtain the coefficients of the inverse error function analytically, then compute them as needed in the method.

public static double invErf(double z)
{
    int nTerms = 315;
    double runningSum = 0;
    double[] a = new double[nTerms + 1];
    double[] c = new double[nTerms + 1];
    c[0]=1;
    for(int n = 1; n < nTerms; n++){
        double runningSum2=0;
        for (int k = 0; k <= n-1; k++){
            runningSum2 += c[k]*c[n-1-k]/((k+1)*(2*k+1));
        }
        c[n] = runningSum2;
        runningSum2 = 0;
    }
    for(int n = 0; n < nTerms; n++){
        a[n] = c[n]/(2*n+1);
        runningSum += a[n]*Math.pow((0.5)*Math.sqrt(Math.PI)*z,2*n+1);
    }
    return runningSum;
}

public static double invNorm(double A){
    return (2/Math.sqrt(2))*invErf(2*A-1);
}

I don't have a method for the lognormal function, but you could obtain one using the same idea.

Andrew
  • 135
  • 1
  • 6
-1

I never tried it but the guys from algo team were using Colt and they were happy with the results.

aviad
  • 8,229
  • 9
  • 50
  • 98
  • to the best of my knowledge, Colt does not provide a function for the inverse cumulative distribution function. – steffen Aug 20 '12 at 08:30
  • thanks for the link. This is interesting. However 1) The OP knows the dist, so why bothering with an approximation 2) to my limited knowledge it seems that using interpolation for estimating the inverse cdf is not as straightforward as it seems (reasonable precision, reasonable computation time). I have asked a question solely about this [here](http://stackoverflow.com/questions/12034782/how-to-calculate-the-inverse-cumulative-distribution-function-in-java). Please consider to add an answer using colt and I will happily apologise, upvote and accept (no irony here !) :) – steffen Aug 21 '12 at 06:48