2

I have to generate data for a Poisson distribution. My range is n = 1000 up to 100K. Where n is the number of data elements; k varies from 1 to n. It says to use lambda as n/2

I have never taken stats and have no idea how to get the correct curve here. I can feed it lambda as n/2, but do I vary K from 0-n? I tried this (passing k in as a parameter) and when I graphed the data it ramped up, not a fish tail. What am I doing wrong, or am I doing it correctly?

Thanks

I have this code in java from Knuth.

static double poissonRandomNumber(int lambda) {
    double L = Math.exp(-lambda);
    int k = 0;
    double p = 1;
    do {
        k = k + 1;
        double u = Math.random();
        p = p * u;
    } while (p > L);
    return k - 1;
}
Dixon Steel
  • 1,021
  • 4
  • 12
  • 27
  • 1
    You need to know what the function for the Poisson distribution is to start: http://en.wikipedia.org/wiki/Poisson_distribution. If you can use a library, try Apache Commons Math: https://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/distribution/PoissonDistribution.html – duffymo Sep 25 '14 at 19:50
  • The function was given as f(K,lambda)=Pr(k)= lambda^k e^lambda/k!; e=natural log 2.71... I don't know what to do with this, I have no idea what I am doing here. I tried to insert the forumla in a pic and it didn't take. – Dixon Steel Sep 25 '14 at 20:03
  • 1
    That agrees with the link I posted. – duffymo Sep 25 '14 at 20:05
  • The commons math or wiki? Even if it agrees what am I doing wrong here that the data doesn't follow the curve? – Dixon Steel Sep 25 '14 at 20:12
  • Why is the return type `double` instead of `int`? http://stackoverflow.com/questions/1241555/algorithm-to-generate-poisson-and-binomial-random-numbers – Ravi Yenugu Sep 25 '14 at 20:16
  • Because it looked like all the Poisson data I looked at was doubles, I'm really not sure how this works. – Dixon Steel Sep 25 '14 at 20:20
  • So how do I generate my numbers correctly? – Dixon Steel Sep 25 '14 at 20:48
  • What do you mean "ramped up"? What does your graph look like? – Jay Elston Sep 26 '14 at 02:15
  • Also note that this formula is not suitable for large values of lambda. exp(-1000) ~= 10 ^ - 435. You might get some ideas from the answers to this question: http://stackoverflow.com/questions/5657861/generating-poisson-variables-in-c – Jay Elston Sep 26 '14 at 02:23
  • My graph start at like 1 and keeps going up as n gets larger and its very jagged. I need to generate this data to test some code. – Dixon Steel Sep 26 '14 at 11:49
  • The implementation returns doubles. Donald Knute wrote it. Perhaps you’d like to instruct him. – duffymo Jan 29 '22 at 21:15
  • Knuth’s algorithm. Please instruct all of us and provide a better one. I’d like to see it. – duffymo Jan 29 '22 at 21:20
  • How odd that you keep addressing me. I didn’t ask the question or write the code. I’ll ask again: write better, correct code and post it as an answer. We’ll all learn and comment. – duffymo Jan 29 '22 at 21:23
  • Great. Post better code and educate me. – duffymo Jan 29 '22 at 21:26
  • Done. Thank you. I apologize for my ignorance. – duffymo Jan 29 '22 at 21:29

1 Answers1

1

One of the problems you are running into is a basic limitation of how computers represent and perform calculations with floating point numbers.

A real number is represented on a computer in a form similar to scientific notation:

Significant digits × base^exponent

For double precision numbers, there are 11 bits used for the exponent and 52 for the "significant digits" portion. Because floating point numbers are normalized, the first positive floating point number > 0.0 has a value of about 10^-320 (this is defined as Double.MIN_VALUE in Java). See IEEE Standard 754 Floating Point Numbers for a good writeup on this.

Consider the line of code:

double L = Math.exp(-lambda);

With a lambda of 1000, e^-1000 (which is about 10^-435) is less than Double.MIN_VALUE, and there is no way the computer can represent e^-1000 any differently than it can represent e^-100000

You can solve this problem by noticing that lambda is an "arrival rate", and you can calculate random samples for shorter intervals and sum them. That is

x = p(L);

can be computed as

x = p(L/2) + p(L/2);

and larger numbers can be approximated:

x = 100 * p(L/100);

The Wikipedia article has on the Poisson distribution has some good pointers to ways to compute Poisson distributions for large values of lambda.

Jay Elston
  • 1,978
  • 1
  • 19
  • 38
  • 1
    Adopting your notation `x = p(L)`, you're correct that `x = p(L/2) + p(L/2)`. By induction `x = sum(from: 1, to: 100, p(L/100) )`, but that's not the same as `100*p(L/100)` because each evaluation of `p(L/100)` yields a different outcome. Basic properties of variance tell us that the variance of a sum of uncorrelated random variables is equal to the sum of the individual variances, while the variance of a constant times a random variable is the constant squared times the RV's variance. Consequently, `100 * p(L/100)` has 100 times the variance that `x = sum(from: 1, to: 100, p(L/100) has. – pjs Jan 30 '22 at 01:10
  • 1
    Please see https://en.wikipedia.org/wiki/Variance#Basic_properties to confirm the results cited in the prior comment. – pjs Jan 30 '22 at 01:11