
I would like to find the fastest way to generate ~10^9 Poisson random numbers in Python/NumPy. For instance, say I have a mean Poisson parameter (calculated elsewhere) of shape (1000, 2000), and I need 500 independent samples of it. This is a bottleneck in my code, taking several minutes to complete. I have tried three methods, but am looking for something faster:

import numpy as np

# example parameters
nsamples = 500
nmeas = 2000
ninputs = 1000
lambdax = np.ones([ninputs, nmeas]) * 20

# numpy, one big array
sample0 = np.random.poisson(lam=lambdax, size=(nsamples, ninputs, nmeas))

# numpy, current version where other code happens in the loop
sample1 = np.zeros([nsamples, ninputs, nmeas])
for i in range(nsamples):
    sample1[i, :, :] = np.random.poisson(lam=lambdax)

# scipy
from scipy.stats import poisson
sample2 = poisson.rvs(lambdax, size=(nsamples, ninputs, nmeas))

Results:

sample0: 1 m 16 s
sample1: 1 m 20 s
sample2: 1 m 50 s

Not shown here, I am also parallelizing the independent samples via multiprocessing, but the calculations are still pretty expensive for such large parameters. Is there a better way?
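
Roughly, the parallelization looks something like this (a simplified sketch, not my actual code; the worker count and chunking are illustrative, and it uses numpy's newer `default_rng` so each worker draws from an independent stream):

import numpy as np
from multiprocessing import Pool

nsamples, ninputs, nmeas = 500, 1000, 2000
lambdax = np.ones([ninputs, nmeas]) * 20

def draw_chunk(args):
    # each worker draws its own chunk of samples with an independent RNG stream
    seed_seq, n = args
    rng = np.random.default_rng(seed_seq)
    return rng.poisson(lam=lambdax, size=(n, ninputs, nmeas))

if __name__ == '__main__':
    nworkers = 4  # illustrative
    chunk_sizes = [nsamples // nworkers] * nworkers
    seed_seqs = np.random.SeedSequence().spawn(nworkers)
    with Pool(nworkers) as pool:
        chunks = pool.map(draw_chunk, list(zip(seed_seqs, chunk_sizes)))
    sample = np.concatenate(chunks, axis=0)

Each chunk still has to be pickled back to the parent process, which is part of the remaining overhead.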

Jayson Vavrek
  • What is the range of means for the Poisson samples? If they are very small, see this question: https://stackoverflow.com/questions/61614458/performance-for-drawing-numbers-from-poisson-distribution-with-low-mean . If you will form a sum of the Poisson variables, just take a single Poisson variable with the sum of their means. – Peter O. May 12 '20 at 22:17
  • @PeterO. the means can vary quite a bit, from close to 0 to values in the thousands. Unfortunately, for subsequent steps in the analysis I need the full noised sample of `lambdax` for each of the `nsamples` – Jayson Vavrek May 12 '20 at 22:22
  • Will `lambdax` (in your example) contain the same value in all its entries? If not, your example will likely not show typical performance. – Peter O. May 12 '20 at 22:28
  • @PeterO. fair point, no, `lambdax` will have different values in general. Each of the `ninputs` rows will be fairly similar to each other, but across the `nmeas` columns the values will vary quite a bit. – Jayson Vavrek May 12 '20 at 22:31
  • If I do something arbitrary like `lambdax[i, j] = 10*np.sin(j/10) + 20 + i`, the times are 55 s, 59 s, and 1 m 31 s, respectively. I'm guessing there's some speedup under the hood when the Poisson mean is >> 20 or so. – Jayson Vavrek May 12 '20 at 22:51

1 Answer


I have been in your shoes and here are my suggestions:

  • For large mean values, a Poisson distribution behaves much like a normal distribution, which is cheaper to sample from; check out this post (and probably more if you search). There is a sketch of that approximation at the end of this answer.
  • ~1 m of runtime seems reasonable for generating that many random numbers; I don't think you can beat the sample0 method by much through code changes alone. Now, depending on what you want to do with the random numbers:
    • if your issue is rerunning the program multiple times, try saving sample0 to a file and reloading it in subsequent runs (as sketched after this list).
    • if not, I suggest generating a smaller pool of random numbers and reusing them. Depending on your mean value, many of the numbers in sample0 will be repeated anyway, so you could create a much smaller sample and randomly choose from it; for example, a given value could be reused ~100 times, since it would appear in sample0 over 100 times anyway (also sketched after this list).
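
A minimal sketch of both ideas, reusing the shapes from your question (the file name, pool size, and constant mean here are only illustrative):

import numpy as np

nsamples, ninputs, nmeas = 500, 1000, 2000
lambdax = np.ones([ninputs, nmeas]) * 20

# idea 1: generate once, cache to disk, and reload on later runs
sample0 = np.random.poisson(lam=lambdax, size=(nsamples, ninputs, nmeas))
np.save('sample0.npy', sample0)   # illustrative file name
sample0 = np.load('sample0.npy')  # much cheaper than regenerating

# idea 2: draw a small pool of Poisson values and resample from it
# (with a constant mean one pool covers everything; with varying means
# you would need one pool per distinct, or approximately equal, mean)
pool = np.random.poisson(lam=20, size=10_000)
sample_reused = np.random.choice(pool, size=(nsamples, ninputs, nmeas))

The pool approach trades some statistical fidelity for speed, so how small the pool can be depends on your accuracy goals.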

If you provide more information on what you intend to do with the random numbers, we might be able to help more. Otherwise, coding-wise, I am not sure you can do much better.
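
To expand on the first bullet: for large means, a Poisson draw with mean m is approximately normal with mean m and variance m, and normal sampling is typically cheaper than Poisson sampling in numpy. A rough sketch, where `fast_poisson` and the cutoff of 50 are my own illustrative choices (whether the approximation is acceptable depends on your accuracy goals):

import numpy as np

def fast_poisson(lam, size, cutoff=50):
    # exact Poisson draws for small means, normal approximation for large ones
    lam = np.broadcast_to(lam, size)
    out = np.empty(size)
    small = lam < cutoff
    out[small] = np.random.poisson(lam[small])
    # normal approximation: mean lam, std sqrt(lam), rounded to whole counts
    out[~small] = np.rint(np.random.normal(lam[~small], np.sqrt(lam[~small])))
    return out

# illustrative call with a varying mean, similar to the formula in the comments
# above (the smaller first axis just keeps the example light)
lam = 10 * np.sin(np.arange(2000) / 10) + 20 + np.arange(1000)[:, None]
sample_approx = fast_poisson(lam, (50, 1000, 2000))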

Ehsan
  • I'm simulating gamma-ray detectors, so I'm Poisson-noising a model of `nmeas` measurements to generate realistic statistical noise. I'm doing this `nsamples` times for `ninputs` realizations of noise in the input parameters. The issue comes when, on top of that, I simulate dozens of different input parameter sets. Come to think of it, I can probably just do one Poisson noise per input noise, i.e., `nsamples = 1 if ninputs > 1`... – Jayson Vavrek May 13 '20 at 02:04
  • That is exactly what I mean. If I understand it correctly, your Poisson mean is a constant 20? In that case, if you create 10^9 samples from it, there will be a lot of repetitions (maybe an array of 1k-10k would even suffice). You can simply generate a small number of samples and use `np.random.choice` to randomly choose from it to add to your array as noise. I would even suggest bundling your gamma-ray input array indices into groups and adding the same noise to each group, if your array is of significantly larger size than your Poisson mean value. – Ehsan May 13 '20 at 03:25
  • It's not a constant 20, no. As discussed above it can range from ~0 to values in the thousands, so pre-generating a small number of samples won't work. I'm also working in mean counts, not discrete counts, so I can't generate integer counts either. I can see how your `np.random.choice` solution would work in the (constant) integer case, though. My realization above is that, in the broader picture, I probably just need to run fewer samples, especially by only performing one Poisson noise for each input noise. – Jayson Vavrek May 13 '20 at 03:41
  • Yes. Correct me if I am wrong, but isn't the sample0 code you provided using a constant mean? Nonetheless, if the mean is constant per input, you can do one smaller sample per mean vs. the ~1M I can see in the code now. Even with a mean of 1000, a sample of 1M would be too much (of course it depends on your accuracy goals too). I am not sure I understand mean counts vs. discrete counts. Could you please elaborate on that? – Ehsan May 13 '20 at 03:49
  • The `sample0` was just something I quickly put together. As Peter O pointed out, that was not really representative. Hence the other `lambdax[i, j]` formula deep in the comments above, with a variable mean. And I was conflating two ideas talking about mean vs. discrete counts, sorry; ignore that! I will think a bit more about this small subset idea. It certainly is intriguing. – Jayson Vavrek May 13 '20 at 04:12
  • I see. Thank you for clarifying. Hopefully it gets resolved. – Ehsan May 13 '20 at 04:32