THIS QUESTION IS ABOUT C++ CODE TARGETED FOR AVX/AVX2 INSTRUCTIONS, as shipped in Intel processors since 2013 (and/or AVX-512 since 2015).
How do I generate one million standard normal (Gaussian) random numbers quickly on Intel processors with the newer instruction sets?
More generic versions of this question have been asked a few times before, e.g., "Generate random numbers following a normal distribution in C/C++". Yes, I know about Box-Muller, adding up uniforms, and other techniques. I am tempted to build my own inverse normal distribution function, sample (i.e., map) exactly according to expectations (pseudo-normals, then), and then randomly shuffle the order.
But I also know I am using an Intel Core processor with recent AVX vector and AES instruction sets. Besides, I need C (not C++ with its standard library), and it needs to work on Linux and OSX with gcc.
So, is there a better processor-specific way to generate so many random numbers fast? For such large quantities of random numbers, does Intel processor hardware even offer useful instructions? Are they an option worth looking into, and if so, is there an existing standard implementation of an "rnorm" function?