Why is np.random.default_rng().permutation(n) preferred over the original np.random.permutation(n)?

Question

The NumPy documentation for np.random.permutation suggests that all new code use np.random.default_rng() from the Random Generator package. I see in the documentation that the Random Generator package has standardized the generation of a wide variety of random distributions around the BitGenerator rather than the Mersenne Twister, which I'm vaguely familiar with.

I see one downside: what used to be a single line of code to do simple permutations:

np.random.permutation(10)

turns into two lines of code now, which feels a little awkward for such a simple task:

rng = np.random.default_rng()
rng.permutation(10)

Why is this new approach an improvement over the previous approach?
And why wouldn't existing methods like np.random.permutation just wrap this new preferred method?
Is there a good reason not to use this new method as a one-liner np.random.default_rng().permutation(10), assuming it's not being called at high volumes?
Is there an argument for switching existing code to this method?

I cannot give a well-founded answer, but I guess the idea is (similar to what C++ does?) to separate the generators from the samplers, and force people to explicitly specify the generator. Also, see [the release comments](https://numpy.org/doc/stable/release/1.17.0-notes.html#new-extensible-numpy-random-module-with-selectable-random-number-generators). — phipsgabler, Jun 17 '20 at 19:32
I think the expectation is that you'd create a `default_rng` once, at the start of your script, and use that repeatedly with `permutation`, `randint`, etc. For a one-off random call I wouldn't put any extra effort into using the new package. I haven't used it when answering SO questions. When adding new features, it's usually safer to add them with new calls and interface, rather than replacing the old. There's less risk of messing up existing code. — hpaulj, Jun 17 '20 at 19:50

score 2 · Accepted Answer · answered Jun 17 '20 at 21:50

Some context:

To your questions, in a logical order:

And why wouldn't existing methods like np.random.permutation just wrap this new preferred method?

Probably because of backwards compatibility concerns. Even if the "top-level" API would not be changing, its internals would change significantly enough to be deemed a break in compatibility.
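
To make the compatibility concern concrete, here is a minimal sketch, assuming NumPy 1.17 or later. It illustrates my guess at the reasoning rather than a statement from the NumPy developers: code that seeds the legacy API relies on getting the same sequence every time, and the new Generator produces its own, separate sequence for the same seed. The seed value 0 is arbitrary.

```python
import numpy as np

# Legacy API: the same seed reproduces the same permutation on every run.
legacy_a = np.random.RandomState(0).permutation(10)
legacy_b = np.random.RandomState(0).permutation(10)

# New API: also reproducible for a given seed, but it is a separate object
# backed by a different bit generator (PCG64 by default).
modern_a = np.random.default_rng(0).permutation(10)
modern_b = np.random.default_rng(0).permutation(10)

print(np.array_equal(legacy_a, legacy_b))  # each API is self-consistent
print(np.array_equal(modern_a, modern_b))
print(legacy_a)  # compare the two orderings by eye; they are produced
print(modern_a)  # by different algorithms from the same seed
```

If np.random.permutation were silently rewired onto the new Generator, seeded scripts that depend on the legacy ordering would start producing different results.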

Why is this new approach an improvement over the previous approach?

"By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy MT19937 used in RandomState." (source). The PCG64 docstring provides more technical detail.
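
As a small illustration of the generator/bit-generator split described in that quote, the sketch below (assuming NumPy 1.17 or later; the seed 12345 is arbitrary) checks which bit generator backs a default Generator and shows that MT19937 can still be selected explicitly if you want it.

```python
import numpy as np
from numpy.random import Generator, MT19937, PCG64

# default_rng() wraps a PCG64 bit generator in a Generator.
rng = np.random.default_rng(12345)
print(type(rng.bit_generator).__name__)

# The sampling methods are independent of the bit source, so the
# Mersenne Twister can be plugged into the same Generator interface.
mt_rng = Generator(MT19937(12345))
print(type(mt_rng.bit_generator).__name__)

# Both objects expose the same methods, e.g. permutation().
print(rng.permutation(5))
print(mt_rng.permutation(5))
```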

Is there a good reason not to use this new method as a one-liner np.random.default_rng().permutation(10), assuming it's not being called at high volumes?

I very much agree that it's a slightly awkward added line of code if it's done at the module-start. I would only point out that the NumPy docs do directly use this form in docstring examples, such as:

n = np.random.default_rng().standard_exponential((3, 8000))

The slight difference would be that one is instantiating a class at module load/import time, whereas in your form it might come later. But that should be a minuscule difference (again, assuming it's only used once or a handful of times). If you look at the default_rng(seed) source, when called with None, it just returns Generator(PCG64(seed)) after a few quick checks on seed.
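
The following sketch, assuming NumPy 1.17 or later, puts the two usage styles side by side and checks the equivalence mentioned above for an integer seed. The seed 2020 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
from numpy.random import Generator, PCG64

# For a plain integer seed, default_rng(seed) and Generator(PCG64(seed))
# should yield the same stream.
seed = 2020
via_helper = np.random.default_rng(seed).permutation(10)
via_class = Generator(PCG64(seed)).permutation(10)
print(np.array_equal(via_helper, via_class))

# One-liner: fine for an occasional call; builds a fresh Generator each time.
one_off = np.random.default_rng().permutation(10)

# Create once, reuse: avoids repeated construction when called many times.
rng = np.random.default_rng()
shuffled = rng.permutation(10)
draws = rng.integers(0, 10, size=5)  # Generator's counterpart to randint
```

Note that the one-liner only makes sense unseeded: calling np.random.default_rng(seed).permutation(10) repeatedly with the same seed returns the same permutation every time.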

Is there an argument for switching existing code to this method?

Going to pass on this one since I don't have anywhere near the depth of technical knowledge to give a good comparison of the algorithms, and also because it depends on some other variables, such as whether you're concerned about keeping your downstream code compatible with older versions of NumPy, where default_rng() simply doesn't exist.
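
If older NumPy versions are a concern, one possible approach is a small fallback helper. This is only a sketch: make_rng is a hypothetical name, not part of NumPy, and the two branches produce different random streams for the same seed.

```python
import numpy as np

def make_rng(seed=None):
    """Hypothetical helper: prefer the new Generator, fall back to RandomState."""
    if hasattr(np.random, "default_rng"):
        # NumPy >= 1.17: new-style Generator backed by PCG64.
        return np.random.default_rng(seed)
    # Older NumPy: legacy RandomState backed by the Mersenne Twister.
    return np.random.RandomState(seed)

rng = make_rng(123)
perm = rng.permutation(10)  # permutation() exists on both classes
print(perm)
```

This only works for methods that both classes share, such as permutation, shuffle and normal. Methods like integers (new API) or randint (legacy API) exist on only one of them.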
