If the array fits in memory, then you'll likely want to use the Fisher-Yates shuffle. That is,
- Seed a pseudorandom number generator (PRNG)
- Iterate through each element of the list with some index which I call
i
.
- Swap the element which occurs at position
i
with another element at position ≥i
chosen randomly (uniformly) by the PRNG.
For permutations of size less than a few million elements, this is a very good solution.
In case the array doesn't fit into memory, then you might want to use something fancier like a block cipher, as suggested by @leonbloy. This suffers from a long series of complications...
Block ciphers as permutations arise naturally in cryptography. If you have a secret permutation of all possible strings of 128 bits, then you can encrypt a message by chopping it up into 128 bit blocks, (padding those if necessary), and applying the permutation to each block. (I'm not talking about scrambling the bit order of the 128 bits, but instead of scrambling the set of all 128-bit bitstrings in any reversible way.) For cryptographers, it's essential that the permutation cannot be reverse-engineered in any way, since that breaks the encryption. Whether or not you personally care about having your permutations reverse-engineered, if you want to generate random permutations via a block cipher from the established crypto literature, you will run into complications arising from considerations of cryptographic security...
Unless the number n
of elements that you wish to permute happens to coincide with (2 ^ blocksize) of the cipher, it's not at all obvious how to generate a permutation. Without a clever idea, you're out of luck if n
is not a power of 2. To make matters worse, the most common and established block ciphers have a fixed block size. (For instance AES has a block size of 128 bits which is suitable for making permutations of 2^128 elements.)
One potential strategy for employing a block cipher when n
is not a power of 2 is called a cycle walk. One embeds the input integer from the interval [0, n - 1]
into the larger interval [0, 2 ^ block_size]
and applies the block cipher. If the result lies is outside the desired interval [0, n - 1]
, the block cipher is repeatedly applied until the result falls within range. But a usable implementation still requires some work...
The average efficiency of the cycle walk is n / 2 ^ blocksize. For illustration, if n
is 2 million and we use AES, then the efficiency is an infinitesimal 5.9e-33
, and it will take forever to return any result. Thus it's practically important to use a cipher with a variable length block size. Then ideally one chooses the block size to be the minimum integer such that n
≤ (2 ^ block size), and the efficiency will be greater than 50%...
In cryptography, higher block sizes tend to be more secure, and most variable length block ciphers have a fairly large minimal block size. This is because small block size ciphers are vulnerable to more attacks. In fact, it seems to be an open question in cryptography whether or not there exists an efficient and theoretically sound block cipher with small block size. If our n
is about 2^30 but the only available block cipher has a minimum block size of 64, then our efficiency will be horrible...
One suitable NIST-approved solution is FF3, which has been since revised to FF3-1 due to a small-block-size vulnerability. (While it is a minor improvement, FF3-1 has no theoretical guarantees, so it could have other similar vulnerabilities.)
Based on all of the above considerations, what follows is a quick Python implementation of cycle walk.
from typing import Callable, Optional
def cycle_walk(i: int, n: int, cipher: Callable[[str], str]) -> int:
working_digits = len(int_to_bin(n - 1))
i_as_str = int_to_bin(i, num_digits=working_digits)
out_candidate_str = cipher(i_as_str)
out_candidate_int = int(out_candidate_str, 2)
if out_candidate_int >= n:
return cycle_walk(out_candidate_int, n, cipher)
else:
return out_candidate_int
def int_to_bin(i: int, num_digits: Optional[int] = None) -> str:
"""Convert an integer to a binary string of 0s and 1s."""
assert i >= 0
s = bin(i)[2:]
if num_digits is not None:
assert len(s) <= num_digits
s = s.zfill(num_digits)
return s
These functions are designed to be compatible with this implementation of FF3-1.
We create the cipher as follows:
from ff3 import FF3Cipher
key = "EF4359D8D580AA4F7F036D6F04FC6A94"
tweak = "D8E7920AFA330A73"
cipher = FF3Cipher(key, tweak, radix=2).encrypt
You can think of the key and tweak above as the PRNG seeds.
Finally, we compute a permuted index as follows:
cycle_walk(i=42, n=1_100_003, cipher=cipher) # 295427
With n
this small, it's actually still feasible to compute the full permutation table and verify that it is a permutation:
# This will take several minutes to generate the full permutation table.
permutation = [cycle_walk(i, 1_100_003, cipher) for i in range(1_100_003)]
# Verify that all the numbers are within the correct range.
assert [] == [i for i in permutation if not 0 <= i < len(permutation)]
# Verify that there are no duplicates.
assert len(set(permutation)) == len(permutation)
Disclaimers:
- I'm not a cryptographer. If you need to generate cryptographically secure permutations, then discuss with an expert.
- My answer is simply the result of an afternoon of curiosity. I don't know the literature so well, so there could be simpler known methods which I'm unaware of. If there are, then please let me know!