
I am looking for a shuffle algorithm to shuffle a set of sequential numbers without buffering. Another way to state this is that I'm looking for a random sequence of unique numbers with a given period.

Your typical Fisher–Yates shuffle needs to hold all of the elements it is going to shuffle, so that isn't going to work.

A Linear-Feedback Shift Register (LFSR) does what I want, but only works for periods that are powers-of-two less two. Here is an example of using a 4-bit LFSR to shuffle the numbers 1-14:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14
 8 12 14  7  4 10  5 11  6  3  2  1  9 13

The first row is the input, and the second row is the output. What's nice is that the state is very small: just the current index. You can start at any index and get a different set of numbers (starting at 1 yields: 8, 12, 14; starting at 9: 6, 3, 2), although the sequence is always the same (5 is always followed by 11). If I want a different sequence, I can pick a different generator polynomial.
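
For illustration, here is a minimal 4-bit Galois LFSR sketch. The taps and shift direction below are one arbitrary (assumed) choice, so the visiting order won't necessarily match the table above, but it shows how little state is involved:

```cpp
#include <cstdint>
#include <cstdio>

// Minimal 4-bit Galois LFSR (right-shift form, taps 0x9, maximal length).
// Illustrative only: a different polynomial or shift direction gives a
// different visiting order than the table above.
int main() {
    uint8_t state = 1;                 // any non-zero 4-bit start value
    const uint8_t start = state;
    do {
        printf("%u ", (unsigned)state);
        uint8_t lsb = state & 1u;
        state >>= 1;
        if (lsb) state ^= 0x9;         // feed back the taps
    } while (state != start);
    printf("\n");                      // visits all 15 non-zero states, then repeats
    return 0;
}
```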

The limitations of the LFSR are that the period is always a power of two less two (the minimum and maximum values are always the same, thus unshuffled) and there are not enough generator polynomials to allow every possible random sequence.

A block cipher algorithm would work: every key produces a uniquely shuffled set of numbers. However, all block ciphers (that I know about) have power-of-two block sizes, and usually a fixed or limited set of block sizes. A block cipher with an arbitrary non-binary block size would be perfect, if such a thing exists.

There are a couple of projects I have that could benefit from such an algorithm. One is for small embedded micros that need to produce a shuffled sequence of numbers with a period larger than the memory they have available (think Arduino Uno needing to shuffle 1 to 100,000).

Does such an algorithm exist? If not, what things might I search for to help me develop such an algorithm? Or is this simply not possible?

Edit 2022-01-30

I have received a lot of good feedback and I need to better explain what I am searching for.

In addition to the Arduino example, where memory is an issue, there is also the shuffle of a large number of records (billions to trillions). The desire is to have a shuffle applied to these records without needing a buffer to hold the shuffle order array, or the time needed to build that array.

I do not need an algorithm that could produce every possible permutation, but a large number of permutations. Something like a typical block cipher in counter mode where each key produces a unique sequence of values.

A Linear Congruential Generator with coefficients chosen to produce the desired sequence period will only produce a single sequence. The same problem applies to a Linear-Feedback Shift Register.

Format-Preserving Encryption (FPE), such as AES FFX, shows promise and is where I am currently focusing my attention. Additional feedback welcome.
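
As a sketch of that direction, "cycle walking" runs a keyed permutation over the smallest power of two covering the period and re-applies it until the value falls back into range. Here `toy_permute` is a hypothetical stand-in for a real keyed permutation (a reduced-width block cipher or FPE construction); it is not cryptographically meaningful, it just keeps the sketch self-contained:

```cpp
#include <cstdint>

// toy_permute: a bijection on [0, mask + 1) built from invertible steps
// (odd multiplier mod 2^k, key addition, xorshift). Placeholder only.
static uint32_t toy_permute(uint32_t x, uint32_t key, uint32_t mask) {
    for (uint32_t r = 0; r < 4; ++r) {
        x = (x * 2654435761u + key + r) & mask;  // odd multiplier: bijective mod 2^k
        x ^= x >> 3;                             // xorshift: also bijective
    }
    return x & mask;
}

// Value at position i (0-based) of the shuffled sequence over [0, period).
static uint32_t shuffled_at(uint32_t i, uint32_t key, uint32_t period) {
    uint32_t mask = 1;
    while (mask < period) mask <<= 1;            // smallest 2^k >= period
    --mask;
    uint32_t x = toy_permute(i, key, mask);
    while (x >= period)                          // cycle-walk until inside the range
        x = toy_permute(x, key, mask);
    return x;
}
```

Each key gives a different-looking order, and the only per-sequence state is the current position.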

A. Que
  • Similar questions: [smart way to generate unique random number](https://stackoverflow.com/questions/3627029/smart-way-to-generate-unique-random-number/3627128); [Unique (non-repeating) random numbers in O(1)?](https://stackoverflow.com/questions/196017/unique-non-repeating-random-numbers-in-o1); [How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N](https://stackoverflow.com/questions/158716/how-do-you-efficiently-generate-a-list-of-k-non-repeating-integers-between-0-and-n). (Yes, I know the most upvoted answers ignore your "no buffer" requirement.) – Stef Jan 28 '22 at 20:03
  • So basically you need an algorithm that gives you a unique list of random numbers, without any gap? If memory is the limitation here, could it work if you generate one which works for 00-99, and then use it recursively for 4, 6, 8 digits, etc.? – Andrew Jan 28 '22 at 20:06
  • @Stef I need to study the first link more. A Linear Congruential Generator likely answers my question as written, but there is only one sequence per coefficient set, and that set is fairly limited. – A. Que Jan 28 '22 at 20:19
  • @Andrew If I understand what you are proposing, such a generator would work for powers of 10. But if I had, say, 1234567 as the desired period, I do not believe your approach would work. However, the recursive approach might be the answer. One might factor the desired period and generate subsets. I shall have to think on it. – A. Que Jan 28 '22 at 20:22
  • Is AES FFX an option? – David Eisenstat Jan 28 '22 at 21:13
  • "there is only one sequence per coefficient set": this is true of every pseudo-random generator! –  Jan 28 '22 at 21:31

1 Answer


It is certainly not possible to produce an algorithm which could potentially generate every possible sequence of length N with less than N(log2 N - 1.45) bits of state, because there are N! possible sequences and each state can generate exactly one sequence. If your hypothetical Arduino application could produce every possible sequence of 100,000 numbers, it would require at least 1,516,705 bits of state, a bit more than 185 KiB, which is probably more memory than you want to devote to the problem [Note 1].
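
That figure can be reproduced with the lgamma approach from Note 1; a quick check:

```cpp
#include <cmath>
#include <cstdio>

// log2(N!) = lgamma(N + 1) / ln(2): the minimum number of bits of state
// needed to be able to reach every one of the N! orderings of N items.
int main() {
    double n = 100000.0;
    double bits = std::lgamma(n + 1.0) / std::log(2.0);
    std::printf("%.0f bits, about %.0f KiB\n", bits, bits / 8.0 / 1024.0);
    return 0;
}
```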

That's also a lot more memory than the random state behind a typical buffered shuffle: the PRNG driving the shuffle algorithm doesn't have enough state to come close to being able to generate every possible sequence either. It can't generate more different sequences than the number of different possible states that it has.

So you have to make some compromise :-)

One simple algorithm is to start with some parametrisable generator which can produce non-repeating sequences for a large variety of block sizes. Choose a block size which is at least as large as your target range but not "too much larger"; say, less than twice as large. Then select a subrange of the block and start generating numbers: if a generated number is inside the subrange, return its offset; if not, throw it away and generate another one. If the generator's range is less than twice the desired range, you will throw away fewer than half of the generated values, so producing the next element in the sequence is amortised O(1). In theory, it might take a long time to generate an individual value, but that's not very likely, and if you use a not-very-good PRNG like a linear congruential generator, you can make it very unlikely indeed by restricting the possible generator parameters.

For LCGs you have a couple of possibilities: you could use a power-of-two modulus with an odd offset and a multiplier which is 5 mod 8 (and not too far from the square root of the block size), or you could use a prime modulus with an almost arbitrary offset and multiplier. Using a prime modulus is computationally more expensive, but the deficiencies of LCGs are less apparent. Since you don't need to handle arbitrary primes, you can preselect a geometrically-spaced sample of primes and precompute the constants for an efficient division-by-multiplication reduction for each one.
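
Here's a rough sketch of that combination: an LCG over the next power of two with out-of-range values skipped. The multiplier below is an arbitrary placeholder that happens to be 5 mod 8, not a recommendation (in particular, it isn't chosen near the square root of the block size):

```cpp
#include <cstdint>

// Enumerate [0, period) in a pseudo-random order. With modulus 2^k, an odd
// increment and a multiplier of 5 (mod 8) give a full-period LCG, so every
// value in [0, 2^k) is visited exactly once per cycle; values >= period are
// skipped.
struct RangeShuffle {                  // hypothetical name, just for this sketch
    uint64_t period;                   // desired sequence length N
    uint64_t mask;                     // 2^k - 1, smallest 2^k >= N (ideally < 2N)
    uint64_t a, c;                     // LCG multiplier and increment
    uint64_t state;

    RangeShuffle(uint64_t n, uint64_t seed) : period(n) {
        uint64_t m = 1;
        while (m < n) m <<= 1;
        mask = m - 1;
        a = 0xd1342543de82ef95ull;     // placeholder; low bits are ...101, i.e. 5 (mod 8)
        c = (seed << 1) | 1;           // any odd increment
        state = seed & mask;
    }

    // Next value in [0, period); amortised O(1) when 2^k < 2N.
    uint64_t next() {
        do {
            state = (a * state + c) & mask;
        } while (state >= period);
        return state;
    }
};
```

For the Arduino example, `RangeShuffle(100000, seed)` would walk 0..99999 in a seed-dependent order using only a few words of state, though the quality of the ordering is only as good as the underlying LCG.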

Since you're free to use any subrange of the generator's range, you have an additional potential parameter: the offset of the start of the subrange. (Or even offsets, since the subrange doesn't need to be contiguous.) You can also increase the apparent randomness by applying any bijective transformation to the output (XOR/rotates are good if you're using a power-of-two block size).
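
For example, assuming a power-of-two block size 2^k (k >= 2), something like the following keeps the mapping bijective while scrambling the generator's output bits; the key-derived rotate amount is just an arbitrary choice for the sketch:

```cpp
#include <cstdint>

// A cheap bijection on a 2^k domain, applied to the generator's output before
// the range test. XOR with a key and a k-bit rotate are both invertible, so
// the combined mapping is still a permutation of [0, 2^k).
static uint64_t mix(uint64_t x, unsigned k, uint64_t key) {
    const uint64_t mask = (k >= 64) ? ~0ull : ((1ull << k) - 1);
    x ^= key & mask;                                // XOR: invertible
    unsigned r = 1 + (unsigned)(key % (k - 1));     // rotate amount in [1, k-1]
    return ((x << r) | (x >> (k - r))) & mask;      // k-bit rotate: invertible
}
```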

Depending on your application, there are known algorithms to produce block ciphers for subword bit lengths [Note 2], which gives you another possible way to increase randomness and/or add some more bits to the generator state.


Notes

  1. The approximation for the minimum number of bits of state comes directly from Stirling's approximation for N!, but I computed the exact number of bits using the commonly available lgamma function.

  2. With about 30 seconds of googling, I found this paper on researchgate.net; I'm far from knowledgeable enough in crypto to offer an opinion, but it looks credible; also, there are references to other algorithms in its footnotes.

rici
  • If we have access to true random numbers and don't need output that is deterministic in the seed, then the space requirement drops to n bits (this follows from a communication complexity bound). – David Eisenstat Jan 29 '22 at 02:54
  • @DavidEisenstat: correct, but the time requirement increases when you get close to the end of the sequence. – rici Jan 29 '22 at 02:57