0

I am searching for a function which has input 0, 1, 2, 3....N. Its result should be the same input array only permuted "randomly". The results must be unique and thus all of them are generated. Now, I am aware / don't mind that for all lists of 0, 1, .... N, the same results will be outputed. This is expected and fine, I just want the results to be the input shuffled around.

I found this function :

function perm( x )
{
    return x * 833 % N;
}

Where 833 can be any large-ish prime number. This makes good-ish results but they are always with a repeating pattern. See for N = 16: 0 3 6 9 12 15 2 5 8 11 14 1 4 7 10 13

Imagine it looks like 3 shark fins. Basically my question is, how do I make a function that does what I have described but with a more chaotic shuffle?

Discipol
  • 3,137
  • 4
  • 22
  • 41
  • First, define what you mean by "deterministic". Second, using just about any pre-defined pseudo-random-number generator will do better than the above scheme. – Hot Licks Apr 30 '13 at 17:24
  • 1
    I am probably using the term wrong, I meant to say not using Math.random(). Can you give me some examples I could try? I have a spreadsheet with a chart to test out formulas standing by ^_^ – Discipol Apr 30 '13 at 17:28
  • 2
    If you search around a bit there are dozens of questions here asking how to generate a randomly permutated list. – Hot Licks Apr 30 '13 at 17:31
  • 1
    If you read my post again, you can see I am not looking for that. – Discipol Apr 30 '13 at 18:00
  • 1
    Then what *are* you looking for?? – Hot Licks Apr 30 '13 at 18:14
  • I read your post, and it still sounds like that's exactly what you're looking for *(whether you want it to be pre-generated or not; you say you do above, but then that you don't below. But, either way, the answer is the same)* – BlueRaja - Danny Pflughoeft Apr 30 '13 at 18:16
  • See here: http://stackoverflow.com/questions/3131193/symmetric-bijective-algorithm-for-integers – leonbloy Apr 30 '13 at 18:17
  • 2
    @Discipol: You are being rude-arrogant. Those (two) condescending "if you read my post..." are not justified. And you seem to have a narrow concept of what a "function" is. A function is not necessarily a simple mathematical formula, it can use internally pregenerated permutation tables or anything. Open your mind and mind your manners . – leonbloy Apr 30 '13 at 18:45
  • 1
    @leonbloy It was implied that I did not search for this extensively on stack overflow/ rest of the internet. I may have spilled some of my frustration for this simple problem and my failure to fix it. I apologize for that. @ Hot Licks, I have x, and want F(x) where x =0,1...N and F(x)=shuffle(0,1,2...N) if z = y, then F(z) = F( y ). I mentioned the example with the prime number in the op but that shows a pattern. I CANNOT pregenerate anything as its a request/response with X and N. – Discipol Apr 30 '13 at 19:00
  • You mean that N can vary? Then it should be an argument of the function. – leonbloy Apr 30 '13 at 19:01
  • Its a parameter in the function but for all the calls in my particular scenario it remains the same: n = 40,000. – Discipol Apr 30 '13 at 19:04
  • If the function must work for any N, then it's quite complicated. For N=40000, I'd use the simplest 16-bit block cypher I find and try to adapt it. Or I'd reconsider if the linear pattern really matters. – leonbloy Apr 30 '13 at 19:11

2 Answers2

4

A linear congruential algorithm will always show that linear pattern. You'd rather want a block cypher, probably. See eg here and this related question

You small lengths, a more practical solution would be a pregenerated permutation table (as suggested another answer - that you rather rudely rejected and was deleted).

Community
  • 1
  • 1
leonbloy
  • 73,180
  • 20
  • 142
  • 190
  • The name of the problem is different, I have new parameters to investigate with, ty. I will post the solution once I find it. – Discipol Apr 30 '13 at 19:02
0

If the array fits in memory, then you'll likely want to use the Fisher-Yates shuffle. That is,

  1. Seed a pseudorandom number generator (PRNG)
  2. Iterate through each element of the list with some index which I call i.
  3. Swap the element which occurs at position i with another element at position ≥i chosen randomly (uniformly) by the PRNG.

For permutations of size less than a few million elements, this is a very good solution.

In case the array doesn't fit into memory, then you might want to use something fancier like a block cipher, as suggested by @leonbloy. This suffers from a long series of complications...

Block ciphers as permutations arise naturally in cryptography. If you have a secret permutation of all possible strings of 128 bits, then you can encrypt a message by chopping it up into 128 bit blocks, (padding those if necessary), and applying the permutation to each block. (I'm not talking about scrambling the bit order of the 128 bits, but instead of scrambling the set of all 128-bit bitstrings in any reversible way.) For cryptographers, it's essential that the permutation cannot be reverse-engineered in any way, since that breaks the encryption. Whether or not you personally care about having your permutations reverse-engineered, if you want to generate random permutations via a block cipher from the established crypto literature, you will run into complications arising from considerations of cryptographic security...

Unless the number n of elements that you wish to permute happens to coincide with (2 ^ blocksize) of the cipher, it's not at all obvious how to generate a permutation. Without a clever idea, you're out of luck if n is not a power of 2. To make matters worse, the most common and established block ciphers have a fixed block size. (For instance AES has a block size of 128 bits which is suitable for making permutations of 2^128 elements.)

One potential strategy for employing a block cipher when n is not a power of 2 is called a cycle walk. One embeds the input integer from the interval [0, n - 1] into the larger interval [0, 2 ^ block_size] and applies the block cipher. If the result lies is outside the desired interval [0, n - 1], the block cipher is repeatedly applied until the result falls within range. But a usable implementation still requires some work...

The average efficiency of the cycle walk is n / 2 ^ blocksize. For illustration, if n is 2 million and we use AES, then the efficiency is an infinitesimal 5.9e-33, and it will take forever to return any result. Thus it's practically important to use a cipher with a variable length block size. Then ideally one chooses the block size to be the minimum integer such that n ≤ (2 ^ block size), and the efficiency will be greater than 50%...

In cryptography, higher block sizes tend to be more secure, and most variable length block ciphers have a fairly large minimal block size. This is because small block size ciphers are vulnerable to more attacks. In fact, it seems to be an open question in cryptography whether or not there exists an efficient and theoretically sound block cipher with small block size. If our n is about 2^30 but the only available block cipher has a minimum block size of 64, then our efficiency will be horrible...

One suitable NIST-approved solution is FF3, which has been since revised to FF3-1 due to a small-block-size vulnerability. (While it is a minor improvement, FF3-1 has no theoretical guarantees, so it could have other similar vulnerabilities.)

Based on all of the above considerations, what follows is a quick Python implementation of cycle walk.

from typing import Callable, Optional


def cycle_walk(i: int, n: int, cipher: Callable[[str], str]) -> int:
    working_digits = len(int_to_bin(n - 1))
    i_as_str = int_to_bin(i, num_digits=working_digits)
    out_candidate_str = cipher(i_as_str)
    out_candidate_int = int(out_candidate_str, 2)
    if out_candidate_int >= n:
        return cycle_walk(out_candidate_int, n, cipher)
    else:
        return out_candidate_int


def int_to_bin(i: int, num_digits: Optional[int] = None) -> str:
    """Convert an integer to a binary string of 0s and 1s."""
    assert i >= 0
    s = bin(i)[2:]
    if num_digits is not None:
        assert len(s) <= num_digits
        s = s.zfill(num_digits)
    return s

These functions are designed to be compatible with this implementation of FF3-1. We create the cipher as follows:

from ff3 import FF3Cipher

key = "EF4359D8D580AA4F7F036D6F04FC6A94"
tweak = "D8E7920AFA330A73"

cipher = FF3Cipher(key, tweak, radix=2).encrypt

You can think of the key and tweak above as the PRNG seeds.

Finally, we compute a permuted index as follows:

cycle_walk(i=42, n=1_100_003, cipher=cipher)  # 295427

With n this small, it's actually still feasible to compute the full permutation table and verify that it is a permutation:

# This will take several minutes to generate the full permutation table.
permutation = [cycle_walk(i, 1_100_003, cipher) for i in range(1_100_003)]

# Verify that all the numbers are within the correct range.
assert [] == [i for i in permutation if not 0 <= i < len(permutation)]

# Verify that there are no duplicates.
assert len(set(permutation)) == len(permutation)

Disclaimers:

  • I'm not a cryptographer. If you need to generate cryptographically secure permutations, then discuss with an expert.
  • My answer is simply the result of an afternoon of curiosity. I don't know the literature so well, so there could be simpler known methods which I'm unaware of. If there are, then please let me know!
Ben Mares
  • 1,764
  • 1
  • 15
  • 26