Encode a byte array with an alphabet, output should look randomly distributed

Question

I'm encoding binary data b1, b2, ... bn using an alphabet. But since the binary representations of the bs are more or less sequential, a simple mapping of bits to chars results in very similar strings. Example:

encode(b1) => "QXpB4IBsu0"
encode(b2) => "QXpB36Bsu0"
...

I'm looking for ways to make the output more "random", meaning more difficult to guess the input b when looking at the output string.

Some requirements:

For different bs, the output strings must be different. Encoding the same b multiple times does not necessarily have to result in the same output. As long as there are no collisions between the output strings of different input bs, everything is fine.
If it is of any importance: each b is roughly 50-60 bits. The alphabet contains 64 characters.
The encoding function should not produce larger output strings than the ones you get by just using a simple mapping from the bits of bs to chars of the alphabet (given the values above, this means ~10 characters for each b). So just using a hash function like SHA is not an option.

Possible solutions for this problem don't need to be "cryptographically secure". If someone invests enough time and effort to reconstruct the binary data, then so be it. But the goal is to make it as difficult as possible. It may help that a decode function is not needed.

What I am doing at the moment:

1. Take the next 4 bits from the binary data, let's say xxxx.
2. Prepend 2 random bits r to get rrxxxx.
3. Look up the corresponding char in the alphabet: val char = alphabet[rrxxxx], and add it to the result (this works because the alphabet's size is 64).
4. Continue with step 1.
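
The steps above can be sketched roughly as follows in Python. This is only an illustration: the alphabet (URL-safe Base64) and the function name encode_with_noise are assumptions, since the question does not show the actual alphabet or code.

```python
import secrets

# Assumed 64-character alphabet; the question does not say which one is used.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_with_noise(value: int, bit_length: int) -> str:
    """Emit one character per 4 data bits, each prefixed with 2 random bits."""
    chars = []
    nibbles = (bit_length + 3) // 4  # number of 4-bit groups, rounded up
    # Walk the value from the most significant nibble to the least.
    for i in reversed(range(nibbles)):
        data = (value >> (4 * i)) & 0xF      # xxxx
        noise = secrets.randbits(2)          # rr
        chars.append(ALPHABET[(noise << 4) | data])  # alphabet[rrxxxx]
    return "".join(chars)
```

Each character carries only 4 data bits instead of 6, so a 60-bit input takes 15 characters rather than 10.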

This approach adds some noise to the output string; however, the size of the string is increased by 50% due to the random bits. I could add more noise by adding more random bits (rrrxxx or even rrrrxx), but the output would get larger and larger. One of the requirements I mentioned above is not to increase the size of the output string. Currently I'm only using this approach because I have no better idea.

As an alternative procedure, I thought about shuffling the bits of an input b before applying the alphabet. But since it must be guaranteed that different bs result in different strings, the shuffle function should be deterministic (maybe taking a secret number as an argument) instead of being completely random. I wasn't able to come up with such a function.
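
One way such a deterministic shuffle might look, as a minimal Python sketch: a secret number seeds a pseudo-random generator that fixes a permutation of the bit positions. The names make_bit_permutation and shuffle_bits are made up for this illustration, and the permutation depends on the shuffle algorithm of the Python version in use.

```python
import random


def make_bit_permutation(secret: int, width: int) -> list:
    """Derive a fixed permutation of bit positions 0..width-1 from a secret."""
    positions = list(range(width))
    random.Random(secret).shuffle(positions)  # same secret -> same order
    return positions


def shuffle_bits(value: int, perm: list) -> int:
    """Move bit number src of value to position perm[src]."""
    out = 0
    for src, dst in enumerate(perm):
        if (value >> src) & 1:
            out |= 1 << dst
    return out
```

Because a permutation of bit positions is a bijection, different inputs always give different outputs. Note, however, that two inputs differing in one bit still give outputs differing in exactly one bit, so a pure bit shuffle hides sequential inputs only weakly.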

I'm wondering if there is a better way, any hint is appreciated.

score 0 · Answer 1 · edited May 23 '17 at 10:34

Basically, you need a reversible pseudo-random mapping from each possible 50-bit value to another 50-bit value. You can achieve this with a reversible Linear Congruential Generator (the kind used for some pseudo-random number generators).

When encoding, apply the LCG to your number in the forward direction, then encode with base64. If you need to decode, decode from base64, then apply the LCG in the opposite direction to get your original number back.

This answer contains some code for a reversible LCG. You'll need one with a period of 2⁵⁰. The constants used to define your LCG would be your secret numbers.
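
As a rough illustration of the idea (not the code from the linked answer), a single LCG step modulo 2⁵⁰ and its inverse could look like this in Python. The constants A and C are arbitrary placeholders chosen for the sketch; any odd multiplier makes the step invertible.

```python
BITS = 50
MASK = (1 << BITS) - 1

# Placeholder "secret" constants for this sketch.
A = 0x5DEECE66D  # odd, so it has an inverse modulo 2**50
C = 11           # odd increment
A_INV = pow(A, -1, 1 << BITS)  # modular inverse; needs Python 3.8+


def lcg_forward(x: int) -> int:
    """Scramble: one LCG step, x -> (A*x + C) mod 2**50."""
    return (A * x + C) & MASK


def lcg_backward(y: int) -> int:
    """Unscramble: x = A_INV * (y - C) mod 2**50."""
    return (A_INV * (y - C)) & MASK
```

One caveat with a power-of-two modulus: the low k bits of the output depend only on the low k bits of the input, so the least significant bits of consecutive outputs still follow a short cycle.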

score 0 · Answer 2 · answered Jun 10 '16 at 15:46

You want to use a multiplicative inverse. That will take the sequential key and transform it into a non-sequential number. There is a one-to-one relationship between the keys and their results. So no two numbers will create the same non-sequential key, and the process is reversible.

I have a small example, written in C#, that illustrates the process.

private void DoIt()
{
    const long m = 101;
    const long x = 387420489; // must be coprime to m

    var multInv = MultiplicativeInverse(x, m);

    var nums = new HashSet<long>();
    for (long i = 0; i < 100; ++i)
    {
        var encoded = i*x%m;
        var decoded = encoded*multInv%m;
        Console.WriteLine("{0} => {1} => {2}", i, encoded, decoded);
        if (!nums.Add(encoded))
        {
            Console.WriteLine("Duplicate");
        }
    }
}

private long MultiplicativeInverse(long x, long modulus)
{
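    // C#'s % keeps the sign of the dividend and the Bezout coefficient
    // can be negative, so normalize the result into [0, modulus).
    var inv = ExtendedEuclideanDivision(x, modulus).Item1 % modulus;
    return inv < 0 ? inv + modulus : inv;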
}

private static Tuple<long, long> ExtendedEuclideanDivision(long a, long b)
{
    if (a < 0)
    {
        var result = ExtendedEuclideanDivision(-a, b);
        return Tuple.Create(-result.Item1, result.Item2);
    }
    if (b < 0)
    {
        var result = ExtendedEuclideanDivision(a, -b);
        return Tuple.Create(result.Item1, -result.Item2);
    }
    if (b == 0)
    {
        return Tuple.Create(1L, 0L);
    }
    var q = a/b;
    var r = a%b;
    var rslt = ExtendedEuclideanDivision(b, r);
    var s = rslt.Item1;
    var t = rslt.Item2;
    return Tuple.Create(t, s - q*t);
}

Code cribbed from the above-mentioned article, and supporting materials.

The idea, then, is to take your sequential number, multiply it by x modulo m, and then base-64 encode the result. To reverse the process, base-64 decode the value you're given, multiply it by the multiplicative inverse modulo m, and you have the original number.
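
Scaled up to the sizes in the question, the whole round trip might look like the Python sketch below. It assumes 60-bit inputs, a modulus of 2⁶⁰, and the URL-safe Base64 alphabet, none of which come from the question; the multiplier is the x from the C# example, which is odd and therefore coprime to 2⁶⁰.

```python
# Assumed alphabet and sizes; adjust to the real ones.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
BITS = 60
M = 1 << BITS
X = 387420489          # odd, so coprime to 2**60
X_INV = pow(X, -1, M)  # multiplicative inverse of X modulo M (Python 3.8+)


def to_chars(n: int) -> str:
    """Map a 60-bit number to 10 characters, 6 bits per character."""
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in range(BITS - 6, -1, -6))


def from_chars(s: str) -> int:
    """Inverse of to_chars."""
    n = 0
    for ch in s:
        n = (n << 6) | ALPHABET.index(ch)
    return n


def encode(n: int) -> str:
    return to_chars(n * X % M)


def decode(s: str) -> int:
    return from_chars(s) * X_INV % M
```

The output stays at 10 characters, the same as a plain bits-to-chars mapping of a 60-bit value.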
