I'm encoding binary data b1, b2, ... bn using an alphabet. But since the binary representations of the bs are more or less sequential, a simple mapping of bits to chars results in very similar strings. Example:
encode(b1) => "QXpB4IBsu0"
encode(b2) => "QXpB36Bsu0"
...
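To illustrate the problem, a simple mapping might look like the following sketch. The alphabet here is a placeholder (the real one is not shown in this question), encode_plain is a hypothetical name, and each b is assumed to fit in 60 bits.

```python
# Placeholder 64-character alphabet; the actual alphabet is an assumption.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_plain(b: int, bits: int = 60) -> str:
    """Map b to characters, 6 bits per character, most significant bits first.

    `bits` is assumed to be a multiple of 6.
    """
    chars = []
    for shift in range(bits - 6, -1, -6):
        chars.append(ALPHABET[(b >> shift) & 0x3F])  # 0x3F masks 6 bits
    return "".join(chars)


# Two consecutive inputs differ only in the last character.
print(encode_plain(1000000))  # AAAAAAD0JA
print(encode_plain(1000001))  # AAAAAAD0JB
```

With 60 bits and 6 bits per character this yields 10 characters, which is the length limit mentioned in the requirements.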
I'm looking for ways to make the output more "random", meaning it should be more difficult to guess the input b when looking at the output string.
Some requirements:
- For different bs, the output strings must be different. Encoding the same b multiple times does not necessarily have to result in the same output. As long as there are no collisions between the output strings of different input bs, everything is fine.
- If it is of any importance: each b is around 50-60 bits. The alphabet contains 64 characters.
- The encoding function should not produce larger output strings than the ones you get by just using a simple mapping from the bits of the bs to chars of the alphabet (given the values above, this means ~10 characters for each b). So just using a hash function like SHA is not an option.
Possible solutions for this problem don't need to be "cryptographically secure". If someone invests enough time and effort to reconstruct the binary data, then so be it. But the goal is to make it as difficult as possible. It may help that a decode function is not needed anyway.
What I am doing at the moment:
- take the next 4 bits from the binary data, let's say xxxx
- prepend 2 random bits r to get rrxxxx
- look up the corresponding char in the alphabet: val char = alphabet[rrxxxx] and add it to the result (this works because the alphabet's size is 64)
- continue with step 1
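The steps above could be sketched roughly as follows. The alphabet is again a placeholder, encode_noisy is a hypothetical name, and the input is assumed to be a 60-bit integer; the random bits come from Python's secrets module, which is just one possible source.

```python
import secrets

# Placeholder 64-character alphabet; the actual alphabet is an assumption.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_noisy(b: int, bits: int = 60) -> str:
    """Emit one character per 4 data bits, with 2 random bits prepended (rrxxxx).

    `bits` is assumed to be a multiple of 4.
    """
    chars = []
    for shift in range(bits - 4, -1, -4):
        data = (b >> shift) & 0xF      # xxxx: next 4 data bits
        noise = secrets.randbits(2)    # rr: 2 random bits
        chars.append(ALPHABET[(noise << 4) | data])
    return "".join(chars)


print(len(encode_noisy(0x123456789ABCDEF)))  # 15 characters instead of 10
```

Different inputs can never collide, because the low 4 bits of every character index still carry the original data unchanged.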
This approach adds some noise to the output string; however, the size of the string is increased by 50% due to the random bits (each character carries only 4 data bits instead of 6). I could add more noise by adding more random bits (rrrxxx or even rrrrxx), but the output would get larger and larger. One of the requirements I mentioned above is not to increase the size of the output string. Currently I'm only using this approach because I have no better idea.
As an alternative procedure, I thought about shuffling the bits of an input b before applying the alphabet. But since it must be guaranteed that different bs result in different strings, the shuffle function should be deterministic (maybe taking a secret number as an argument) instead of being completely random. I wasn't able to come up with such a function.
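One way such a deterministic shuffle might look is a fixed permutation of the bit positions derived from a secret seed. This is only a sketch under assumptions: make_bit_shuffle and the seed value are hypothetical, the input is assumed to fit in 60 bits, and Python's random.Random is used purely as a convenient seeded generator, not as a secure one.

```python
import random


def make_bit_shuffle(secret: int, bits: int = 60):
    """Return a function that permutes the bit positions of a `bits`-wide integer.

    The same secret yields the same permutation within one Python version;
    for long-term use the permutation itself would have to be stored.
    """
    positions = list(range(bits))
    random.Random(secret).shuffle(positions)  # seeded, hence repeatable

    def shuffle_bits(b: int) -> int:
        out = 0
        for src, dst in enumerate(positions):
            out |= ((b >> src) & 1) << dst    # move bit `src` to position `dst`
        return out

    return shuffle_bits


shuffle = make_bit_shuffle(secret=12345)
print(shuffle(1000000) != shuffle(1000001))  # True: distinct inputs stay distinct
```

Because every bit moves to a unique position, the mapping is one-to-one, so different inputs cannot collide and the output length does not grow. It is weak obfuscation, though: the number of set bits is preserved, and two inputs that differ in one bit still produce outputs that differ in exactly one bit, only at a different position.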
I'm wondering if there is a better way, any hint is appreciated.