Note: This answer has a mistake which I am currently fixing.
Here is a solution which is O(NX), where N is the number of bits in the input and X is the big-O cost of the hashing function used. The output length M must satisfy M >= 2N + 1, but a larger value, perhaps M = 4N, would be necessary to be actually secure.
The solution is similar to the idea of a binary search. We focus on each bit in turn and adjust the output hash by a smaller amount for the less-significant bits and a larger amount for the more-significant bits. This should keep the hashing function sequential.
We will first generate an intermediary number which is two times the length of the original number. This is because we need to encode additional information to ensure the sequential nature of the final hash. For each bit we will generate two bits. If all bits to the left of the current bit are zero, then: if the bit is 0 we will output 00, but if the bit is 1 we will output 01. However, if there are any bits to the left which are non-zero, then: if the bit is 0 we will output 01, but if the bit is 1 we will output 11. (My intuition puts a probability of 75% that this mechanism is correct.)
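The two-bits-per-bit rule above can be sketched in Python as follows. This is only an illustrative sketch of the rule as described, assuming "to the left" means the more-significant bits and that the pair for bit i is placed at bit position 2i; the type hints and the small demo loop are additions for illustration.

```python
def rewrite_input(value: int, input_length: int) -> int:
    """Expand each input bit into two bits using the rule described above."""
    rewritten = 0
    for i in range(input_length):
        curr_bit = (value >> i) & 1
        # True when every more-significant bit ("to the left") is zero.
        higher_bits_zero = (value >> (i + 1)) == 0
        if higher_bits_zero:
            pair = 0b01 if curr_bit else 0b00
        else:
            pair = 0b11 if curr_bit else 0b01
        rewritten |= pair << (2 * i)
    return rewritten


if __name__ == "__main__":
    # Show the rewritten form of every 3-bit input.
    for value in range(8):
        print(f"{value:03b} -> {rewrite_input(value, 3):06b}")
```

For example, the 3-bit input 101 becomes 010111: the top bit has nothing set above it (01), the middle bit is 0 with a set bit above it (01), and the lowest bit is 1 with set bits above it (11). On small inputs the rewritten values keep the same order as the originals, which is the property the rest of the construction relies on.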
Ok, here is the pseudo-code for the final solution:
func sequenced_hash(input, input_length, output_length) {
    assert(output_length >= 2 * input_length + 1);
    input = rewrite_input(input, input_length);
    let output = 0;
    for (let i = 0; i < input_length; i++) {
        // conditionally adjust hash if bit is set
        if ((input & (1 << i)) == 0) {
            continue;
        }
        let segment = input ^ (1 << i);
        // M and N from the text are output_length and input_length;
        // the mask keeps only the low (M - N + i + 1) bits of the hash
        let mask = (1 << (output_length - input_length + i + 1)) - 1;
        let truncated_hash = underlying_hash(segment, output_length) & mask;
        output += truncated_hash;
    }
    return output;
}
func rewrite_input(input, input_length) {
    let rewritten_input = 0;
    for (let i = 0; i < input_length; i++) {
        let j = i * 2;
        let curr_bit = (input >> i) & 1;
        if ((input >> (i + 1)) == 0) {
            if (curr_bit == 0) {
                // no-op: output 0b00
            } else {
                rewritten_input |= 0b01 << j;
            }
        } else {
            if (curr_bit == 0) {
                rewritten_input |= 0b01 << j;
            } else {
                rewritten_input |= 0b11 << j;
            }
        }
    }
    return rewritten_input;
}
func underlying_hash(input, output_length) { /* ... */ }
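The body of underlying_hash is left open above. As one hypothetical stand-in for experimenting, the Python sketch below derives exactly output_length bits from a salted SHAKE-256 digest. The choice of SHAKE-256, the salt parameter, and the placeholder salt value are all assumptions made for illustration and are not part of the original pseudo-code.

```python
import hashlib


def underlying_hash(value: int, output_length: int,
                    salt: bytes = b"example-salt") -> int:
    """Hypothetical stand-in: salted SHAKE-256 truncated to output_length bits."""
    # Encode the integer as big-endian bytes (at least one byte, so 0 works).
    byte_len = max(1, (value.bit_length() + 7) // 8)
    data = salt + value.to_bytes(byte_len, "big")
    # Request enough whole bytes, then drop the surplus low bits.
    n_bytes = (output_length + 7) // 8
    digest = hashlib.shake_256(data).digest(n_bytes)
    return int.from_bytes(digest, "big") >> (n_bytes * 8 - output_length)


if __name__ == "__main__":
    # 129 bits is the smallest output length allowed for a 64-bit input.
    print(hex(underlying_hash(23, 129)))
```

An extendable-output function is used here only because it can produce any requested output length directly; any keyed or salted hash with a long enough digest could be substituted.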
This is probably nowhere close to a perfect solution, but at least it is significantly more efficient than the other answer.
According to this post we can compute 400 megabytes of MD5 hash per second on a certain CPU. If our input size is 64 bits then that is 8 bytes and we need up to 64 * 2 = 128 underlying hashes per hash, so 400_000_000 / 8 / 128 is roughly 390_625 hashes per second.
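The estimate above can be reproduced with a few lines of Python. This sketch simply mirrors the arithmetic in the text: it assumes each underlying call hashes only the 8-byte input and ignores per-call overhead and MD5 block padding, so the result should be read as a rough figure. The function and constant names are made up for illustration.

```python
MD5_BYTES_PER_SECOND = 400_000_000  # throughput figure quoted above
INPUT_BITS = 64


def estimated_hashes_per_second(bytes_per_second: int, input_bits: int) -> float:
    """Rough throughput: up to two underlying hashes per input bit."""
    input_bytes = input_bits // 8
    underlying_calls = 2 * input_bits
    return bytes_per_second / input_bytes / underlying_calls


print(estimated_hashes_per_second(MD5_BYTES_PER_SECOND, INPUT_BITS))  # 390625.0
```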
This solution is promising. I will keep the answer up to date with more accurate results once it is implemented, and it would be great to get verification of the cryptographic security of this approach. It seems quite safe with a large output size. There is one exception: the number zero cannot be hashed, as it will always output zero. A good salt should be chosen for the underlying hashing function to prevent reversibility.