1

This question is somewhat similar to another question; however, that question asks for a reversible function rather than a non-reversible one.

I would like a hash function that takes a 64-bit unsigned integer and outputs a larger integer (e.g. 128-bit or 256-bit) such that, for every number n, its hash is greater than the hash of n - 1. This ensures that the ordering of the hashes matches the ordering of the inputs. It must be possible to include some sort of salt to prevent reversing the hash in any way.

Is there any standard hashing function that does this? If not, are there any custom, cryptographically sound solutions? Are there any very fast methods, as this may need to be done hundreds of thousands of times per second in a single process?

David Callanan
  • What are you using it for, do the values have to be sortable? If so, does it depend on the value itself or would a second field like a creation-date solve the problem? – martinstoeckli Sep 15 '22 at 11:36
  • @martinstoeckli Yeah, they need to be sortable in the sense that sorting the hashes will have the same order as sorting the original. I am using this to hide sensitive/private numbers while still maintaining ordering. A creation-date would not solve the problem. – David Callanan Sep 15 '22 at 17:58
  • Several of the answers, including your own, come down to "rolling your own crypto", which is notoriously difficult and usually a Bad Idea™. Proving that the ordering is correct is easy. Proving that the hash is computationally difficult to reverse is much harder. It's a super interesting question from an academic point of view, but it smells like an X-Y problem. Why can't you just maintain the order separately from the hash? – Thomas Sep 21 '22 at 09:57

3 Answers

1

Edit: To achieve your goal, you need to accumulate the hashes of each byte of the number:

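using System;
using System.Linq;
using System.Security.Cryptography;

var md5 = MD5.Create();

// Most significant byte first, so byte-wise comparison of two outputs
// follows the numeric order of the inputs.
byte[] GetHash(ulong input) =>
    ToBigEndian(BitConverter.GetBytes(input))
        .SelectMany(x => GetByteHash(x))
        .ToArray();

byte[] GetByteHash(byte val)
{
    ulong sum = 0; // at most 256 terms below 2^33 each, so no overflow
    for (int i = 0; i <= val; i++) // a byte counter would wrap at 255 and never exit
    {
        // Hash the counter i, not val; the +1 keeps the sum strictly increasing.
        sum += BitConverter.ToUInt32(md5.ComputeHash(new[] { (byte)i }), 0) + 1UL;
    }
    return ToBigEndian(BitConverter.GetBytes(sum));
}

// BitConverter uses the host byte order (little-endian on most platforms).
byte[] ToBigEndian(byte[] bytes)
{
    if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
    return bytes;
}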

Less performant version, but cryptographically secure:

using System;
using System.Linq;
using System.Numerics;
using System.Security.Cryptography;

Console.WriteLine(Convert.ToHexString(GetHash(14)));

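static byte[] GetHash(long input)
{
    using var md5 = MD5.Create();
    var sum = BigInteger.Zero;
    // O(input) hashes per call, so this is only practical for small inputs.
    for (long i = 0; i < input; i++)
    {
        var h = new BigInteger(md5.ComputeHash(BitConverter.GetBytes(i)));
        if (h < 0) h = -h; // the digest is read as a signed integer
        sum += h;
    }

    // Big-endian bytes, left-padded with zeros to a fixed 32 bytes, so that
    // byte-wise comparison of two hashes matches numeric comparison.
    var data = sum.ToByteArray(isUnsigned: true, isBigEndian: true);
    return Enumerable.Repeat<byte>(0, 32 - data.Length).Concat(data).ToArray();
}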
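The question also asks for a salt, which the cumulative-sum idea above does not include. One way to add it, sketched here in Python under stated assumptions, is to replace MD5 with a keyed hash. HMAC-SHA256, the names `keyed_increment` and `cumulative_hash`, and the key value are illustrative choices and not part of this answer. Like the code above, it does O(n) work per hash, so it only demonstrates the ordering property for small n.

```python
import hashlib
import hmac


def keyed_increment(key: bytes, i: int) -> int:
    """Pseudo-random positive increment for index i, derived with HMAC-SHA256."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    # Use the first 128 bits and add 1 so every increment is strictly positive.
    return int.from_bytes(digest[:16], "big") + 1


def cumulative_hash(key: bytes, n: int) -> int:
    """Sum of the keyed increments for 0..n inclusive (O(n) work)."""
    return sum(keyed_increment(key, i) for i in range(n + 1))


key = b"hypothetical-secret-key"  # assumption: a secret known only to the hasher
values = [cumulative_hash(key, n) for n in range(50)]
# Every increment is at least 1, so the sequence is strictly increasing.
print(all(a < b for a, b in zip(values, values[1:])))
```

Without the key, the increments cannot be recomputed, but the gaps between neighbouring hashes are still visible to anyone holding several outputs, so this is a sketch of the ordering mechanism rather than a vetted construction.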
0

Note: This answer has a mistake which I am currently fixing.

Here is a solution which is O(NX) where N is the number of bits in the input and X is the big-O of the hashing function used.

The output of length M must satisfy M >= 2N + 1, but a larger value of perhaps M = 4N would be necessary to be actually secure.

The solution is similar to the idea of a binary search. We focus on each bit in turn and adjust the output hash by a smaller amount for the less-significant bits and a larger amount for the more-significant bits. This should keep the hashing function sequential.

We will first generate an intermediary number which is two times the length of the original number. This is because we need to encode additional information to ensure the sequential nature of the final hash. For each bit we will generate two bits. If all bits to the left of the current bit are zero, then: If the bit is 0 we will output 00, but if the bit is 1 we will output 01. However, if there are any bits to the left which are non-zero, then: If the bit is 0 we will output 01, but if the bit is 1 we will output 11. (My intuition puts a probability of 75% that this mechanism is correct).
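The two-bits-per-bit rule described above can be checked on small inputs. The following is a minimal Python sketch of that rule only (not of the full hash). The function name mirrors the pseudo-code, and the 3-bit and 8-bit widths are arbitrary choices for illustration.

```python
def rewrite_input(value: int, bits: int) -> int:
    """Expand each input bit into two output bits using the rule above."""
    out = 0
    for i in range(bits):
        bit = (value >> i) & 1
        higher = value >> (i + 1)  # all bits to the left of bit i
        if higher == 0:
            pair = 0b01 if bit else 0b00
        else:
            pair = 0b11 if bit else 0b01
        out |= pair << (2 * i)
    return out


table = [rewrite_input(v, 3) for v in range(8)]
for v, r in enumerate(table):
    print(f"{v:03b} -> {r:06b}")
```

For 3-bit inputs the rewritten values are 0, 1, 5, 7, 21, 23, 29, 31, which is strictly increasing. This only shows that the rewriting step preserves order; it says nothing about the security of the hash built on top of it.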

Ok, here is the pseudo-code for the final solution:

func sequenced_hash(input, input_length, output_length) {
  assert(output_length >= 2 * input_length + 1);

  input = rewrite_input(input, input_length);

  let output = 0;

  for (let i = 0; i < input_length; i++) {
    // skip this bit if it is not set
    if ((input & (1 << i)) == 0) {
      continue;
    }

    let segment = input ^ (1 << i);
    // keep only the low (output_length - input_length + i + 1) bits of the hash
    let truncated_hash = underlying_hash(segment, output_length) & ((1 << (output_length - input_length + i + 1)) - 1);

    output += truncated_hash;
  }

  return output;
}

func rewrite_input(input, input_length) {
  let rewritten_input = 0;
  
  for (let i = 0; i < input_length; i++) {
    let j = i * 2;
    let curr_bit = (input >> i) & 1;

    if ((input >> (i + 1)) == 0) {
      if (curr_bit == 0) {
        // no-op: output 0b00
      } else {
        rewritten_input |= 0b01 << j;
      }
    } else {
      if (curr_bit == 0) {
        rewritten_input |= 0b01 << j;
      } else {
        rewritten_input |= 0b11 << j;
      }
    }
  }

  return rewritten_input;
}

func underlying_hash(input, output_length) { /* ... */ }

This is probably nowhere close to a perfect solution, but at least it is significantly more efficient than the other answer.

According to this post, MD5 can process about 400 megabytes per second on a certain CPU. If our input is 64 bits, that is 8 bytes, and we need up to 64 * 2 = 128 underlying hashes per hash, so 400_000_000 / 8 / 128 is roughly 390_625 hashes per second.

This solution is promising. I will keep the answer up to date with more accurate results once it is implemented, and it would be great to get verification of the cryptographic security of this approach. It seems quite safe with a large output size. One exception is that the number zero cannot be hashed, as it will always output zero. A good salt should be chosen for the underlying hashing function to prevent reversibility.
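The pseudo-code leaves `underlying_hash` unspecified. One possible salted version is sketched below in Python, assuming HMAC-SHA256 with the salt used as a secret key and the digest truncated to `output_length` bits. The extra `salt` parameter and the 256-bit limit are assumptions made for this illustration and are not part of the answer.

```python
import hashlib
import hmac


def underlying_hash(segment: int, output_length: int, salt: bytes) -> int:
    """Keyed hash of `segment`, truncated to `output_length` bits (at most 256)."""
    if not 0 < output_length <= 256:
        raise ValueError("output_length must be between 1 and 256 bits")
    # Encode the non-negative integer in as few bytes as needed (at least one).
    data = segment.to_bytes((segment.bit_length() + 7) // 8 or 1, "big")
    digest = hmac.new(salt, data, hashlib.sha256).digest()
    # Keep the top output_length bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - output_length)


salt = b"hypothetical-secret-salt"  # assumption: kept private by the hasher
print(hex(underlying_hash(0b0101, 128, salt)))
```

With a keyed hash, someone who does not know the salt cannot recompute the per-bit contributions, which is the property the last sentence above is asking for.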

David Callanan
0

This question is quite complicated but I believe what you are asking should be possible. I think the reason you would want to do this is to improve response times for authenticating a password in a database. (e.g. O(1) isn't true because only part of the list can be stored in memory)

The first thing to mention is that hash functions should map to numbers that don't produce many collisions and won't produce the same value when some of the characters in a key are swapped into a different order. A really basic example of this is when words get their values from their ASCII totals: dub and bud map to the same value before anything has been hashed. Once you have a good hashing algorithm with lookup that works in O(1) time, you can think about whether that function is reversible and whether it needs a salt. You can prepend or append the salt (which can be uniquely generated and stored with each hashed value), and it will hinder brute-force and dictionary-based attacks. One of the ways functions are made into 'one-way' functions is by using modulo arithmetic.
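The dub/bud collision and the effect of a salt can be seen in a few lines. This is a minimal Python sketch, assuming SHA-256 as the hash and a salt that is simply prepended. The function names are made up for illustration.

```python
import hashlib
import os


def ascii_total(word: str) -> int:
    """Toy 'hash': the sum of the character codes, which ignores character order."""
    return sum(ord(c) for c in word)


def salted_digest(word: str, salt: bytes) -> str:
    """SHA-256 over salt + word; the salt would be stored next to the digest."""
    return hashlib.sha256(salt + word.encode("utf-8")).hexdigest()


print(ascii_total("dub"), ascii_total("bud"))  # both totals are 315

salt = os.urandom(16)  # a fresh random salt per stored value
print(salted_digest("dub", salt) == salted_digest("bud", salt))
```

The totals collide because addition does not depend on order, while the SHA-256 digests of the two words differ. For real password storage a dedicated password-hashing function is normally used instead of a single salted SHA-256 call.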

You would then probably need to think about how, mathematically, to build a modulo function that always increases.

In particular, you should first look at where you say "its hash is greater than the hash of the number n - 1. This ensures the sequencing/ordering". You could easily have a function where a hash is greater than that of an earlier key simply because of how the function works. I am just thinking a bit randomly here, but maybe you could look into number theory. Consider a basic mapping of integer sets, e.g. n => 2n, and then think about mapping a set with a modulo function applied, where the modulus increases as an (exponential) function of n. Maybe that will produce increasing values which are sequentially hashed.

CJW
  • Thanks for your answer. Looking into number theory could be something interesting to do in the future. I was hoping there would already be an easy solution, but I guess not. – David Callanan Sep 21 '22 at 19:43