4

Is there a hash function where small changes in the input result in small changes in the output? For example, something like:

hash("Foo") => 9e107d9d372bb6826bd81d3542a419d6
hash("Foo!") => 9e107d9d372bb6826bd81d3542a419d7 <- note small difference
Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
Paul Wicks
  • 62,960
  • 55
  • 119
  • 146
  • 1
    That would be a *really* bad hash algorithm.... – skaffman Nov 06 '09 at 11:41
  • 1
    For a cryptographic hash, yes, this would be bad, but I want to use it for something else. – Paul Wicks Nov 06 '09 at 11:50
  • I think you need to provide more details as to what is the intended purpose of such a function. There is definitely no cryptographic hash function with that property, but maybe you are looking for something different? – Kris Nov 06 '09 at 11:51
  • 4
    What is your definition of a "small change" in the output? Edit distance (treating hashes as strings) or mathematical distance of numbers (treating hashes as integers?) – Rafał Dowgird Nov 06 '09 at 12:14

4 Answers4

4

I wouldn't call this a hash because the point of a hash is exactly the opposite. However, with your stated goal of small changes in input producing small changes in output, I would look at using either a soundex function or the Ratcliff algorithm.

Chris Judge
  • 1,952
  • 1
  • 13
  • 10
3

I would recommend the simhash, algorithm by Mark Manasse.

Jeff Kubina
  • 800
  • 4
  • 15
0

A trivial solution would be be to XOR all bytes module N. E.g. for a 64 bits hash, you'd XOR (input[0] ^ input[8] ^ input[16]) + 256*(input[1] ^ input[9] ^ input[17]) etc. So, "Foo" hashes to "Foo\0\0\0\0\0" and "Foo!" hashes to "Foo!\0\0\0\0".

MSalters
  • 173,980
  • 10
  • 155
  • 350
0

Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data. LSH hashes input items so that similar items map to the same “buckets” with high probability:

https://en.wikipedia.org/wiki/Locality-sensitive_hashing

Also see: https://en.wikipedia.org/wiki/Perceptual_hashing

Here is a nice example of perceptual hashing on DNA sequences:

http://arxiv.org/pdf/1412.5517.pdf

gkcn
  • 1,360
  • 1
  • 12
  • 23