10

I'm looking for a special hash-function. Let's say I have a large list of strings, if I order them by their hash-values they should be ordered quasi randomly.

The most important point is: it must be super fast. I've tried md5 and sha1 and they're using to much cpu power.

Clashes are not a problem.

I'm using javascript, so it shouldn't be too complicated to implement.

Julian
  • 161
  • 1
  • 2
  • 8
  • see also http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed – rogerdpack Jun 25 '13 at 23:04

4 Answers4

8

Take a look at Murmur hash. It has a nice space/collision trade-off:

http://sites.google.com/site/murmurhash/

Nils Pipenbrinck
  • 83,631
  • 31
  • 151
  • 221
5

It looks as if you want the sort of hash function used in a hash table, not the sort used to detect duplicates or tampering.

Googling will yield you a wealth of information on alternative hash functions. To start with, stay away from cryptographic signature hashes (like MD-5 or SHA-1), they solve another problem.

You can read this, or this, or this, to start with.

bmargulies
  • 97,814
  • 39
  • 186
  • 310
3

If speed is paramount, you can implement a simple ad-hoc hash, e.g. take the first and last letter and order your string by the last and then first letter. The result would look, as you say, "quasi random" and it would be fast. For instance, part of my answer sorted that way would look like this:

ca ad-hoc
el like
es simple
gt taking
hh hash
nc can
ti implement
uy you
Tomislav Nakic-Alfirevic
  • 10,017
  • 5
  • 38
  • 51
  • 2
    If the hash doesn't do a good job of avoiding collisions, then any speed you gain during hashing will be lost due to the collisions. The trick is to find a balance between the two. – Steven Sudit Feb 03 '11 at 07:48
  • 1
    Julian explicitly said in his question that clashes/collisions are not a problem and I can understand why. A simple hash like this will provide a non-obvious quasi-random word order: if multiple words have the same hash value, he might not care about sorting them further and simply take them as they come with no performance hit at all. Obviously, this specific hash function wouldn't work well with all kinds of data sets, but you don't seem to be speaking about corner cases. – Tomislav Nakic-Alfirevic Feb 03 '11 at 14:27
3

Hsieh, Murmur, Bob Jenkin's comes to my mind.
A nice page about hash functions that has some tests for quality and a simple S-box hash as well.

Andras Vass
  • 11,478
  • 1
  • 37
  • 49
  • Sounds like it's best to stick away from SuperFastHash. (1st link above) http://www.team5150.com/~andrew/blog/2007/03/breaking_superfasthash.html – hookenz May 04 '12 at 01:05
  • 1
    @Matt Well, based on that, you should avoid all hashes mentioned on this page in any of the answers, since they are not crypto hashes - in exchange, they are way faster than e.g. SHA, and - just as the OP asked - can be implemented in JS with little effort. ;-). Please do note the difference between crypto vs "standard" hashes: http://security.stackexchange.com/questions/11839/what-is-the-difference-between-a-hash-function-and-a-cryptographic-hash-function – Andras Vass May 04 '12 at 20:26