3

I need 64 bit integer hashes of strings for something like a hash map.

It seems to me like there is no native PHP hash functionality that can return 64 bit integers?

I think it is possible to take the first part of a sha1 hash and convert it to an integer. However that will not bring the best performance and the conversion seems to be tricky.

Of course it would be nice to use native PHP functions without installations.

flori
  • 14,339
  • 4
  • 56
  • 63
  • First you need a hash with 64BIT size (or lower) to convert it into a 64BIT (signed) integer. Do you have a preference for the hashing algorythm, or is that part of your question? – hakre Jun 10 '11 at 23:44
  • Please see this related question which should be easy to transpose from Java into PHP: http://stackoverflow.com/questions/1660501/what-is-a-good-64bit-hash-function-in-java-for-textual-strings – hakre Jun 10 '11 at 23:46
  • You probably have considered it already, but what about using the strings themselves as keys into an array and letting PHP take care of the hashing implicitly? – Jon Jun 11 '11 at 00:01
  • @hakre: I prefer a native PHP algorithm, so maybe sha1? I tried that algorithm already, it was really slow compared to the native sha1() – flori Jun 11 '11 at 00:16
  • @Jon: I want to write it into a database, so an integer fits better. – flori Jun 11 '11 at 00:17

6 Answers6

7

I tried a lot, especially to convert a full 64 bit hex string to an signed 64 bit integer. Now I ended up with this:

function sha1_64bitInt($str) {
    $u = unpack('N2', sha1($str, true));
    return ($u[1] << 32) | $u[2];
}

The performance is somewhere in the middle. A lot better than implementing a full hash algorithm (like SimpleHash or dbj2) and a lot slower than a naked call to sha1() or crc32.

When there will be once a better solution to convert to a 64 bit integer it will be possible to improve this function without breaking backwards compatibility (I hope).

flori
  • 14,339
  • 4
  • 56
  • 63
  • Does this hash function distribute keys evenly on the 64 bit hash space? – tonix Oct 17 '20 at 22:46
  • 1
    @tonix This is using the first 8 bytes of the SHA1 hash. SHA1 is said to have an even distribution. So I would expect it, yes. – flori Oct 17 '20 at 23:11
3

See this page for info on an md5 hash :

http://us.php.net/manual/en/function.md5.php

This will output a 32 char hex string. Each hex char represents 4 bits of data. this means you need 16 hex chars (or half the md5 hash) to generate 64 bits.

you can then use hexdec to convert the 64 bits (16x4=64) from hex to an int. Note if you pas more than 64 bits the function will overflow to the float type to try to represent the larger number

http://php.net/hexdec

So basically

$stringToHash= "abcdefghijk...";
$hash = md5($stringToHash);
$substring = substr($hash, 0,16);
$finalInt = hexdec($substring);

That should work for you. (but i haven't tested it).

Zak
  • 24,947
  • 11
  • 38
  • 68
  • I tried something similar. The general problem is that hexdec() returns a float. It seems to me that hexdec interpret the hex value as an unsigned. – flori Jun 10 '11 at 23:58
  • you can always use just 63 or even 60 bits... It will just 0 pad the leading bits – Zak Jun 11 '11 at 00:04
0

See http://php.net/manual/ru/function.crc32.php#111699 for CRC64 implementation.

Vadym Myrgorod
  • 456
  • 1
  • 4
  • 11
0

If nobody else has a better idea, you can probably modify murmurhash_php to invoke the 64-bit version of MurmurHash3 pretty easily.

Nemo
  • 70,042
  • 10
  • 116
  • 153
  • As far as I understood these "64" means that the code is optimized for 64 bit hardware but still returning 32 or 128 bit hashes. – flori Jun 10 '11 at 23:37
  • A fair point. But the algorithm is so simple that you could implement it yourself (search for "64-bit finalizer" in [this wiki page](http://code.google.com/p/smhasher/wiki/MurmurHash3)) or just use 64 of the 128 bits. Guaranteed to be very fast and reasonably resistant to collisions. – Nemo Jun 10 '11 at 23:48
0

PHP doesn't support 64 bits integers I think. There is a BigInteger class (not native PHP) that you could use (and find if you Google really well). But it just uses strings internally and it's hella complex.

Halcyon
  • 57,230
  • 10
  • 89
  • 128
0

i wrote this function. it use crc32 and crc32b that are different to calculate two 32 bit hash, then combine them to get a 64 bit hash. finally convert it to 64 bit integer to save it in unsigned BIGINT field in MySQL database.

function hash64int($string)
{
    return $string ? hexdec(hash('crc32', $string) . hash('crc32b', $string)) : null;
}

in some cases a 32bit hash may be sufficient. so you can use this function and save it in unsigned INT field:

function hash32int($string)
{
    return $string ? hexdec(hash('crc32', $string)) : null;
}

i used crc32 because it is faster than others.

Ehsan Chavoshi
  • 681
  • 6
  • 10