I have a Boolean string (like "01100..001") of length 128 characters (means 128 number of 0/1). I am searching for an efficient (fast) hash function in Java, which produce a much lower representation than 128 bit and obviously with less collision. Can anybody help me, is there any such hash function ? Any suggestion ?
-
3Less collision than the zero you'd get with a 128-bit representation? – eggyal Apr 22 '12 at 17:15
-
@eggyal, Thanks a lot. Nice concept. It will help me lot. :) – Arpssss Apr 22 '12 at 17:27
-
Using a String for just storing a 128-bit value seems to me a bit of overkill, a waste of memory and - especially if you care on performance - definitely not the best option. – MRalwasser Apr 22 '12 at 17:41
3 Answers
Have you considered using a java.util.BitSet
instead, depending on what you are doing it could be a lot easier and more efficient? http://docs.oracle.com/javase/6/docs/api/java/util/BitSet.html
It has a .hashCode()
method as well.

- 11,811
- 2
- 41
- 49
-
1Thanks a lot. It will be nice one to try. However, is there any document stating that probability of collision ? – Arpssss Apr 22 '12 at 17:25
-
1Not that I'm aware of. I know it was improved back in 2004 (see bug parade: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4979028) and the java doc show (defines?) how the hashcode is calculated: The source is available as well of course. http://docs.oracle.com/javase/6/docs/api/java/util/BitSet.html#hashCode() – Mattias Isegran Bergander Apr 23 '12 at 15:24
Try using the .hashCode()
method on Java String
class, it returns an int
and it is very fast.
Or you can use the .hashCode()
method on java.util.BitSet
as Pulsar suggest, if you prefer store your data in BitSet
.

- 7,677
- 1
- 30
- 35
-
I was going to say that except that I would've converted the `String` to `BigInteger` first and called the `.hashCode()` method for that. But I'm guessing it's faster to just hash the original `String` like you suggested. Just wondering why you'd want to have 16 bytes stored as a 128 byte `String`, that seems like a huge waste of space. – ZeroOne Apr 22 '12 at 17:20
-
Thanks a lot. It will be nice one to try. However, is there any document stating the probability of collision ? – Arpssss Apr 22 '12 at 17:26
-
@ZeroOne, I am also thinking to convert it to BigInt and then call hashcode. Because, I think that will create less collisions. – Arpssss Apr 22 '12 at 17:30
-
Try to read this answer about collisions: http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier – dash1e Apr 22 '12 at 17:32
-
2Using the `hashCode()` of `BitSet` as per @Pulsar's answer is probably better. – Christoffer Hammarström Apr 22 '12 at 18:27
If you need to calculate the hash of a string, simply use the hashCode()
method of the String
class. Depending on the implementation, several optimizations are made for quickly computing this value.
As an example, in OpenJDK's implementation of the String
class the hashCode()
method caches the value in the hash
attribute and only needs to be computed once.
And who said that a string of 128 characters has a hash of 128-bits? all hashes returned by the hashCode()
method in Java are of type int
, and ints in Java are represented using 32-bits.

- 232,561
- 37
- 312
- 386