Is it possible to have same hashcode for different strings using java's hashcode function?or if it is possible then what is the % of its possibility?
12 Answers
A Java hash code is 32bits. The number of possible strings it hashes is infinite.
So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.

- 202,337
- 40
- 393
- 406
-
1So, can i say that it can produce 2^32 different hashes and after that it will repeat the hashcodes?? – Xara Apr 11 '12 at 08:31
-
If you manage to identify 2^32 strings that all have a different hashcode, then yes, any other string not in that list will have the same hashcode as one in that list. – Mat Apr 11 '12 at 08:32
-
8On a side note, this is called the pigeonhole principle http://en.wikipedia.org/wiki/Pigeonhole_principle – Kirill Rakhman Apr 11 '12 at 22:07
-
6You'd probably go through much fewer than 2^32 strings (around 2^16 strings) before hitting a collision. The reason why is related to the birthday paradox: http://betterexplained.com/articles/understanding-the-birthday-paradox/ – Xavi May 30 '13 at 23:16
-
9"The number of possible strings it hashes is infinite." Strings in Java have a maximum size because they use a `char` array and [arrays in Java (using the standard JVM) have a maximum size](http://stackoverflow.com/a/3039805/194894). Therefore the number of possible strings is not infinite. – Flow Oct 09 '14 at 17:41
-
2You CAN associate a chance to a collision. There are 2^32 possible hashcodes, so the chance of two different Strings having the same hashcode is 1/2^32, that is one in about 4 billion. According the birthday paradox principle, you need to make about 77,162 Strings to have a collision, see: https://en.wikipedia.org/wiki/Birthday_attack – GregT Nov 13 '17 at 16:20
YES. A lot.
Look at following pair
- "FB" and "Ea"
can return same hash code even though the characters in it are not same.
Basically it is the sum of characters in a string multiplied by an integer.

- 2,156
- 2
- 24
- 41
-
4That is incorrect. Each character is multiplied by a different number so anagrams would not necessarily return the same value. – assylias Apr 11 '12 at 08:27
-
-
if it is possible then what is the % of its possibility?
That is not a particularly meaningful question.
However, unless there is some systemic bias in the String::hashcode
function or the way that you are generating the String
objects, the probability that any two different (non-equal) String
objects will have the same hash code will be 1 in 232.
This assumes that the Strings are chosen randomly from the set of all possible String values. If you restrict the set in various ways, the probability will vary from the above number. (For instance, the existence of the "FB" / "Ea" collision means that the probability of a collision in the set of all 2 letter strings is higher than the norm.)
Another thing to note is that the chance of 232 different strings chosen at random (from a much larger unbiased set of strings) having no hash collisions is vanishingly small. To understand why, read the Wikipedia page on the Birthday Paradox.
In reality, the only way you are going to get no hash collisions in a set of 232 different strings is if you select or generate the strings. Even forming the set by selecting randomly generated strings is going to be computationally expensive. To produce such a set efficiently, you would need to exploit the properties of the String::hashCode
algorithm, which (fortunately) is specified.

- 698,415
- 94
- 811
- 1,216
-
So, can i say that for 2^32 different strings, the hashcode function will always produce different hashcode? – Xara Apr 11 '12 at 09:18
-
1@Zara Actually it even says quite the opposite! Having 2^32 different strings you will most likely have a collision (or even several..). – jorey Apr 12 '12 at 01:20
-
@jory - yes you are right. It's an example of the birthday paradox. (It is not entirely impossible that 2^32 different randomly generated strings will all have different hashcodes. Just vanishingly improbable.) – Stephen C Apr 12 '12 at 02:04
Yes, it is entirely possible. The probability of a string (or some other object type -- just assuming you'll be using strings in this example) having the same hashcode as some other string in a collection, depends on the size of that collection (assuming that all strings in that collection are unique). The probabilities are distributed as follows:
- With a set of size ~9,000, you'll have a 1% chance of two strings colliding with a hash in the set
- With a set of size ~30,000, you'll have a 10% chance of two strings colliding with a hash in the set
- With a set of size ~77,000, you'll have a 50% chance of two strings colliding with a hash in the set
The assumptions made, are:
- The hashCode function has no bias
- Each string in the aforementioned set is unique
This site explains it clearly: http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/ (Look at "the second thing you should know")

- 518
- 7
- 13
-
What's the set of characters for the strings that they've tested there? – android developer Sep 09 '15 at 12:32
This wouldn't directly answer your question, but I hope it helps.
The below is from the source code of java.lang.String
.
/**
* Returns a hash code for this string. The hash code for a
* <code>String</code> object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using <code>int</code> arithmetic, where <code>s[i]</code> is the
* <i>i</i>th character of the string, <code>n</code> is the length of
* the string, and <code>^</code> indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}

- 10,907
- 7
- 47
- 82

- 61,315
- 23
- 138
- 167
Yes, it is possible for two Strings to have the same hashcode - If you take a look at the Wikipedia article, you will see that both "FB"
and "Ea"
have the same hashcode. There is nothing in the method contract saying a hashCode()
should be used to compare for equality, you want to use equals()
for that.
Since Java 1.2, String implements hashCode()
by using a product sum algorithm over the entire text of the string.
Yes, by definition of the pigeon-hole concept, two different strings can produce the same hashcode and code should always be written to cater for such conditions (typically, by not breaking.)

- 4,180
- 2
- 21
- 48
The percentage of collisions for random strings should be minimal. However, if you hash strings from external sources, an attacker could easily create hundreds of thousands of strings having the same hashcode. In a java HashMap these would all map to the same bucket and effectively turn the map into a linked list. Access times to the map would then be proportional to the map size instead of constant, leading to a denial of service attack.
See this page on Effective DoS attacks against Web Application Plattforms for further information links to the presentation.

- 33,639
- 11
- 75
- 118
-
The risk of serious damage by those attacks can be mitigated by [Java 8's improvement over HashMap](https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8) when the number of collisions is high in hashcodes that map to the same bucket. – Jose Quijada Sep 08 '19 at 01:47
Yes this is possible, because one of the contract between equals() & hashCode() method of Object class is.......... If two object are not equal according to equals() method then there is no guaranty that their hashCode will be same, the hashCode may/may not be equal. i.e, if obj1.equals(obj2) return false then obj1.hashCode()==obj2.hashCode() may/may not return true. Example:
String str1 = "FB";
String str2 = "Ea";
System.out.println(str1.equals(str2));// false
System.out.println(str1.hashCode() == str2.hashCode()); // true

- 227
- 4
- 9
-
-
Because this is one of the contract between equals() and hashCode() method. If two object are not equal according to equals() method then there is no guaranty that their hashCode will be same. Please have look the java doc https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#equals(java.lang.Object) – Manish Kumar Feb 21 '19 at 13:08
//You can run the below code with -Xmx2100m and can get multiple results, enough to fill your console
`
import java.util.HashMap;
public class TestHashCollision {
public static void main(String[] args) {
final String TEXT = "was stored earlier had the same hash as";
HashMap<Integer,String> hs=new HashMap<>();
long t1=System.currentTimeMillis();
long t2=System.currentTimeMillis();
for(long l=0;l<Long.MAX_VALUE;l++) {
String key="d"+l;
if(hs.containsKey(key.hashCode())) {
System.out.println("'"+hs.get(key.hashCode())+"' "+TEXT+" '"+key+"'");//System.exit(0);
} else {
hs.put(key.hashCode(),key);
}
t2=System.currentTimeMillis();
if(t2-t1>10000) {
t1=System.currentTimeMillis();
System.out.println("10 seconds gone! size is:"+hs.size());
}
}
System.out.println("Done");
}
}
`

- 105
- 7
Yes(not just in Java, it applies to any language), it can produce the same hash-code for different strings. I am recalling a rule taught by my professor, it might be useful here -
Two same strings/value must have the same hashcode, but the converse is not true.
example in python
>>> hash('same-string')
-5833666992484370527
>>> hash('same-string')
-5833666992484370527
There might be another string which can match the same hash-code, so we can't derive the key using hash-code.
The reason for two different string to have the same hash-code is due to the collision.

- 3,576
- 8
- 50
- 75
"tensada".hashCode()
"friabili".hashCode());
The java hash function return equal values here.

- 2,620
- 6
- 26
- 41