Although I am aware that two different Strings can return the same hashcode, I have been unable to find anything about two of differing lengths doing so. Is this possible, and if so, examples would be appreciated. This is using the java hashcode function, in case that changes anything.
2 Answers
Hashcodes are distributed over the space of an int
. The are only 2^32 = ~4 billion
possible values for an int
. There are well more than that number possible strings, so by the pigeonhole principle, there must exist multiple strings with the same hash codes.
However, this does not prove different length strings might have the same hash code, as pointed out below. Java uses the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
for hashing strings. Knowing this, it is easy to construct strings of different length that have the same hash code:
Let String s1 = "\001!";
and String s2 = "@";
. Then s1.length() != s2.length()
but s1.hashCode() == '\001' * 31 + '!' == 1 * 31 + 33 == 64 == s2.hashCode() == '@' == 64
.
However, let me again say that there are over 4 billion possible values of an int
, so your probability of collision is low, although not as low as you might think, because of the Birthday Paradox, which gives that you have about a 50% chance of a collision after about 77K hashes (assuming hashes are randomly distributed, which really depends on your data - if you mostly deal with very small length strings you will have more frequent collisions). Every data structure that uses hashing deals must deal with collisions, though (e.g. a common way is to use linked lists at each hash position), or deal with loss of data (e.g. in a bloom filter).

- 3,346
- 1
- 17
- 21
-
3This doesn't answer whether length matters. For example, if hashcode was something like someFunction(theString) << 16 | length(theString) then strings of different length would always have different hashcode. – user949300 Oct 06 '16 at 05:11
Yes, this can happen.
Some rather trivial examples:
- initial zero-valued characters don't affect the hash-code, so (for example)
"foo"
,"\0foo"
,"\0\0foo"
, etc., all have the same hash-code. - each character just gets multiplied by 31 before adding the next character; so (for example) the two-character string
new String(new char[] { 12, 13 })
has the same hash-code as the single-characternew String(new char[] { 12 * 31 + 13 })
(where I selected12
and13
arbitrarily; the same works for any other values, as long as the12 * 31 + 13
analogue stays within the two-byte-unsigned-integer range).
But those are just some easy-to-construct examples. There are also plenty of pairs of strings that just happen to work out to have the same hash-code, despite no obvious relationship between them.

- 175,680
- 26
- 273
- 307