
I need to somehow get the text back from its hash in Java. I have this code:

String myString = "creashaks organzine";
int hashCode = myString.hashCode();
System.out.println("Hash:" + hashCode);

This prints 0. But the string "pollinating sandboxes" also hashes to 0.

So collisions exist, for example between "creashaks organzine" and "pollinating sandboxes", and I want to find collisions like this one.
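For context, the `String.hashCode()` Javadoc defines the hash as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with `int` arithmetic, so it silently overflows and wraps around. The minimal sketch below recomputes it by hand with Horner's rule to make that visible; the class name `HashDemo` and method `manualHash` are hypothetical, chosen only for illustration.

```java
public class HashDemo {
    // Recomputes String.hashCode() by hand: h = 31 * h + c for every char.
    // Java int multiplication and addition wrap around on overflow, exactly
    // as the built-in implementation does, so the results match.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"creashaks organzine", "pollinating sandboxes", "hello"};
        for (String s : samples) {
            System.out.println(s + " -> " + manualHash(s) + " (built-in: " + s.hashCode() + ")");
        }
    }
}
```

Because every string, however long, is squeezed into one 32-bit `int`, many different strings necessarily end up with the same value.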

Catgirl21
  • It is impossible to "get the text from its hash". You can build a `Map` of hash codes to the strings that have that hash code, then start generating permutations of strings to fill the map. It will be slow to build and the result will be huge, but you can then find candidate strings for a particular hash code with a simple map lookup. – Andreas Feb 16 '20 at 10:29
  • Do you have any code? I am a newbie to Java – Catgirl21 Feb 16 '20 at 11:09
  • I can't think of a real-world problem that requires finding collisions of `String.hashCode()`. If you rely on them being different, then you're approaching whatever problem you have from the wrong angle. (Or please prove me wrong by giving some more background.) – Olaf Kock Feb 16 '20 at 11:54
  • @OlafKock It's not a real-world problem. It is related to the game Minecraft, which uses String.hashCode() to convert any string to an int hash code in order to generate the game world. I know that many strings can hash to the same hash code. I want to somehow write a program that can find such strings. – Catgirl21 Feb 16 '20 at 13:34
  • In that case: You'll need to either try out _many_ combinations, or analyze the algorithm mathematically. – Olaf Kock Feb 16 '20 at 16:59
  • @OlafKock Do you have any ideas on how to do that? – Catgirl21 Feb 17 '20 at 14:40
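The brute-force approach Andreas describes can be sketched roughly as follows. This is only an illustration under assumptions of my own: the map type `Map<Integer, List<String>>`, the letters-only alphabet, the maximum length, and the names `CollisionMap` and `build` are all hypothetical choices, not anything prescribed in the comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionMap {
    // Hypothetical search space: ASCII letters only. The number of generated
    // strings grows as ALPHABET.length ^ maxLength, so keep maxLength small.
    static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();

    // Generates every string of length 1..maxLength over ALPHABET and groups
    // the strings by their String.hashCode() value.
    static Map<Integer, List<String>> build(int maxLength) {
        Map<Integer, List<String>> byHash = new HashMap<>();
        List<String> current = new ArrayList<>();
        current.add("");
        for (int len = 1; len <= maxLength; len++) {
            List<String> next = new ArrayList<>();
            for (String prefix : current) {
                for (char c : ALPHABET) {
                    String s = prefix + c;
                    next.add(s);
                    byHash.computeIfAbsent(s.hashCode(), k -> new ArrayList<>()).add(s);
                }
            }
            current = next;
        }
        return byHash;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byHash = build(2);
        // Finding candidates for a given hash code is then a single lookup.
        int target = "BB".hashCode();
        System.out.println(target + " -> " + byHash.get(target));
        // Count the hash codes shared by more than one generated string.
        long shared = byHash.values().stream().filter(list -> list.size() > 1).count();
        System.out.println("hash codes with collisions: " + shared);
    }
}
```

Even at length 2 this finds pairs such as "Aa" and "BB". Longer strings like "creashaks organzine" are far out of reach for plain enumeration, which is where the mathematical route Olaf mentions becomes necessary.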

2 Answers


Since I don't have enough reputation to add a comment, I will quote a solution from another question:

Several objects can have the same hash code, as mentioned in the Javadoc for Object.hashCode():

It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

Since different objects can share the same hash code, you obviously can't restore the original object from its hash code; by simple logic, it is impossible.

Quoted from: "how can I get the String from hashCode"
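The Javadoc sentence above can be checked with a very small pair of strings. A minimal sketch (the class name `UnequalSameHash` is hypothetical) shows two strings that are unequal by `equals` yet share a hash code, so a hash code of 2112 alone cannot tell you which one it came from:

```java
public class UnequalSameHash {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.hashCode());                 // 2112
    }
}
```

More generally, an `int` has only 2^32 possible values while there are infinitely many strings, so by the pigeonhole principle some hash codes must be shared.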

abdullahicyc
  • What if I want to find collisions of the hash function rather than the original string? – Catgirl21 Feb 16 '20 at 11:11
  • I was tempted to flag this question as a duplicate of the linked question, but didn't, because the underlying problem of the other question is stored passwords and their validation, for which one definitely wouldn't use `String.hashCode()` – Olaf Kock Feb 16 '20 at 11:53

This is a very interesting thing. The specification at https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#hashCode() says that the hashCode is calculated from the string content, but at first glance the following example seems to contradict that for the first string:

class Main
{
    public static void main(String[] args)
    {
        String myString1 = "creashaks organzine";
        String myString2 = "crsomething else";
        String myString3 = "crsomething else";

        System.out.println("Hash1:" + myString1.hashCode());
        System.out.println("Hash2:" + myString2.hashCode());
        System.out.println("Hash3:" + myString3.hashCode());
    }
}

Outputs:

Hash1:0
Hash2:444616526
Hash3:444616526

But when I modify the string, then I get a different output:

String myString1 = "creashaks organzine...";
System.out.println("Hash1:" + myString1.hashCode());

Outputs:

Hash1:45678

So it seems that somebody tricked us by giving a very rare example string that produces exactly 0 as output. This shows that hash codes are not unique, so you cannot safely use them to compare strings.
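The 45678 above is not a coincidence. Because `String.hashCode()` is a polynomial in base 31, for any strings s and t it satisfies hash(s + t) = hash(s) * 31^len(t) + hash(t), in wrapping `int` arithmetic. With hash("creashaks organzine") = 0, appending "..." leaves only hash("...") = 46*961 + 46*31 + 46 = 45678 ('.' is char code 46). A minimal sketch checking this identity (the names `ConcatHash`, `pow31` and `concatHash` are hypothetical):

```java
public class ConcatHash {
    // 31^n computed with int arithmetic, wrapping on overflow like String.hashCode().
    static int pow31(int n) {
        int p = 1;
        for (int i = 0; i < n; i++) {
            p *= 31;
        }
        return p;
    }

    // hash(s + t) == hash(s) * 31^len(t) + hash(t), all in wrapping int arithmetic.
    static int concatHash(String s, String t) {
        return s.hashCode() * pow31(t.length()) + t.hashCode();
    }

    public static void main(String[] args) {
        String zero = "creashaks organzine";
        System.out.println(zero.hashCode());           // 0
        System.out.println("...".hashCode());          // 45678
        System.out.println((zero + "...").hashCode()); // 45678
        System.out.println(concatHash(zero, "...") == (zero + "...").hashCode()); // true
    }
}
```

One consequence: putting a zero-hash string in front of any string t does not change t's hash code, so a single zero-hash string already yields unlimited collisions.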

Coming back to your initial question: a hash code reduces a string of any length to a single 32-bit int, so most of the detail is lost and you cannot calculate the original string back from it. This applies to all hash codes.

Hashes (cryptographic ones, not String.hashCode()) are often stored in server-side databases instead of the real password strings. They can be compared but not reconstructed.
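To illustrate "compared but not reconstructed", here is a rough sketch of salted password hashing using the JDK's PBKDF2 implementation (`SecretKeyFactory` with "PBKDF2WithHmacSHA256"). The iteration count, key length, class name `PasswordHashDemo` and method names are my own assumptions for illustration, not a vetted security policy. Verification recomputes the hash from the candidate password and compares the bytes; the stored value is never turned back into a password.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHashDemo {
    // Illustrative parameters only (assumptions, not a recommendation).
    static final int ITERATIONS = 100_000;
    static final int KEY_BITS = 256;

    // Derives a hash from the password and salt with PBKDF2-HMAC-SHA256.
    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Recomputes the hash for the candidate and compares in constant time.
    static boolean verify(char[] candidate, byte[] salt, byte[] stored) throws GeneralSecurityException {
        return MessageDigest.isEqual(hash(candidate, salt), stored);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        byte[] stored = hash("correct horse".toCharArray(), salt);
        System.out.println(verify("correct horse".toCharArray(), salt, stored)); // true
        System.out.println(verify("wrong guess".toCharArray(), salt, stored));   // false
    }
}
```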

Stefan
  • But how did somebody come up with this string that gives 0? This means that getting a string from the hash is not impossible. There are many strings that give you 0. My goal is to find out how to generate such strings given their hash. – Catgirl21 Feb 16 '20 at 12:02
  • Good question, I have no idea. Google did something even more impressive: they manipulated the background color of an invoice in PDF format without changing its hash, so the manipulation was not detected by the machine, although it was very obvious to a human. That is possibly a job for a good mathematician. – Stefan Feb 16 '20 at 12:12
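On the question of how to generate many such strings: one simple construction (a known trick for `String.hashCode()`, not something stated in the answers above) follows from the identity hash(s + t) = hash(s) * 31^len(t) + hash(t). If several building blocks have the same length and the same hash (like "Aa" and "BB", both 2112), every sequence of n blocks collides. If the blocks all hash to 0 (like "creashaks organzine" and "pollinating sandboxes"), every sequence of them hashes to 0 regardless of their lengths. A minimal sketch, with the hypothetical names `CollisionGenerator` and `concatenations`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollisionGenerator {
    // Returns every concatenation of `count` pieces chosen (with repetition) from `parts`.
    static List<String> concatenations(List<String> parts, int count) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < count; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String part : parts) {
                    next.add(prefix + part);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Same length and same hash (2112): all 2^3 = 8 results share one hash code.
        for (String s : concatenations(Arrays.asList("Aa", "BB"), 3)) {
            System.out.println(s + " -> " + s.hashCode());
        }
        // Both pieces hash to 0, so every sequence of them hashes to 0 as well.
        for (String s : concatenations(Arrays.asList("creashaks organzine", "pollinating sandboxes"), 2)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

This does not find the original string for a given hash, but it shows how, starting from one known collision, arbitrarily many strings with the same hash code can be produced.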