
I'm trying to hash the same string in C# and in Java.

C# hash method:

    public static string hashValue(string value)
    {
        byte[] input = null;

        HashAlgorithm digest = HashAlgorithm.Create("SHA-512");
        input = digest.ComputeHash(Encoding.UTF8.GetBytes(value));

        return System.Text.UTF8Encoding.UTF8.GetString(input);
    }

The output, in a WPF TextBox, for this is looking like: "՘"�?N[��"��2��D��j��t!z}7�H�p�J����GƼOp�EnBfHڄ�X���" .

The same function, in Java, is returning the result: "[B@41e2db20".

The Java hash method looks like this:

    public static String hashValue(String value) {

        byte[] input = null;

        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
            try {
                input = digest.digest(value.getBytes("UTF-8"));

            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        } catch (NoSuchAlgorithmException e1) {
            e1.printStackTrace();
        }

        return input.toString();
    }

Can you please let me know what I'm doing wrong? Why is the result looking that weird in C#?

sepo
  • To me the Java version looks wrong. `SHA-512` should return 512 bits => 64 bytes => `[B@41e2db20` is too short (see http://en.wikipedia.org/wiki/SHA-2#Examples_of_SHA-2_variants) – Christoph Fink Jun 18 '14 at 14:15
  • You shouldn't display a hash as UTF-8. It is just a byte array and not a string. See http://stackoverflow.com/a/5340599/706456 for an example. – oleksii Jun 18 '14 at 14:18
  • Don't treat hashes as strings. If you want to display one, hex-encode it. – Mark Peters Jun 18 '14 at 14:19
  • I see now what the problem is. Thank you very much for opening my eyes :) – sepo Jun 18 '14 at 14:25

1 Answer


Your C# result is looking "weird" because you've converted the random bytes of a hash into a UTF-8 string. That isn't going to result in anything pretty-looking, since many of the byte values will map to unprintable characters.

You may wish to convert the hash to hexadecimal instead. In Java, you can use the DatatypeConverter class (javax.xml.bind.DatatypeConverter):

return DatatypeConverter.printHexBinary(input);

I'm not sure of the C# equivalent, but a quick search should turn one up.
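If DatatypeConverter isn't on your classpath (the javax.xml.bind package is no longer bundled with the JDK from Java 11 onward), the hex encoding is easy to do by hand. The following is only a minimal sketch: the class name `HashHex` is made up for illustration, and it produces lowercase hex, whereas `printHexBinary` produces uppercase. Whichever you pick, both sides of the comparison need to use the same case.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original question's code.
final class HashHex {

    // Returns the SHA-512 digest of the UTF-8 bytes of value as lowercase hex.
    static String hashValue(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));

            // Two hex characters per byte: 64 bytes become 128 characters.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to 0..255 so negative bytes are not sign-extended.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashValue("abc"));
    }
}
```

A hex string like this contains only the characters 0-9 and a-f, so it can be compared safely with the output of a C# implementation that hex-encodes the same digest.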


For the record, the Java equivalent of your current C# code would be:

return new String(input, "UTF-8");

Currently you are calling .toString(), which for a Java byte array results in a call to the Object.toString() method. This prints the type and hashcode of the object, but not the contents.
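To make the difference concrete, here is a small sketch (the class and method names are illustrative only) comparing `toString()` on a byte array with `java.util.Arrays.toString`, which does print the contents:

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class ArrayToStringDemo {

    // Object.toString(): type tag "[B" plus "@" plus the identity hash code in hex.
    static String viaToString(byte[] data) {
        return data.toString();
    }

    // Arrays.toString(): the actual elements, as signed decimal values.
    static String viaArrays(byte[] data) {
        return Arrays.toString(data);
    }

    public static void main(String[] args) {
        byte[] data = {1, -2, 3};
        System.out.println(viaToString(data)); // something like [B@41e2db20; typically differs between runs
        System.out.println(viaArrays(data));   // prints [1, -2, 3]
    }
}
```

The `[B@...` form says nothing about the bytes themselves, which is why the Java output in the question is so much shorter than a 64-byte hash.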

Duncan Jones
  • It's not only a problem that many of the characters are not printable; an arbitrary byte array will probably also contain invalid UTF-8 sequences. In Java, such sequences are converted to the Unicode character U+FFFD (REPLACEMENT CHARACTER). First, this loses information from the hash and defeats its purpose. Second, you cannot be sure that invalid UTF-8 sequences are treated identically in other languages. – jarnbjo Jun 18 '14 at 14:49
  • @jarnbjo Thanks for the additional info. This explains the question marks appearing within the output: http://www.fileformat.info/info/unicode/char/0fffd/index.htm – Duncan Jones Jun 18 '14 at 14:51
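
The information loss described in the comments can be reproduced with a short sketch. The class name and the sample bytes below are made up for illustration; the point is only that decoding arbitrary bytes as UTF-8 and encoding them again does not give back the original bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class LossyUtf8Demo {

    // Decodes raw bytes as UTF-8, then encodes the resulting string again.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in valid UTF-8; 0x41 is "A".
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x41};

        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.indexOf('\uFFFD') >= 0);        // true: replacement characters were inserted
        System.out.println(Arrays.equals(raw, roundTrip(raw)));    // false: the original bytes are gone
    }
}
```

Hex (or Base64) encoding avoids this because every byte value maps to a distinct, printable representation.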