Java MD5 output does not match C# MD5CryptoServiceProvider

Question

I am trying to hash the same string with C# MD5CryptoServiceProvider and with Java MessageDigest.getInstance("MD5"), but the two outputs are different. There are many samples already available on Stack Overflow, but I am still stuck somewhere.

Following is my C# code:

MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
Byte[] hashedDataBytes = null;
UTF8Encoding encoder = new UTF8Encoding();

hashedDataBytes = md5Hasher.ComputeHash(encoder.GetBytes("NSI#1234@"));
string strPassword = string.Empty;
foreach (byte b in hashedDataBytes)
{
    strPassword = strPassword + b.ToString();
}
return strPassword;

The C# code is frozen; I do not have permission to change it.

Following is my Java code:

MessageDigest messageDigest = MessageDigest.getInstance("MD5");
byte[] digest = messageDigest.digest("NSI#1234@".getBytes("UTF-8"));
String hash = new BigInteger(1, digest).toString();
System.out.println(hash);

C# code output: 158163028351382321031971922721528189209213

Java Code output: 210864369951346339831795420458152481237

The C# code generates 42 digits and the Java code generates 39 digits. If I change it to new BigInteger(1, digest).toString(8), it generates 43 digits, and with new BigInteger(1, digest).toString(9) it generates 41 digits.

Your C# foreach to concatenate the byte values as string is not equivalent to your use of `BigInteger` in Java to construct a number and then toString it. — Mark Rotteveel, Jan 29 '15 at 13:55
Your C# code makes less sense than your java code (although usually you'd output the byte array directly as a hexadecimal string in both languages). — Mark Rotteveel, Jan 29 '15 at 14:00
If you have requirements such as "I can't change the C# code" please specify those *in the question to start with*. — Jon Skeet, Jan 29 '15 at 14:06

score 3 · Accepted Answer · edited May 23 '17 at 12:12


Neither your C# code nor your Java code are good ways to convert a hash to a string. I strongly suspect you've got the same bytes in both cases, but you're converting them to strings differently.

Your C# code is just converting each byte to its decimal representation. (It's also doing so with repeated string concatenation. Ick.) Your Java code will ignore leading 0s and is currently using decimal. You can call toString(16) to produce hex, but it will still ignore leading zeroes.
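A minimal sketch of the leading-zero problem, using a hypothetical two-byte array rather than a real digest: `BigInteger.toString(16)` silently drops the zero byte, while formatting the same `BigInteger` with an explicit width of two hex digits per input byte keeps it. The class and method names are illustrative only.

```java
import java.math.BigInteger;

public class LeadingZeros {
    // Plain BigInteger conversion: leading zero digits are dropped.
    static String unpadded(byte[] bytes) {
        return new BigInteger(1, bytes).toString(16);
    }

    // Zero-padded to two hex digits per input byte (32 digits for a 16-byte MD5 digest).
    static String padded(byte[] bytes) {
        return String.format("%0" + (bytes.length * 2) + "x", new BigInteger(1, bytes));
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0f}; // hypothetical input whose first byte is zero
        System.out.println(unpadded(sample)); // f
        System.out.println(padded(sample));   // 000f
    }
}
```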

If you definitely want hex, you can use BitConverter.ToString(byte[]) to get a hex representation in .NET, although you may want to remove the "-" separators it places between bytes. In Java there are various libraries available, such as Apache Commons Codec (Hex) or Guava (BaseEncoding.base16()). Or use some code from one of the many answers about hex encoding in Java on Stack Overflow, such as this one.
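If you would rather avoid a library, a sketch along these lines encodes each byte as exactly two lowercase hex digits, so leading zeros cannot be lost. The class and helper names are hypothetical; on Java 17+, `java.util.HexFormat.of().formatHex(bytes)` does the same job from the JDK itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HexEncode {
    // Hypothetical helper: MD5 over the UTF-8 bytes of the input.
    static byte[] md5Utf8(String input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Two lowercase hex digits per byte, high nibble first.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xff;                            // treat the signed Java byte as 0..255
            sb.append(Character.forDigit(v >>> 4, 16));  // high nibble
            sb.append(Character.forDigit(v & 0x0f, 16)); // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(md5Utf8("NSI#1234@")));
    }
}
```

The result is the same lowercase form that command-line tools such as `md5sum` print, which makes it easy to compare against other implementations.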

Alternatively, you could use Base64. Again, there are multiple options available, such as Convert.ToBase64String in .NET and the iharder public domain library for Java.
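Since Java 8 the JDK also ships `java.util.Base64`, so on modern Java no third-party library is strictly needed. A minimal sketch, with a hypothetical class and method name:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Base64Encode {
    // Hypothetical helper: MD5 over UTF-8, then standard (RFC 4648) Base64 with padding.
    static String md5Base64(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // 16 digest bytes always encode to 24 Base64 characters ending in "==".
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Base64("NSI#1234@"));
    }
}
```

For the same input bytes, .NET's `Convert.ToBase64String` should produce the same string, since both use the standard Base64 alphabet with padding.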

If your C# code is truly frozen (run to the hills!) then the equivalent (well, slightly cleaner) Java code would be something like:

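StringBuilder builder = new StringBuilder();
for (byte b : digest) {
    builder.append(b & 0xff); // Convert signed to unsigned, then append its decimal digits
}
String hash = builder.toString(); // Same digit string the C# loop builds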
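The `& 0xff` is what makes this match C#: a Java `byte` is signed (-128..127), while a C# `byte` is unsigned (0..255). A tiny sketch using the first byte of this particular digest (0x9e) shows the difference; the class and helper names are hypothetical.

```java
public class SignedBytes {
    // Widen to int and keep only the low 8 bits, giving 0..255 like a C# byte.
    static int unsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x9e;              // first byte of MD5("NSI#1234@")
        System.out.println(b);             // -98: Java bytes are signed
        System.out.println(unsigned(b));   // 158: what C#'s b.ToString() prints
    }
}
```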

I'd also urge you to use StandardCharsets.UTF_8 if you're using Java 7+.
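Putting those pieces together, here is a self-contained sketch that reproduces the frozen C# output from Java, using `StandardCharsets.UTF_8` so there is no checked `UnsupportedEncodingException` to handle. The class and method names are hypothetical; the logic mirrors the C# loop: MD5 over the UTF-8 bytes, then each byte's unsigned decimal value appended with no separator and no padding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FrozenCSharpEquivalent {
    // Hypothetical name: reproduces the legacy C# "hash string" format.
    static String legacyMd5String(String input) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        StringBuilder builder = new StringBuilder();
        for (byte b : digest) {
            builder.append(b & 0xff); // C#'s byte.ToString() prints 0..255 in decimal
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(legacyMd5String("NSI#1234@"));
    }
}
```

Run against the input from the question, this should print the same 42-digit string the C# code produced.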

But I would strongly advise fixing the C# code if you possibly can. Even if you're in "code freeze" that presumably doesn't prohibit you from fixing important bugs - and if you're storing these values, it'll be a lot easier to fix this now than it will to do so later.

answered Jan 29 '15 at 13:59

Jon Skeet


If you don't want external libraries in Java, you can also do something like this: http://stackoverflow.com/a/13006907/466862 – Mark Rotteveel Jan 29 '15 at 14:02
Thanks @Jon, but I am not able to change the C# code because it is frozen, so how do I handle this? – Harmeet Singh Taara Jan 29 '15 at 14:04
@HarmeetSinghTaara: Then you need to change the Java code to do the same thing (I'll add an example) - but you should *really, really* change the C# code as soon as you *possibly* can. The current format is ludicrous. – Jon Skeet Jan 29 '15 at 14:05

@HarmeetSinghTaara If you really need to do the same as that - really wrong - C# code, then start by doing the equivalent of that code; the differences between Java and C# are not that big (except byte is unsigned in C# and signed in Java) – Mark Rotteveel Jan 29 '15 at 14:07
@HarmeetSinghTaara The C# code is wrong. It does not compute a MD5 hash. Escalate until you find someone with a clue and get the permission to fix it. Or better: Fix it first, get the permission later. – stefan.schwetschke Jan 29 '15 at 14:32
@stefan.schwetschke: It *does* compute an MD5 hash... it just then encodes it as decimal values. While I agree it's horrible, it's *unlikely* (though not impossible) for it to lead to collisions. For example, a byte sequence containing {2, 22} is indistinguishable from { 22, 2 }. You can't recover the original MD5 hash from the result, but there are cases where that's okay. I'm not defending the code's quality, but I wouldn't say that it "doesn't compute a MD5 hash". – Jon Skeet Jan 29 '15 at 14:35
@JonSkeet It adds another level of hashing: the encoding cannot be reversed, so it is technically hashing. Hence the result is not an MD5 hash any more, but a hashed MD5 hash. The cryptographic properties of the MD5 hash (as weak as they are) are lost in that process. The result has even weaker properties than the MD5 hash and should not be used for anything. – stefan.schwetschke Jan 29 '15 at 14:47
@stefan.schwetschke: I agree it's generally poor, and really should be fixed - and yes, as you say, the encoding can't be reversed. I just think you've overstated it a *little*. – Jon Skeet Jan 29 '15 at 14:48
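Jon's {2, 22} versus {22, 2} example is easy to check. The sketch below applies the same "unsigned decimal, no separator" encoding to a few short, hypothetical byte arrays; a single byte 222 collides as well. Real MD5 digests are always 16 bytes, but the same ambiguity between one-, two- and three-digit values applies inside them.

```java
public class DecimalConcatCollisions {
    // Same encoding as the frozen C# code: unsigned decimal values, no separators.
    static String concatDecimal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b & 0xff);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatDecimal(new byte[] {2, 22}));      // 222
        System.out.println(concatDecimal(new byte[] {22, 2}));      // 222
        System.out.println(concatDecimal(new byte[] {(byte) 222})); // 222
    }
}
```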

stefan.schwetschke · Answer 2 · 2015-01-29T14:52:09.907

The correct hash

I have checked the correct value on the console:

$ export LC_ALL=en_US.UTF-8
$ export LANG=en_US.UTF-8
$ export LANGUAGE=en_US.UTF-8
$ echo -n "NSI#1234@" | md5sum.exe
9ea3001c238ae867c5c01bd71cbdd1d5 *-

So the result from your Java code is right, it's just not presented correctly (see below).
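One way to confirm that is to parse the `md5sum` output as a base-16 integer and print it in base 10: the digits should match the 39-digit number printed by the Java code in the question. The class and constant names below are hypothetical.

```java
import java.math.BigInteger;

public class SameNumber {
    // The md5sum output above, read as an unsigned base-16 integer.
    static final BigInteger MD5SUM_VALUE =
            new BigInteger("9ea3001c238ae867c5c01bd71cbdd1d5", 16);

    public static void main(String[] args) {
        // Base 10: what new BigInteger(1, digest).toString() printed in the question.
        System.out.println(MD5SUM_VALUE.toString());
        // Base 16: no leading zero to lose here, because the first byte is 0x9e.
        System.out.println(MD5SUM_VALUE.toString(16));
    }
}
```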

Problems in the Java code

Your Java code is OK, it just formats the result wrong: It shows the result as a decimal number instead of the hexadecimal representation that is commonly used for hashes.

You should convert the number to hex before displaying it:

// [your code from above]

String hex = new BigInteger(1, digest).toString(16); // Hex, but without leading zeros
String fill = String.format("%0" + 32 + "d", 0);     // 32 zeros. This is ugly...
String hash = (fill + hex).substring(hex.length());  // ... and this is a hack to add leading zeros

Problems in the C# code

Your C# code concatenates the decimal representation of each byte. While this works with hex numbers (as long as you add leading zeros in each step!), it doesn't work with decimal numbers at all. So the C# code is wrong, and you must fix it.

It is better to use the converter provided by .NET; it's faster and it works correctly:

// [your code from above]

string hex = BitConverter.ToString(hashedDataBytes).Replace("-", string.Empty);

Note that "While this works with hex" is only valid if you make sure you give two characters for each byte. A naive "convert each byte into its hex representation" approach can easily end up with the same sort of problem. — Jon Skeet, Jan 29 '15 at 14:48
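To illustrate Jon's point, this sketch compares a naive per-byte hex conversion (`Integer.toHexString`, which emits a single digit for values below 0x10) with a fixed two-digit format. The two hypothetical inputs {0x01, 0x10} and {0x11, 0x00} collide under the naive version but not under the padded one.

```java
public class HexPerByte {
    // Naive: one digit for values below 0x10, so byte boundaries are lost.
    static String naiveHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    // Fixed width: always exactly two digits per byte.
    static String paddedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] first = {0x01, 0x10};
        byte[] second = {0x11, 0x00};
        System.out.println(naiveHex(first) + " " + naiveHex(second));   // 110 110
        System.out.println(paddedHex(first) + " " + paddedHex(second)); // 0110 1100
    }
}
```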
