
I have the output of a UTF-8 hash_file call that I need to calculate and check on my Java client. Based on the hash_file manual, I'm extracting the contents of the file and creating the MD5 hex hash in Java, but I can't make them match. I tried the suggestions on [this question] without success.

Here's how I do it on Java:

public static String calculateStringHash(String text, String encoding) 
        throws NoSuchAlgorithmException, UnsupportedEncodingException{
    MessageDigest md = MessageDigest.getInstance("MD5");
    return getHex(md.digest(text.getBytes(encoding)));
}

My results match the ones from this page.

For example:

String jake: 1200cf8ad328a60559cf5e7c5f46ee6d

From my Java code: 1200CF8AD328A60559CF5E7C5F46EE6D

But when trying on files it doesn't work. Here's the code for the file function:

public static String calculateHash(File file) throws NoSuchAlgorithmException,
            FileNotFoundException, IOException {
        BufferedReader br = null;
        StringBuilder sb = new StringBuilder();
        try {
            String sCurrentLine;
            br = new BufferedReader(new FileReader(file));
            while ((sCurrentLine = br.readLine()) != null) {
                sb.append(sCurrentLine);
            }
        } catch (IOException ex) {
            LOG.log(Level.SEVERE, null, ex);
        } finally {
            try {
                if (br != null) {
                    br.close();
                }
            } catch (IOException ex) {
                LOG.log(Level.SEVERE, null, ex);
            }
        }
        return calculateStringHash(sb.toString(),"UTF-8");
    }

I verified that on the PHP side hash_file is used and UTF-8 is the encryption. Any ideas?

javydreamercsw
  • UTF-8 is an [encoding](http://en.wikipedia.org/wiki/Character_encoding) and not an encryption. – Gumbo Dec 23 '12 at 16:27

2 Answers


Your reading method removes all the end of lines from the file. readLine() returns a line, without its line terminator. Print the contents of the StringBuilder, and you'll understand the problem.
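
To see the effect without touching a real file, here is a minimal sketch that runs the same readLine()/append loop over an in-memory string. The class name ReadLineDemo and the helper joinLines are made up for illustration, and the sample text is an assumption; try-with-resources needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {
    // Hypothetical helper: rebuilds the text the same way the question's loop does.
    static String joinLines(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line); // readLine() has already dropped the \n or \r\n
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "first\nsecond\r\nthird\n"; // 20 characters
        String rebuilt = joinLines(original);
        System.out.println(original.length() + " chars in, " + rebuilt.length() + " chars out");
        System.out.println(rebuilt); // firstsecondthird
    }
}
```

The four line-terminator characters are gone, so the bytes being hashed are no longer the bytes stored in the file.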

Moreover, a hashing algorithm is a binary operation. It operates on bytes, and returns bytes. Why are you transforming the bytes in the file into a String, only to transform the String back into an array of bytes in order to hash it? Just read the file as a byte array, using an InputStream, instead of reading it as a String. Then hash this byte array. This also avoids decoding the file with the wrong encoding (your code uses the platform default encoding, which might not be the encoding used to create the file).
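
A minimal sketch of that approach, assuming Java 7 or later and a file small enough to hold in memory. It uses Files.readAllBytes rather than a hand-written InputStream loop; the class name FileMd5 and the helper md5Hex are illustrative, not taken from the question. The hex output is lowercase, which is how the PHP-side value in the question is written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileMd5 {
    // Hashes raw bytes and returns the digest as lowercase hex.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(data);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hashes the file's bytes exactly as stored: no decoding, no line handling.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        return md5Hex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("md5-demo", ".txt");
        try {
            // Sample content is an assumption; line terminators are kept as written.
            Files.write(tmp, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(md5Hex(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because no String is ever built, neither the platform default encoding nor readLine() can alter the data before it is hashed.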

JB Nizet
  • This part of the answer. For the code I used see this question: http://stackoverflow.com/questions/5297552/calculate-md5-hash-of-a-zip-file-in-java-program – javydreamercsw Dec 23 '12 at 16:45

I guess you are losing the newline characters from the file, since you call br.readLine().

It is better to read the file into a byte array and pass that to md.digest(...).

user1885297