MD5 checksum not calculated properly for files other than txt?

Question

I am using crypto-js to calculate the MD5 checksum of my file before uploading. Below is my code.

import CryptoJS from "crypto-js";

const getMd5 = async (fileObject) => {
  let md5 = "";
  try {
    const fileObjectUrl = URL.createObjectURL(fileObject);
    const blobText = await fetch(fileObjectUrl)
      .then((res) => res.blob())
      .then((res) => new Response(res).text());

    const hash = CryptoJS.MD5(CryptoJS.enc.Latin1.parse(blobText));
    md5 = hash.toString(CryptoJS.enc.Hex);
  } catch (err) {
    console.log("Error occurred in getMd5:", err);
  }
  return md5;
};

The above code works fine for text files, but for non-text files such as images and videos the checksum is calculated incorrectly.

Any help/input is appreciated. Thanks!

[`Response.text()`](https://developer.mozilla.org/en-US/docs/Web/API/Response/text) decodes the data as UTF-8, which corrupts arbitrary binary data. [`Response.arrayBuffer()`](https://developer.mozilla.org/en-US/docs/Web/API/Response/arrayBuffer) is probably the better choice, since the data will not be corrupted. However, the `ArrayBuffer` still needs to be converted for CryptoJS, because CryptoJS only works with `WordArray`s (probably something like `CryptoJS.lib.WordArray.create()`, see [here](https://stackoverflow.com/a/25611179/9014097)). — Topaco, Jul 08 '21 at 12:50

Maarten Bodewes · Answer 1 · 2021-07-08T15:22:33.727

Just feed the result of .then((res) => res.blob()) into the MD5 function directly.

Encoding to text() is probably lossy (and/or uses replacement characters), and Latin1 doesn't cover the full range of possible byte values either - officially at least. There is just no need to convert to text and then back to binary either.
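
To illustrate the lossy round trip described above, here is a minimal sketch. It assumes that the standard TextDecoder and TextEncoder globals are a fair stand-in for the UTF-8 decoding that Response.text() performs; the byte values are sample data picked for illustration only.

```javascript
// Sample bytes; the values from 0x80 upward are not valid UTF-8 on their own.
const original = new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80, 0x81, 0xfd, 0xfe, 0xff]);

// Decode as UTF-8 (each invalid byte becomes U+FFFD), then encode back to bytes.
const decodedText = new TextDecoder("utf-8").decode(original);
const roundTripped = new TextEncoder().encode(decodedText);

console.log(original.length);     // 9
console.log(roundTripped.length); // 19: each U+FFFD re-encodes as 3 bytes
```

The bytes that come back differ from the bytes that went in, so any hash computed over them cannot match the hash of the original file.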

It is required to convert to a binary representation that can be accepted by CryptoJS - as implemented in the other answer. This however needs to be a binary to binary conversion, not a binary -> text -> binary conversion.

score 2 · Accepted Answer · answered Jul 08 '21 at 14:53

Response.text() reads the response stream and converts it to a string using UTF-8 decoding. Arbitrary binary data that is not valid UTF-8 (e.g. images, videos, etc.) is corrupted in this process; see also the other answer.
This is prevented by using Response.arrayBuffer() instead, which simply stores the data unchanged in an ArrayBuffer.
Since CryptoJS works internally with WordArrays, a further conversion of the ArrayBuffer into a WordArray is necessary.
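
As a rough picture of what that conversion involves: a WordArray keeps the data as 32-bit words in big-endian order, together with a count of significant bytes. The sketch below mimics that packing in plain JavaScript. Note that `bytesToWords` is a hypothetical helper written for illustration and is not part of CryptoJS; in real code, CryptoJS.lib.WordArray.create(arrayBuffer) does the conversion.

```javascript
// Hypothetical helper (not a CryptoJS function): pack bytes into big-endian 32-bit words.
const bytesToWords = (bytes) => {
  const words = new Array(Math.ceil(bytes.length / 4)).fill(0);
  for (let i = 0; i < bytes.length; i++) {
    // Byte 0 of each group lands in the top 8 bits of its word, byte 3 in the bottom 8 bits.
    words[i >>> 2] |= bytes[i] << (24 - (i % 4) * 8);
  }
  return { words, sigBytes: bytes.length };
};

const packed = bytesToWords(new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80]));
console.log(packed.words.map((w) => (w >>> 0).toString(16).padStart(8, "0")));
// [ '0102037f', '80000000' ]
console.log(packed.sigBytes); // 5
```

This is a pure binary-to-binary mapping: every byte value from 0x00 to 0xff is preserved, which is exactly what the text round trip fails to guarantee.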

The following fix works on my machine:

(async () => {

    const getMd5 = async (fileObject) => {
        let md5 = "";
        try {
            const fileObjectUrl = URL.createObjectURL(fileObject);
            const arrayBuffer = await fetch(fileObjectUrl)
                .then((res) => res.blob())
                .then((res) => new Response(res).arrayBuffer()); // Read the raw bytes as an ArrayBuffer
            // Import the bytes as a WordArray, with no text conversion in between
            const hash = CryptoJS.MD5(CryptoJS.lib.WordArray.create(arrayBuffer));
            md5 = hash.toString(CryptoJS.enc.Hex);
        } catch (err) {
            console.log("Error occurred in getMd5:", err);
        }
        return md5;
    };

    // Test data that is deliberately not valid UTF-8
    const blob = new Blob([new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80, 0x81, 0xfd, 0xfe, 0xff])]);
    console.log(await getMd5(blob));

})();

<script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.0.0/crypto-js.min.js"></script>

For simplicity, I did not use a File object for the test, but a Blob with data that is not valid UTF-8. The generated hash is correct and can be verified with an online MD5 tool.
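
One way to cross-check the hash locally is sketched below. It assumes a Node.js environment and uses Node's built-in crypto module, which is independent of CryptoJS; `md5Hex` and `testBytes` are names chosen here for illustration.

```javascript
const { createHash } = require("node:crypto");

// Hash the raw bytes directly, with no text conversion in between.
const md5Hex = (bytes) => createHash("md5").update(bytes).digest("hex");

// Same byte sequence as the test Blob in the snippet above.
const testBytes = new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80, 0x81, 0xfd, 0xfe, 0xff]);
console.log(md5Hex(testBytes)); // compare with the value printed by getMd5
```

If both values agree, the browser-side code is hashing the bytes unchanged.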
