So, if I do:

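// md5() is assumed to come from an MD5 library that is already loaded
const fileReader = new FileReader();
fileReader.onload = function (e) {
    // e.target.result is an ArrayBuffer holding the raw bytes
    console.log(md5(e.target.result));
};
fileReader.readAsArrayBuffer(blob);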

I get: df9206f11a5c4fc7841fca94522f19f2

But, if I do:

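// md5() is assumed to come from an MD5 library that is already loaded
const fileReader = new FileReader();
fileReader.onload = function (e) {
    // e.target.result is a string holding the decoded text
    console.log(md5(e.target.result));
};
fileReader.readAsText(blob);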

I get a completely different hash. I assume this is due to character encoding? So I am curious, what encoding can I use which will result in an identical hash?

patrick

1 Answer

Using readAsArrayBuffer() will read the source as a "pure" byte-range independent of what the data represents and its byte-order.

Using readAsText() without an encoding argument decodes the bytes into a JavaScript string (strings are UTF-16/UCS-2 internally), so what gets hashed is the decoded text and not the original bytes. That produces a completely different result, as you noticed.
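
To see why the two reads can hash differently, here is a minimal sketch (not the asker's code) that decodes a few bytes to text and encodes them back. It uses TextDecoder and TextEncoder in place of FileReader so it runs without a file, it assumes UTF-8 decoding, and the sample bytes are made up for illustration.

```javascript
// Hypothetical sample data: "hi" followed by 0xFF, which is not valid UTF-8.
const rawBytes = new Uint8Array([0x68, 0x69, 0xff]);

// Decoding to text replaces the invalid byte with U+FFFD (replacement character).
const decodedText = new TextDecoder("utf-8").decode(rawBytes);

// Encoding the text back to UTF-8 yields different bytes than we started with.
const roundTripped = new TextEncoder().encode(decodedText);

console.log(Array.from(rawBytes));     // 104, 105, 255
console.log(Array.from(roundTripped)); // 104, 105, 239, 191, 189
```

If the hashing library encodes strings as UTF-8 before hashing (an assumption about the library, not something stated in the question), it would be hashing the second byte sequence, which cannot match a hash of the first.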

If you know the source is, for example, UTF-8 text, you can pass the optional encoding argument: readAsText(blob[, encoding]) (see supported encoding types).
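
A rough sketch of passing the encoding explicitly. The helper name readBlobAsText and the sample Blob are hypothetical, and the call is guarded so that it only runs where FileReader exists (i.e. in a browser).

```javascript
// Hypothetical helper: read a Blob as text using an explicit encoding label.
function readBlobAsText(blob, encoding) {
    return new Promise(function (resolve, reject) {
        const fileReader = new FileReader();
        fileReader.onload = function (e) { resolve(e.target.result); };
        fileReader.onerror = function () { reject(fileReader.error); };
        // The second argument is the optional encoding label, e.g. "utf-8".
        fileReader.readAsText(blob, encoding);
    });
}

// Only run where FileReader is available (browsers).
if (typeof FileReader !== "undefined") {
    // Made-up sample data; strings passed to Blob are stored as UTF-8.
    const sampleBlob = new Blob(["héllo"], { type: "text/plain" });
    readBlobAsText(sampleBlob, "utf-8").then(function (text) {
        console.log(text);
    });
}
```

Wrapping the callback in a Promise is only a convenience here; the original onload style works the same way with the extra argument.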

Any common single-byte code page should suffice in that case, as MD5 signatures as text are always within the ASCII range - the main issue, then, is that the data needs to be read as single bytes and not as double bytes as with UTF-16/UCS-2.

A different problem could be byte order. For this case, an alternative is to read the data as an ArrayBuffer and then use TextDecoder (see example answer) with the correct byte order, e.g. little-endian or big-endian (denoted as "le" and "be", e.g. "utf-16be", in the previously linked encoding types). A BOM option (ignoreBOM) is also available with this approach.
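
A small sketch of the TextDecoder approach, assuming the data is UTF-16 text. The byte values are made up for illustration; in practice the ArrayBuffer would come from readAsArrayBuffer().

```javascript
// Made-up sample: the text "hi" stored as UTF-16 in both byte orders.
const littleEndian = new Uint8Array([0x68, 0x00, 0x69, 0x00]).buffer;
const bigEndian = new Uint8Array([0x00, 0x68, 0x00, 0x69]).buffer;

// Pick the decoder label that matches the byte order of the source.
const fromLE = new TextDecoder("utf-16le").decode(littleEndian);
const fromBE = new TextDecoder("utf-16be").decode(bigEndian);

// Same little-endian text, prefixed with a byte order mark (FF FE).
const withBom = new Uint8Array([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]).buffer;

// By default the BOM is stripped; with ignoreBOM: true it stays in the output.
const bomStripped = new TextDecoder("utf-16le").decode(withBom);
const bomKept = new TextDecoder("utf-16le", { ignoreBOM: true }).decode(withBom);

console.log(fromLE, fromBE);                     // hi hi
console.log(bomStripped.length, bomKept.length); // 2 3
```

Whether the BOM should be kept depends on what the hash is being compared against: if the reference hash was computed over text that included it, it needs to stay.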

  • great, thanks! And.. any idea why this question was down voted so much? I had no idea this would be such an inappropriate question on here. – patrick Nov 16 '17 at 00:24