
How do I convert a UTF-8 string to a Latin1-encoded string using JavaScript?

Here is what I am trying to do:

  1. I get a file and split it into chunks by reading it as an ArrayBuffer,
  2. then I parse the ArrayBuffer as a string,
  3. and pass it to CryptoJS for hash computation using the following code:

    cryptosha256 = CryptoJS.algo.SHA256.create();
    cryptosha256.update(text);
    hash = cryptosha256.finalize();
    

It all works well for a text file. I get problems when I use the code to hash non-text files (images, .wmv files). I read on another blog that the CryptoJS author requires the bytes to be sent in Latin1 format instead of UTF-8, and that's where I am stuck.

How can I generate the bytes (or strings) in Latin1 format from an ArrayBuffer in JavaScript?

$('#btnHash').click(function () {
    var fr = new FileReader(), 
        file = document.getElementById("fileName").files[0];
    fr.onload = function (e) {
        calcHash(e.target.result, file);
    };
    fr.readAsArrayBuffer(file);
});
function calcHash(dataArray, file) {
    cryptosha256 = CryptoJS.algo.SHA256.create();
    text = CryptoJS.enc.Latin1.parse(dataArray);
    cryptosha256.update(text);
    hash = cryptosha256.finalize();
}
Artjom B.
learnedOne
  • 'bytes' are not in Latin1 or any other format. And for binary files like (most) images and sounds, character encoding doesn't really apply. If you convert text from one encoding to another, you just have text in another encoding (with possibly the loss of some characters). If you convert a binary file to another text encoding, you will most likely have a corrupt file. – GolezTrol Nov 25 '15 at 11:01
  • I'm pretty sure that CryptoJS does directly take an arraybuffer. No need to care about text encodings. – Bergi Nov 25 '15 at 11:02
  • thanks GolezTrol... here is what crypto author writes: "When you pass a string to a hasher, it's converted to bytes using UTF-8. That's to ensure foreign characters are not clipped. Since you're working with binary data, you'll want to convert the string to bytes using Latin1." sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result)); – learnedOne Nov 25 '15 at 11:03
  • the link for above statement: https://code.google.com/p/crypto-js/issues/detail?can=2&start=0&num=100&q=&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary&groupby=&sort=&id=62 – learnedOne Nov 25 '15 at 11:04
  • When I tried the CryptoJS call `sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result));`, it returned 'undefined' as the hash value :( – learnedOne Nov 25 '15 at 11:05
  • Are you sure `evt.target.result` contains the correct value? Please update your question with the whole code snippet. – nwellnhof Nov 25 '15 at 11:21
  • @Bergi No, CryptoJS doesn't work on an ArrayBuffer. It has an internal binary format that stores the data in an array of words (32 bit ints). It would be necessary to convert an ArrayBuffer to WordArray – Artjom B. Nov 25 '15 at 12:08
  • @nwellnhof .... updated the detailed code in original question. – learnedOne Nov 25 '15 at 12:14
  • @ArtjomB. for me, it's not working for small images either (I tried with a 200 KB PNG file as well). – learnedOne Nov 25 '15 at 12:25
  • @ArtjomB. I tried using `readAsBinaryString` too. That still gets me undefined as the `hash` value... – learnedOne Nov 25 '15 at 12:37
  • Please don't post a solution to your question. I rolled back your edit. You can add an additional answer to your question. – Artjom B. Nov 25 '15 at 14:52
  • Just came to post that after spending hours debugging this online the solution in the comment above, using `CryptoJS.enc.Latin1.parse(evt.target.result)` to get the proper SHA1 hash finally worked for me. It seems when reading binary data, the Latin1 parsing is needed. – Matt Welke Sep 08 '19 at 02:21

1 Answer


CryptoJS doesn't understand what an ArrayBuffer is, and if you push binary data through a text encoding like Latin1 or UTF-8, you risk losing bytes: not every byte sequence is valid in every text encoding (arbitrary bytes are frequently invalid UTF-8, for example).
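
To make the risk concrete, here is a minimal sketch that decodes a few non-text bytes as UTF-8 and encodes them again with the standard `TextDecoder` and `TextEncoder` APIs. The helper name `roundTripUtf8` and the sample bytes are made up for illustration; they are not part of CryptoJS.

```javascript
// Hypothetical helper: decode bytes as UTF-8 text, then encode the text back.
function roundTripUtf8(bytes) {
  // Invalid UTF-8 sequences are replaced with U+FFFD by default.
  var text = new TextDecoder('utf-8').decode(bytes);
  return new TextEncoder().encode(text);
}

// Sample bytes, assumed for illustration: 0x89 and 0xff are not valid UTF-8
// on their own, while 0x50 0x4e 0x47 are the ASCII letters "PNG".
var binary = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0xff]);
var roundTripped = roundTripUtf8(binary);

// Each invalid byte came back as the three bytes ef bf bd (U+FFFD),
// so 5 input bytes turned into 9 output bytes.
console.log(binary.length, roundTripped.length);
```

Hashing the round-tripped bytes would produce the hash of different data, which is why the conversion below avoids text encodings entirely.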

You will have to convert the ArrayBuffer to CryptoJS' internal WordArray, which holds the bytes as an array of words (32-bit integers). We can view the ArrayBuffer as an array of unsigned 8-bit integers and pack them four at a time, most significant byte first, to build the WordArray (see `arrayBufferToWordArray`).
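
As a small worked illustration of that packing, the sketch below does the same byte-to-word arithmetic but returns a plain object instead of a real WordArray, so it runs without CryptoJS. The name `packBytesToWords` and the returned object shape are assumptions for this example only.

```javascript
// Hypothetical helper, for illustration only: packs bytes into 32-bit words
// the same way, but returns a plain object so it runs without CryptoJS.
function packBytesToWords(bytes) {
  var words = [];
  for (var i = 0; i < bytes.length; i++) {
    // Byte i lands in word (i >>> 2), most significant byte first.
    // An unset word is undefined, which the bitwise OR treats as 0.
    words[i >>> 2] |= bytes[i] << (24 - (i % 4) * 8);
  }
  // sigBytes records how many bytes are real data; the last word may be padded.
  return { words: words, sigBytes: bytes.length };
}

var packed = packBytesToWords(new Uint8Array([0x01, 0x02, 0x03, 0x04, 0x05]));
// packed.words is [0x01020304, 0x05000000] and packed.sigBytes is 5
console.log(packed.words, packed.sigBytes);
```

Five bytes need two words, and the significant-byte count of 5 is what tells a consumer to ignore the three padding bytes in the second word. Words whose first byte is 0x80 or higher come out as negative numbers, because JavaScript bitwise operators work on signed 32-bit integers; the bit pattern is still the intended one.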

The following code shows a full example:

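function arrayBufferToWordArray(ab) {
  // View the raw buffer as unsigned bytes.
  var i8a = new Uint8Array(ab);
  var a = [];
  // Pack four bytes into one 32-bit word, most significant byte first.
  // Reads past the end yield undefined, which the shift treats as 0.
  for (var i = 0; i < i8a.length; i += 4) {
    a.push(i8a[i] << 24 | i8a[i + 1] << 16 | i8a[i + 2] << 8 | i8a[i + 3]);
  }
  // The second argument is the number of significant bytes.
  return CryptoJS.lib.WordArray.create(a, i8a.length);
}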

function handleFileSelect(evt) {
  var files = evt.target.files; // FileList object

  // Loop through the FileList and hash each selected file.
  for (var i = 0, f; f = files[i]; i++) {
    var reader = new FileReader();

    // Closure to capture the file information.
    reader.onloadend = (function(theFile) {
      return function(e) {
        var arrayBuffer = e.target.result;

        var hash = CryptoJS.SHA256(arrayBufferToWordArray(arrayBuffer));
        var elem = document.getElementById("hashValue");
        elem.value = hash;
      };

    })(f);
    reader.onerror = function(e) {
      console.error(e);
    };

    // Read the file as an ArrayBuffer.
    reader.readAsArrayBuffer(f);
  }
}

document.getElementById('upload').addEventListener('change', handleFileSelect, false);
<script src="https://cdn.rawgit.com/CryptoStore/crypto-js/3.1.2/build/rollups/sha256.js"></script>
<form method="post" enctype="multipart/form-data">
  Select image to upload:
  <input type="file" name="upload" id="upload">
  <input type="text" name="hashValue" id="hashValue">
</form>

You can extend this code with the techniques in my other answer in order to hash files of arbitrary size without freezing the browser.
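
One common way to do that, which is not necessarily what the linked answer does, is to read the file slice by slice and feed each slice to an incremental hasher. The sketch below is a rough outline under several assumptions: `hashBlobInChunks` is a made-up name, the 1 MiB default is arbitrary, and the environment must support the Promise-based `Blob.prototype.arrayBuffer()`. The hasher and the conversion function are passed in, so the helper itself does not depend on CryptoJS.

```javascript
// Hypothetical helper: feed a Blob or File to an incremental hasher in slices.
// `hasher` is any object with an update() method; `toWordArray` converts an
// ArrayBuffer into whatever update() expects.
function hashBlobInChunks(blob, hasher, toWordArray, chunkSize) {
  chunkSize = chunkSize || 1024 * 1024; // 1 MiB per slice, an arbitrary choice
  var offset = 0;

  function next() {
    if (offset >= blob.size) {
      return Promise.resolve(hasher); // every slice has been consumed
    }
    var slice = blob.slice(offset, offset + chunkSize);
    offset += chunkSize;
    // arrayBuffer() is asynchronous, so other events can run between slices.
    return slice.arrayBuffer().then(function (buffer) {
      hasher.update(toWordArray(buffer));
      return next();
    });
  }

  return next();
}

// Possible use with the code above (assumes CryptoJS is loaded on the page):
// var hasher = CryptoJS.algo.SHA256.create();
// hashBlobInChunks(file, hasher, arrayBufferToWordArray).then(function (h) {
//   document.getElementById("hashValue").value = h.finalize().toString();
// });
```

Only one slice is held in memory at a time, and the hash is finalized once after the last slice, so the result should match hashing the whole file in one call.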

Artjom B.
  • Thanks! Your answer gave me more than I had expected. Just to let you know, the 'undefined' hash value that I was getting was due to the weird reason of having two other references of ` `. I took them out, and the undefined issue was gone. Thanks guys! – learnedOne Nov 25 '15 at 14:12
  • `arrayBufferToWordArray` did the magic for me. Thanks so much! – learnedOne Nov 25 '15 at 14:40
  • slap a .toString() on the var hash to get an actual string hash output! – Micheal C Wallas Aug 15 '19 at 19:31
  • @MichealCWallas Do you mean that line `elem.value = hash;` should be changed to `elem.value = hash.toString();`? That shouldn't hurt, but it also shouldn't be necessary, because assigning a WordArray object to a string property should result in automatic stringification. I've tested this with Firefox and Vivaldi and didn't see an issue. Maybe this is a bug in the browser you're using. – Artjom B. Aug 16 '19 at 18:55
  • @ArtjomB. sorry I meant on the final output, `var hash = CryptoJS.SHA256(arrayBufferToWordArray(arrayBuffer)).toString()`. I thought the code wasn't working but the toString() was all it took to get a string hash output :) – Micheal C Wallas Aug 26 '19 at 02:06
  • That's the same value that I meant. Did you try to print it like this `console.log(hash)`? If not which browser did you use? – Artjom B. Aug 26 '19 at 18:47
  • With big files, this code crashes the browser. Can anyone help with this? – Sharad Feb 03 '20 at 07:30
  • @Sharad Did you read my answer to the end? There is a link to a link to probably working code. – Artjom B. Feb 05 '20 at 19:17