
I have been playing with a few JS encryption libraries (CryptoJS, SJCL) and discovered problems related to the Blob/File APIs and JavaScript "binary strings".

I realized that the encryption isn't even really relevant, so here's a much simplified scenario. Simply read a file in using readAsBinaryString and then create a Blob:

>>> reader.result
"GIF89a����ÿÿÿÿÿÿ!þCreated with GIMP�,�������D�;"
>>> reader.result.length
56
>>> typeof reader.result
"string"
>>> blob = new Blob([reader.result], {type: "image/gif"})
Blob { size=64, type="image/gif", constructor=function(), more...}

I have created a JSFiddle that will basically do the above: it simply reads any arbitrary file, creates a blob from it, and outputs the length vs size: http://jsfiddle.net/6L82t/1/

It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.

If a plain-text (ASCII-only) file is used, you will see that the size of the Blob and the length of the original binary string are identical.

So something happens when creating a Blob/File from a JavaScript string that holds binary (non-plain-text) data, and I need whatever that is to not happen. I think it may have something to do with the fact that JS strings are UTF-16?

There's a (maybe) related thread here: HTML5 File API read as text and binary

Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?

Working with someone in #html5 on Freenode, we determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine. You can see a fiddle that essentially does that here: http://jsfiddle.net/GH7pS/4/

The issue is that, at least in my scenario, I am going to end up with a binary string, and I would like to figure out how to convert that directly into a Blob so that I can then use the HTML5 download attribute to let the user click to download the blob directly.

Thanks!

Erik Jacobs
  • You do realize that your post [already has a comprehensive edit history](http://stackoverflow.com/posts/23795034/revisions), right? You can get the exact time stamp of any time posted on an SE web page by hovering over it; try that with the "asked 2 hours ago" above your name. – Robert Harvey May 22 '14 at 01:31

1 Answer


> It appears that, when creating the Blob from the "binary (javascript) string", something with character encoding ends up munging the result.

Yes. The post you linked explains well how a "binary string" is constituted: each character's code (0 to 255) stands for one byte.

The Blob constructor, in contrast, does the following with a string:

  1. Let s be the result of converting [the string] to a sequence of Unicode characters using the algorithm for doing so in WebIDL.
  2. Encode s as UTF-8 and append the resulting bytes to [the blob].
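
The effect of step 2 can be sketched as follows. This is only an illustration, assuming an environment that provides the standard TextEncoder and Blob globals (a modern browser or a recent Node.js); the sample string is made up and is not the file from the question. Every char code from 0x80 to 0xFF takes two bytes in UTF-8, so the Blob grows by one byte per such character. A difference of 64 - 56 = 8, as in the question, would be consistent with eight bytes of that kind.

```javascript
// Made-up "binary string": every char code is a byte value (0-255).
var binaryString = "GIF89a" + String.fromCharCode(0x01, 0x00, 0xff, 0xff, 0x80);

// Count the char codes that UTF-8 encodes as two bytes (0x80-0xFF).
var highBytes = 0;
for (var i = 0; i < binaryString.length; i++) {
    if (binaryString.charCodeAt(i) >= 0x80) highBytes++;
}

// Encode the string as UTF-8, which is what the Blob constructor does with strings.
var utf8Length = new TextEncoder().encode(binaryString).length;
var blobSize = new Blob([binaryString]).size;

console.log(binaryString.length); // 11
console.log(highBytes);           // 3
console.log(utf8Length);          // 14
console.log(blobSize);            // 14
```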

> We determined that if you read an ArrayBuffer directly and then create the blob from that by first using a Uint8Array, the bytes work out just fine.

Yes, that's how it is supposed to work. Just do the encryption on a Typed Array where you deal with the bytes individually, not on some string.

> The issue is, at least in my scenario, I am going to end up with a binary string

Again: try not to. Binary strings are deprecated.

> I would like to figure out how to directly convert a binary string into a Blob. Do I need to possibly take the decrypted results (UTF-16) and "convert" them to UTF-8 before putting them in a Blob/File?

No, it's better not to attempt any string conversions. Instead, construct a Uint8Array holding the bytes that you want to get from the binary string.

This should do it (untested):

// Copy each char code of the binary string into one byte of the array.
var bytes = new Uint8Array(str.length);
for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
}
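
To go all the way to a Blob, the loop can be wrapped in a small helper. This is a sketch only: the function name binaryStringToBlob and the sample string are hypothetical and not part of the original post, and it assumes the input really is a binary string (all char codes in the range 0 to 255).

```javascript
// Hypothetical helper (name is an assumption, not from the original post).
// Assumes every char code in str is a byte value in the range 0-255.
function binaryStringToBlob(str, mimeType) {
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        bytes[i] = str.charCodeAt(i);
    }
    // Passing the typed array keeps the bytes as they are, with no UTF-8 encoding step.
    return new Blob([bytes], { type: mimeType });
}

// Made-up sample data containing bytes above 0x7F.
var sample = "GIF89a" + String.fromCharCode(0x01, 0x00, 0xff, 0xff, 0x80);
var blob = binaryStringToBlob(sample, "image/gif");

console.log(blob.size === sample.length); // true
```

In a browser, the resulting blob could then be passed to URL.createObjectURL and used as the href of a link with a download attribute, which is the use case described in the question.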
Bergi
  • That totally worked. I guess the follow-up question pertains specifically to CryptoJS -- when you decrypt the contents and receive your "WordArray", is there a way to go directly to an ArrayBuffer without first converting to a string and then putting that in a Uint8Array to use for the blob? – Erik Jacobs May 22 '14 at 16:49
  • Follow-on: It looks like someone did something that doesn't involve a string conversion first (operates on bytes): https://groups.google.com/d/msg/crypto-js/TOb92tcJlU0/Eq7VZ5tpi-QJ Seems like this works. Awesome! – Erik Jacobs May 22 '14 at 17:12
  • Yeah, that might be a good separate question (tagged [tag:cryptojs]). I didn't know they have their own datatypes, I'd have to look such things up in their documentation. – Bergi May 22 '14 at 17:32
  • @ErikJacobs Thanks for posting the link! – Mati Nov 23 '14 at 02:18
  • Thanks for the conversion to byte array from binary string! – Gutsygibbon Jul 14 '21 at 16:37
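
On the CryptoJS follow-up in the comments, a byte-wise conversion might look like the sketch below. It is not taken from the linked Google Groups post and has not been checked against it. It assumes a WordArray-like object with a words array of 32-bit big-endian integers and a sigBytes count of meaningful bytes; the function name wordArrayToUint8Array and the stand-in object are hypothetical.

```javascript
// Assumption: wordArray has the shape { words: [32-bit ints, big-endian], sigBytes: n }.
// The function name is hypothetical and not part of any library.
function wordArrayToUint8Array(wordArray) {
    var words = wordArray.words;
    var sigBytes = wordArray.sigBytes;
    var bytes = new Uint8Array(sigBytes);
    for (var i = 0; i < sigBytes; i++) {
        // Pick byte (i % 4) of word (i >>> 2), most significant byte first.
        bytes[i] = (words[i >>> 2] >>> (24 - (i % 4) * 8)) & 0xff;
    }
    return bytes;
}

// Stand-in object with the assumed shape; it was not produced by CryptoJS.
var fakeWordArray = { words: [0x47494638, 0x39610000], sigBytes: 6 };
var out = wordArrayToUint8Array(fakeWordArray);

console.log(String.fromCharCode.apply(null, out)); // "GIF89a"
```

The resulting Uint8Array can be handed to the Blob constructor directly, so no intermediate string is needed.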