
A Python backend reads a binary file, base64-encodes it, inserts the result into a JSON document, and sends it to a JavaScript frontend:

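#Python
import base64
def encode_file():  # hypothetical wrapper name; the original snippet sits inside a function
    with open('some_binary_file', 'rb') as in_file:
        # base64 output is pure ASCII, so the codec passed to decode() does not change it
        return base64.b64encode(in_file.read()).decode('utf-8')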

The JavaScript frontend fetches the base64 encoded string from the JSON document and turns it into a binary blob:

// JavaScript
const b64_string = response['b64_string'];
const decoded_file = atob(b64_string);
const blob = new Blob([decoded_file], {type: 'application/octet-stream'});

Unfortunately, the downloaded blob appears to be corrupted, and I'm not sure where the problem is. For example, an Excel file can no longer be opened after the round trip. On the Python side I've tried different codecs in the decode step ('ascii', 'latin1'), but that makes no difference. Is there a problem with my code?

Berco Beute
  • This code looks fine, what do you mean by *seems to be wrong*? – Tamas Hegedus Feb 16 '16 at 21:31
  • As I added I can't open the resulting file (in this case an Excel file). – Berco Beute Feb 16 '16 at 22:00
  • What are the contents of the file when opened in some viewer? Did the size of the file shrink? – Tamas Hegedus Feb 16 '16 at 23:06
  • The file actually got bigger, from 9kb to 11 kb. The file content seems to be the correct content with all kinds of gibberish interspersed. I can read a lot of the file, but there are tons of weird characters in there as well. – Berco Beute Feb 17 '16 at 16:00
  • Can you open it with Notepad and see if base64 text is in there? I suspect you either do not decode the data or encode it twice, as the size grew by about as much as it would if you had just base64-encoded the file before saving. – Tamas Hegedus Feb 17 '16 at 16:54
  • The base64 encoded string (b64_string in the example) looks like this (the beginning). This is the same string on the Python side: 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAOwADAP7/CQAGAAAAAAAAAAAAAAABAAAADgAAAAAAAAAAEAAACwA... – Berco Beute Feb 17 '16 at 17:11
  • That's certainly base64 (with a bunch of zeroes, seen from the AAAAAA part). That means you either b64encode it twice and decode it only once, or encode it only once, and you don't decode it. Either way, if you showed us the full code we would know that much earlier – Tamas Hegedus Feb 17 '16 at 17:13
  • The complete source is a bit convoluted, but there is really nothing else going on than the two snippets in my question. Since the base64 string is the same on both the Python and JavaScript sides, on which side do you think the error could be? – Berco Beute Feb 17 '16 at 17:19
  • I really don't know. I would start with the JavaScript part, as it's much easier to make errors in JS – Tamas Hegedus Feb 17 '16 at 17:22

1 Answer

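I found the answer here. The problem was on the JavaScript side. Applying atob to the base64-encoded string is not enough for binary data: atob returns a string in which each character stands for one byte, and Blob writes string parts out as UTF-8, so every byte above 0x7F becomes two bytes. You have to convert the string into a typed byte array first. What I ended up doing (LiveScript):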

byte_chars = atob base64_str
byte_numbers = [byte_chars.charCodeAt(index) for bc, index in byte_chars]
byte_array = new Uint8Array byte_numbers
blob = new Blob [byte_array], {type: 'application/octet-stream'}
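
For readers who don't use LiveScript, the same steps could look roughly like this in plain JavaScript. This is only a sketch: the function name `base64ToBlob` and its `mimeType` parameter are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: decode a base64 string into raw bytes and wrap them in a Blob.
function base64ToBlob(base64Str, mimeType = 'application/octet-stream') {
  // atob returns a "binary string": one character per decoded byte.
  const byteChars = atob(base64Str);
  // Copy each character code (0-255) into a typed array so that the Blob
  // stores the bytes as-is instead of re-encoding the string as UTF-8.
  const byteArray = new Uint8Array(byteChars.length);
  for (let i = 0; i < byteChars.length; i++) {
    byteArray[i] = byteChars.charCodeAt(i);
  }
  return new Blob([byteArray], { type: mimeType });
}
```

With the names from the question, this would be called as `base64ToBlob(response['b64_string'])`.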
Berco Beute
  • https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob Blobs should accept strings as blob parts too, so you don't have to convert it to bytes. Like `new Blob([atob("AAAA")], {type:"application/octet-stream"}).size === 3` – Tamas Hegedus Feb 17 '16 at 23:49
  • What browser are you using? I just ran that code in Chrome, and it works flawlessly – Tamas Hegedus Feb 18 '16 at 13:36
  • Ok, it only works in the simplest case. `new Blob([atob("Q9q3")], {type:"application/octet-stream"}).size === 3` does indeed fail, as the string gets encoded as UTF-8 – Tamas Hegedus Feb 18 '16 at 14:07
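
The size difference from the last comment can be reproduced with a short check. This is a sketch rather than code from the thread, and the variable names are illustrative. "Q9q3" decodes to the bytes 0x43, 0xDA, 0xB7; the last two are above 0x7F, so each takes two bytes once the string is written out as UTF-8. That would also be consistent with the file growing from 9 kb to 11 kb as reported in the question comments.

```javascript
// "Q9q3" decodes to three bytes: 0x43, 0xDA, 0xB7.
const binaryString = atob('Q9q3');

// String part: the Blob constructor encodes it as UTF-8, so the two
// characters above 0x7F are written as two bytes each.
const fromString = new Blob([binaryString], { type: 'application/octet-stream' });

// Typed-array part: the bytes are stored unchanged.
const bytes = Uint8Array.from(binaryString, (ch) => ch.charCodeAt(0));
const fromBytes = new Blob([bytes], { type: 'application/octet-stream' });

console.log(fromString.size); // 5
console.log(fromBytes.size);  // 3
```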