How to correctly convert a PDF file to base64 in the browser?

Question

I have three failing versions of the following code in a chrome extension, which attempts to intercept a click to a link pointing to a pdf file, fetch that file, convert it to base64, and then log it. But I'm afraid I don't really know anything about binary formats and encodings, so I'm royally sucking this up.

var links = document.getElementsByTagName("a");

function transform(blob) {
    return btoa(String.fromCharCode.apply(null, new Uint8Array(blob)));
};

function getlink(link) {
    var x = new XMLHttpRequest();
    x.open("GET", link, true);
    x.responseType = 'blob';
    x.onload = function(e) {
        console.log("Raw response:");
        console.log(x.response);
        console.log("Direct transformation:");
        console.log(btoa(x.response));
        console.log("Mysterious thing I got from SO:");
        console.log(transform(x.response));
        window.location.href = link;
    };

    x.onerror = function (e) {
        console.error(x.statusText);
    };

    x.send(null);
};

for (i = 0, len = links.length; i < len; i++) {
    var l = links[i]
    l.addEventListener("click", function(e) {
        e.preventDefault();
        e.stopPropagation();
        e.stopImmediatePropagation();
        getlink(this.href);
    }, false);
};

Version 1 doesn't set x.responseType or call transform. It was my original, naive implementation. It threw an error: "The string to be encoded contains characters outside of the Latin1 range."

After googling that error, I found this prior SO question, which suggests that, when handling an image:

1. The response type needs to be set to blob. So this code does that.
2. There's some weird line, and I don't know what it does at all: String.fromCharCode.apply(null, new Uint8Array(blob)).

Because I know nothing about binary formats, I guessed, probably stupidly, that making a PDF base64 would be the same as making some random image format base64. So, in fine SO tradition, I copied code that I don't really understand. In stages.

Version 2 of the code just set the response type to blob but didn't try the second transformation. And the code worked, and logged something that looked like a base64 string, but a clearly incorrect string. In its entirety, it logged:

W29iamVjdCBCbG9iXQ==

Which is just goofily wrong. It's obviously too short for a 46k pdf file, and a reference base64 encoding I created with python from the commandline was much much much longer, as one would expect.

Version 3 of the code then also applies the mysterious transformation using String.fromCharCode and all the rest, which I shoved into the transform function.

However, that doesn't log anything at all---a blank line appears in the console in its appropriate place. No errors, no nonsense output, just a blank line.

I know I'm getting the correct file from prior testing. Also, the call to log the raw response object produces Blob {size: 45587, type: "application/pdf"}, which is the correct filesize for the pdf I'm experimenting with, so the blob actually contains what it should when it gets into the browser.

I'm using, and only need to support, a current version of chrome.

Can someone tell me what I'm doing wrong?

Thanks!

score 3 · Answer 1 · answered Jul 13 '16 at 04:08

If you only need to support modern browsers, you should also be able to use FileReader#readAsDataURL.

That would let you do something like this:

var reader  = new FileReader();
reader.addEventListener("load", function () {
  console.log(reader.result);
}, false);
// The function accepts Blobs and Files
reader.readAsDataURL(x.response);

This logs a data URI, which will contain your base64 data.
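If you want only the base64 string rather than the whole data URI, one option is to drop everything up to and including the first comma. Here is a minimal sketch, assuming the result has the usual `data:<type>;base64,<payload>` shape. The names `dataUrlToBase64` and `blobToBase64` are hypothetical helpers, not part of any browser API:

```javascript
// Hypothetical helper: extract the base64 payload from a data URL such as
// "data:application/pdf;base64,JVBERi0...".
function dataUrlToBase64(dataUrl) {
    var comma = dataUrl.indexOf(",");
    if (comma === -1) {
        throw new Error("Not a data URL: no comma found");
    }
    // Everything after the first comma is the encoded payload.
    return dataUrl.slice(comma + 1);
}

// Hypothetical wrapper around FileReader#readAsDataURL (browser only).
// Calls `callback` with the bare base64 string once the blob has been read.
function blobToBase64(blob, callback) {
    var reader = new FileReader();
    reader.addEventListener("load", function () {
        callback(dataUrlToBase64(reader.result));
    }, false);
    reader.readAsDataURL(blob);
}
```

In the question's onload handler this could be used as `blobToBase64(x.response, function (b64) { console.log(b64); });`.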

Frank Tan

Woah! That's cool---I might have to take back the rant against JS that I just wrote in the comments to the demo version of this extension. Maybe. https://github.com/paultopia/scrape-pdf/commit/5b47232893ddbf19745e7a825135a306b8d5355e – Paul Gowder Jul 13 '16 at 04:29
@PaulGowder You made me chuckle. We've all been there. Feel free to let me know if this doesn't work for you and we'll see what else we can do. – Frank Tan Jul 13 '16 at 11:24

score -1 · Answer 2 · answered Jul 13 '16 at 01:38

I think I've found my own solution: the response type needs to be arraybuffer, not blob.
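As a sketch of why this helps: `new Uint8Array(blob)` does not read a Blob's bytes (it produces an empty array, which would explain the blank line), whereas it can wrap an ArrayBuffer directly. Assuming `x.responseType = 'arraybuffer'`, something like the following should work. `arrayBufferToBase64` is a hypothetical helper name, and the chunking is an added precaution against passing too many arguments to `apply` on large files:

```javascript
// Hypothetical helper: base64-encode the raw bytes held in an ArrayBuffer.
function arrayBufferToBase64(buffer) {
    var bytes = new Uint8Array(buffer);
    var binary = "";
    var chunkSize = 0x8000; // bytes converted per String.fromCharCode call
    for (var i = 0; i < bytes.length; i += chunkSize) {
        // Each byte becomes one character with a code in 0..255, so the
        // resulting string stays inside the Latin1 range that btoa accepts.
        binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
    }
    return btoa(binary);
}
```

With the request from the question, the handler would then log `arrayBufferToBase64(x.response)` instead of calling `transform` on a Blob.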

Paul Gowder
