201

I need to convert a base64-encoded string into an ArrayBuffer. The base64 strings are user input: they will be copied and pasted from an email, so they're not available when the page is loaded. I would like to do this in JavaScript without making an AJAX call to the server, if possible.

I found these links interesting, but they didn't help me:

ArrayBuffer to base64 encoded string

This is about the opposite conversion, from ArrayBuffer to base64, not the other way round.

http://jsperf.com/json-vs-base64/2

This looks good, but I can't figure out how to use the code.

Is there an easy (maybe native) way to do the conversion? Thanks.

Kamil Kiełczewski
Tony

10 Answers

254
function base64ToArrayBuffer(base64) {
    var binaryString = atob(base64);
    var bytes = new Uint8Array(binaryString.length);
    for (var i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
}
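Since the question mentions base64 pasted from an email, a sketch of a wrapper around the same loop may help. It relies on two properties of atob: it ignores ASCII whitespace (so MIME-style line breaks are tolerated) and it throws on invalid characters. The name decodePastedBase64 is hypothetical, and returning null instead of throwing is just one possible design choice.

```javascript
// Hypothetical helper (not from the answer above): decode user-pasted base64
// into an ArrayBuffer, returning null instead of throwing on malformed input.
function decodePastedBase64(text) {
  let binaryString;
  try {
    // atob() skips ASCII whitespace, so line breaks copied from an email are tolerated.
    binaryString = atob(text);
  } catch (e) {
    // atob() throws (an "InvalidCharacterError" DOMException) on malformed base64.
    return null;
  }
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    // atob() only yields characters U+0000..U+00FF, so each one fits in a byte.
    bytes[i] = binaryString.charCodeAt(i);
  }
  return bytes.buffer;
}

// Usage: the pasted text may contain a line break.
const pasted = decodePastedBase64("SGVs\nbG8=");
console.log(new TextDecoder().decode(pasted)); // prints Hello
```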
Boris Verkhovskiy
Goran.it
  • Please explain to me what is really happening here. – Govinda Sakhare Jun 16 '16 at 06:59
  • Well, it's pretty straightforward: first we decode the base64 string (atob), then we create a new array of 8-bit unsigned integers with the same length as the decoded string. After that we iterate over the string and populate the array with the Unicode value of each character in the string. – Goran.it Jun 17 '16 at 11:13
  • Why unsigned 8-bit? Any specific reason? – Govinda Sakhare Jun 17 '16 at 11:36
  • From MDN: Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The Uint8Array typed array represents an array of 8-bit unsigned integers, and we are working with an ASCII representation of the data (which is also an 8-bit table). – Goran.it Jun 17 '16 at 12:57
  • This is not correct. It allows JavaScript to interpret bytes as a string, which affects data that is actually true binary. – Tomáš Zato Mar 24 '18 at 23:42
  • The problem is that a) not every byte sequence is valid Unicode and b) not every Unicode character is one byte, so `bytes[i] = binaryString.charCodeAt(i);` can be wrong. – mixture Aug 29 '18 at 10:41
  • In my test with a base64 image, it returned a length of 2947 (chars!!); when saved into a byte array it became 4436 bytes, while the original image size is 3160 bytes!! How come?? Very simple: some of the "chars" produced by atob() went ABOVE the 255 barrier, and, as UTF-8 encoding suggests, it interpreted the binary data as a UTF-8 string (or maybe UTF-16, I am not really sure)... Sorry, but the answer is NOT correct, or perhaps missing something... – Xerix Jun 16 '21 at 02:40
  • Each Base64 digit represents exactly 6 bits of data. ... This means that the Base64 version of a string or file will be at most 133% the size of its source (a ~33% increase). The increase may be larger if the encoded data is small. At the same time, atob cannot produce chars above the 255 barrier by design; you are free to look at the documentation. When you use atob / btoa, there is no UTF-8 interpretation; as the name suggests, it's ASCII to binary and vice versa. (BTW, I wrongly stated that ASCII is an 8-bit table; it's actually a 7-bit character set.) – Goran.it Jun 16 '21 at 05:28
  • This answer is correct; I've tested all the possible values. The function `window.atob` decodes the input and then writes each decoded byte into a UTF-16 character, which is 2 bytes. There's no loss possible since a byte only goes up to 255. Note that if the encoded content is UTF-8 text, you'll still have to decode it: `new TextDecoder("utf-8").decode(base64ToArrayBuffer("4oKs"));`. – Florent B. Jun 24 '21 at 03:29
  • `atob()` "[returns a string consisting of characters in the range U+0000 to U+00FF](https://html.spec.whatwg.org/multipage/webappapis.html#dom-atob-dev)" so `binaryString.charCodeAt(i)` will always return 0-255, so this code works with arbitrary binary data encoded as base64, not just "valid Unicode". – Boris Verkhovskiy Apr 25 '23 at 22:38
147

Using TypedArray.from:

Uint8Array.from(atob(base64_string), c => c.charCodeAt(0))

Performance to be compared with the for loop version of Goran.it answer.
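The one-liner yields a Uint8Array, not an ArrayBuffer; the ArrayBuffer the question asks for is its .buffer property. A minimal sketch wrapping it (the function name base64ToArrayBufferViaFrom is hypothetical):

```javascript
// Hypothetical wrapper around the Uint8Array.from one-liner.
function base64ToArrayBufferViaFrom(base64) {
  // Map each character of atob()'s binary string to its char code (0..255).
  const bytes = Uint8Array.from(atob(base64), c => c.charCodeAt(0));
  // The typed array was created fresh, so it spans its whole buffer
  // (byteOffset 0, no extra bytes) and .buffer holds exactly the decoded data.
  return bytes.buffer;
}

const fromBuf = base64ToArrayBufferViaFrom("AAEC/w==");
console.log(fromBuf.byteLength);                // 4
console.log(new Uint8Array(fromBuf).join(", ")); // 0, 1, 2, 255
```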

supersam654
ofavre
  • To those who like this kind of one-liner, keep in mind that `Uint8Array.from` still has limited compatibility with some browsers. – IzumiSy Feb 17 '17 at 06:02
  • Please do not recommend atob or btoa: https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding – Kugel Aug 28 '17 at 05:13
  • The Rails compiler can't handle this string and fails with `ExecJS::RuntimeError: SyntaxError: Unexpected token: operator (>)` (Rails 5). – Avael Kross Sep 19 '17 at 01:38
  • Get rid of the arrow function syntax by using this instead: `Uint8Array.from(atob(base64_string), function(c) {return c.charCodeAt(0);})`. – ofavre Sep 25 '17 at 08:05
  • This isn't an array buffer. This is the typed array. You get access to the array buffer through the `.buffer` property of what is returned from `Uint8Array`. – oligofren Feb 07 '18 at 19:22
  • @Kugel, do you remember why you suggested not using `atob` or `btoa`? The solutions given there seem to make use of those functions. In particular, Solution #3 on that page is essentially this solution + the final conversion to UTF-16. – Saites Oct 07 '19 at 22:28
  • @Saites, there's nothing wrong with `atob` or `btoa`, you just have to give them valid input. `atob` needs a valid base64 string, otherwise it will throw an error. And `btoa` needs a valid byte string (also called a binary string), which is a string containing characters in the range 0-255. If your string has characters outside that range, `btoa` will throw an error. – GetFree Nov 16 '19 at 10:13
  • My benchmark shows that this is about 4 times slower than the for loop version. https://jsben.ch/p3Cbs – user202729 Jan 02 '21 at 13:22
  • The problem with `atob` *alone* is that it will mangle Unicode characters. For example, `atob('8J+ZiA==')` returns 'ð\x9F\x99\x88', which you have to unmangle to get the proper '🙈' UTF-8 string. *But* calling `c.charCodeAt(0)` on every character is fine, and you can safely call `new TextDecoder().decode(uint8array)` and get the right UTF-8 string. – ShortFuse Jun 19 '22 at 19:36
  • @ShortFuse Right. I think the way to remember this is that `atob` stands for "ASCII to binary", wherein ASCII=base64 and binary=bytes. So indeed it gives you bytes that you have to decode if you don't want bytes. The question is why there isn't an option in `atob` to produce the `Uint8Array` directly, since that's how you would use it most of the time. This answer is a great approximation and seems well supported in browsers and Node. – personal_cloud Dec 28 '22 at 17:58
  • @user202729 that's because this answer builds an intermediate array of numbers (floats), so we're going base64 → string → array of floats → Uint8Array, whereas [the for loop answer](https://stackoverflow.com/a/21797381) skips creating the array of floats, so it goes base64 → string → Uint8Array. As you show in your benchmark, we can even go twice as fast on top of that if we [re-implement](https://github.com/niklasvh/base64-arraybuffer/blob/master/src/index.ts#L31) `atob()` to decode into a Uint8Array directly, so then we just have base64 → Uint8Array. – Boris Verkhovskiy Apr 25 '23 at 22:57
56

For Node.js users:

const myBuffer = Buffer.from(someBase64String, 'base64');

myBuffer will be of type Buffer, which is a subclass of Uint8Array. Unfortunately, a Uint8Array is NOT an ArrayBuffer, which is what the OP was asking for. But when manipulating an ArrayBuffer I almost always wrap it with a Uint8Array or something similar, so it should be close to what's being asked for.
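If you do need a real ArrayBuffer in Node.js, note that `buf.buffer` of a small Buffer may be a larger shared memory pool, so only the bytes between `byteOffset` and `byteOffset + byteLength` belong to it. A minimal sketch that copies them out (the function name base64ToArrayBufferNode is hypothetical):

```javascript
// Node.js only. Hypothetical helper: base64 -> standalone ArrayBuffer.
function base64ToArrayBufferNode(base64) {
  const buf = Buffer.from(base64, "base64");
  // buf may be a view into Node's shared allocation pool, so copy out
  // only the range of bytes that actually belongs to this Buffer.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

const nodeBuf = base64ToArrayBufferNode("SGVsbG8=");
console.log(nodeBuf.byteLength); // 5
```

Copying with `slice` costs one extra allocation, but the result can be handed to APIs that expect an ArrayBuffer without leaking unrelated pool bytes.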

DoomGoober
  • This actually doesn't seem to produce `Uint8Array`, as the code that expected that barfed when passed the result of this call. However, `Uint8Array.from(Buffer.from(someBase64String, 'base64'))` works great to produce a `Uint8Array` typed value. – LB2 Sep 01 '22 at 21:34
46

Goran.it's answer does not work because of the Unicode problem in JavaScript: https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding.

I ended up using the function given on Daniel Guerrero's blog: http://blog.danguer.com/2011/10/24/base64-binary-decoding-in-javascript/

The function is listed on GitHub: https://github.com/danguer/blog-examples/blob/master/js/base64-binary.js

Use these lines:

var uintArray = Base64Binary.decode(base64_string);  
var byteArray = Base64Binary.decodeArrayBuffer(base64_string); 
Yaan
  • This method is 2x faster than using atob. – xiaoyu2er Dec 22 '17 at 07:28
  • Can you give an example for which it wouldn't work? The article talks about encoding arbitrary strings, which might contain Unicode characters, but does not apply to `atob` at all. – riv Jul 04 '18 at 13:19
  • `decodeArrayBuffer` returns an `ArrayBuffer` whose size is always divisible by 3, and I don't understand whether that is by design or a bug. I will ask in the GitHub project. – ceztko Sep 19 '18 at 13:43
  • @ceztko It's probably by (accidental) design. The base64 encoding algorithm takes groups of 3 bytes and turns them into 4 characters. The decode method probably allocates an ArrayBuffer whose length is base64String.length/4*3 bytes and never truncates any unused bytes when finished. – AlwaysLearning Nov 08 '19 at 04:16
  • @AlwaysLearning which means it's probably buggy, since leftover zero bytes may corrupt the intended output content. – ceztko Nov 10 '19 at 13:54
  • It appears that this method is the slowest according to my benchmark: https://jsben.ch/aS6YU – user202729 Jan 02 '21 at 13:28
  • @ceztko yes! I saw the same thing. After using it in a download, my file was sometimes corrupted with extra characters. I fixed it with this in the decodeArrayBuffer function: `if (input[input.length - 1] === "=") { bytes--; if (input[input.length - 2] === "=") { bytes--; } }` – JKH Feb 24 '21 at 21:56
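The padding problem discussed in these comments comes down to computing the exact decoded length: each base64 character carries 6 bits, and each trailing = carries none. A sketch of that arithmetic (decodedByteLength is a hypothetical helper, assuming standard base64 with optional padding and possibly whitespace):

```javascript
// Hypothetical helper: exact number of bytes encoded by a base64 string.
// Works for padded ("TQ==") and unpadded ("TQ") input.
function decodedByteLength(base64) {
  const s = base64.replace(/\s+/g, ""); // ignore line breaks and spaces
  let padding = 0;
  if (s.endsWith("==")) padding = 2;
  else if (s.endsWith("=")) padding = 1;
  // Each non-padding character carries 6 bits; round down to whole bytes.
  return Math.floor(((s.length - padding) * 6) / 8);
}

console.log(decodedByteLength("TWFu"));     // 3
console.log(decodedByteLength("SGVsbG8=")); // 5
console.log(decodedByteLength("TQ=="));     // 1
```

Allocating the output with this length, instead of `length / 4 * 3`, avoids the trailing zero bytes described above.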
26

An async solution, which is better when the data is big:

// base64 to buffer (returns a Promise that resolves to an ArrayBuffer)
function base64ToBufferAsync(base64) {
  var dataUrl = "data:application/octet-stream;base64," + base64;

  return fetch(dataUrl)
    .then(res => res.arrayBuffer())
    .then(buffer => {
      console.log("base64 to buffer: " + new Uint8Array(buffer));
      return buffer;
    });
}

// buffer to base64 (returns a Promise that resolves to a base64 string)
function bufferToBase64Async(buffer) {
  return new Promise(function (resolve, reject) {
    var blob = new Blob([buffer], {type: 'application/octet-stream'});
    console.log("buffer to blob:" + blob);

    var fileReader = new FileReader();
    fileReader.onload = function () {
      var dataUrl = fileReader.result;
      console.log("blob to dataUrl: " + dataUrl);

      // Strip the "data:...;base64," prefix
      var base64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
      console.log("dataUrl to base64: " + base64);
      resolve(base64);
    };
    fileReader.onerror = function () {
      reject(fileReader.error);
    };
    fileReader.readAsDataURL(blob);
  });
}
张浩然
13

JavaScript is a fine development environment, so it seems odd that it doesn't provide a solution to this small problem. The solutions offered elsewhere on this page are potentially slow. Here is my solution. It employs the built-in functionality that decodes base64 image and sound data URLs.

var req = new XMLHttpRequest();
req.open('GET', "data:application/octet;base64," + base64Data);
req.responseType = 'arraybuffer';
req.onload = function fileLoaded(e)
{
   var byteArray = new Uint8Array(e.target.response);
   // var shortArray = new Int16Array(e.target.response);
   // var unsignedShortArray = new Uint16Array(e.target.response);
   // etc.
}
req.send();

The send request fails if the base64 string is badly formed.

The MIME type (application/octet) is probably unnecessary.

Tested in Chrome. Should work in other browsers.

Eli Grey
dinosaurclover
  • This was the perfect solution for me, simple and clean. I quickly tested it in Firefox, IE 11 and Edge, and it worked fine! – cs-NET Aug 27 '18 at 16:22
  • I'm not sure how it works for you in IE11, but I get an `Access Denied` error, which seems to be a CORS limitation. – Sergiu Apr 12 '19 at 09:10
  • This can be written more succinctly as `await (await fetch("data:application/octet;base64," + base64data)).arrayBuffer()` with async/await and the Fetch API. – Jordan Mann Nov 30 '21 at 07:24
  • Perfect! I am developing an Angular app and was reluctant to use the Node Buffer due to performance/optimization issues. The simplified solution from Jordan Mann above works great! Thank you! – Nalin Jayasuriya Feb 02 '23 at 11:44
10

Pure JS - no intermediate string step (no atob)

I wrote the following function, which converts base64 directly (without converting to a string as an intermediate step). IDEA

  • get a chunk of 4 base64 characters
  • find the index of each character in the base64 alphabet
  • convert each index to a 6-bit number (binary string)
  • join the four 6-bit numbers, which gives a 24-bit number (stored as a binary string)
  • split the 24-bit string into three 8-bit strings, convert each to a number and store them in the output array
  • corner case: if the input base64 string ends with one/two = chars, remove one/two numbers from the output array

The solution below can process large base64 input strings. A similar function for converting bytes to base64 without btoa is HERE

function base64ToBytesArr(str) {
  const abc = [..."ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]; // base64 alphabet
  let result = [];

  for(let i=0; i<str.length/4; i++) {
    let chunk = [...str.slice(4*i,4*i+4)]
    let bin = chunk.map(x=> abc.indexOf(x).toString(2).padStart(6,0)).join(''); 
    let bytes = bin.match(/.{1,8}/g).map(x=> +('0b'+x));
    result.push(...bytes.slice(0,3 - (str[4*i+2]=="=") - (str[4*i+3]=="=")));
  }
  return result;
}


// --------
// TEST
// --------


let test = "Alice's Adventure in Wonderland.";  

console.log('test string:', test.length, test);
let b64_btoa = btoa(test);
console.log('encoded string:', b64_btoa);

let decodedBytes = base64ToBytesArr(b64_btoa); // decode base64 to array of bytes
console.log('decoded bytes:', JSON.stringify(decodedBytes));
let decodedTest = decodedBytes.map(b => String.fromCharCode(b) ).join``;
console.log('Uint8Array', JSON.stringify(new Uint8Array(decodedBytes)));
console.log('decoded string:', decodedTest.length, decodedTest);
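The steps in the IDEA list can also be done with integer bit shifts instead of binary strings, which avoids building and re-parsing strings of 0s and 1s. A sketch of that variant (the name base64ToBytesBitwise is hypothetical; like the function above, it assumes standard, padded base64 without whitespace):

```javascript
// Same steps as the IDEA list, using integer bit operations
// instead of binary strings. Assumes padded base64 without whitespace.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64ToBytesBitwise(str) {
  const result = [];
  for (let i = 0; i < str.length; i += 4) {
    // Combine the 6-bit values of the 4 characters into one 24-bit number
    // ('=' is not in the alphabet, so it contributes 0).
    let n = 0;
    for (let j = 0; j < 4; j++) {
      const idx = B64_ALPHABET.indexOf(str[i + j]);
      n = (n << 6) | (idx < 0 ? 0 : idx);
    }
    // Split the 24-bit number into three bytes
    const bytes = [(n >> 16) & 255, (n >> 8) & 255, n & 255];
    // Drop one byte per trailing '=' in this chunk
    const padding = (str[i + 2] === "=") + (str[i + 3] === "=");
    result.push(...bytes.slice(0, 3 - padding));
  }
  return result;
}

console.log(JSON.stringify(base64ToBytesBitwise("TWE="))); // [77,97]
```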

Caution!

If you want to decode base64 to a STRING (not a bytes array) and you know that the result contains UTF-8 characters, then atob alone will in general give the wrong result; e.g. for the character 💩, atob("8J+SqQ==") returns the raw bytes as separate characters instead. In this case you can use the solution above and convert the resulting bytes array to a string in the proper way, e.g.:

// --------
// TEST
// --------


let testB64 = "8J+SqQ==";   // for string: "💩";  
console.log('input base64            :', testB64);

let decodedBytes = base64ToBytesArr(testB64); // decode base64 to array of bytes
console.log('decoded bytes           :', JSON.stringify(decodedBytes));

let result = new TextDecoder("utf-8").decode(new Uint8Array(decodedBytes));
console.log('properly decoded string :', result);

let result_atob = atob(testB64);
console.log('decoded by atob         :', result_atob);

Snippets tested 2022-08-04 on: chrome 103.0.5060.134 (arm64), safari 15.2, firefox 103.0.1 (64 bit), edge 103.0.1264.77 (arm64), and node-js v12.16.1

Kamil Kiełczewski
2

I would strongly suggest using an npm package that correctly implements the base64 specification.

The best one I know is rfc4648.

The problem is that btoa and atob use binary strings instead of Uint8Array, and trying to convert to and from them is cumbersome. Also, there are a lot of bad packages in npm for this. I lost a lot of time before finding that one.

The creators of that specific package did a simple thing: they took the specification of Base64 (RFC 4648) and implemented it correctly from beginning to end (including other formats in the specification that are also useful, like Base64-url, Base32, etc.). That doesn't seem like a lot, but apparently it was too much to ask of the bunch of other libraries.

So yeah, I know I'm doing a bit of proselytism, but if you want to avoid losing your time too, just use rfc4648.

Community
nicolas-van
2

I used the accepted answer to this question to create base64Url string <-> ArrayBuffer conversions for base64Url data transmitted via an ASCII cookie (atob and btoa convert between base64 [with +/] and JS binary strings), so I decided to post the code.

Many of us may want both conversions, and client-server communication may use the base64Url version (though a cookie may contain +/ as well as -_ characters, if I understand correctly; only the ",;\ characters and some problematic characters from the 128 ASCII characters are disallowed). But + and / have special meanings in URLs, hence the wider use of the base64url version, which of course is not what atob/btoa support...

Seeing other comments, I would like to stress that my use case here is base64Url data transmission via URL/cookie and using this crypto data with the JS crypto API (2017), hence the need for an ArrayBuffer representation and b64u <-> ArrayBuffer conversions. atob only accepts base64 input and btoa only accepts characters in the range U+0000 to U+00FF, so for other input this conversion won't work; check out an appropriate converter like the one below:

The buff -> b64u version is from a tweet by Mathias Bynens; thanks for that one (too)! He also wrote a base64 encoder/decoder: https://github.com/mathiasbynens/base64

Coming from Java, it may help to know that Java's byte[] is practically JS's Int8Array (signed), but here we use the unsigned version, Uint8Array, since the JS conversions work with it. Both hold 8-bit values (256 possible values), so we call it byte[] in JS here...

The code is from a module class, which is why the methods are static.

//utility

/**
 * Array buffer to base64Url string
 * - arrBuff->byte[]->biStr->b64->b64u
 * @param arrayBuffer
 * @returns {string}
 * @private
 */
static _arrayBufferToBase64Url(arrayBuffer) {
    console.log('base64Url from array buffer:', arrayBuffer);

    let base64Url = window.btoa(String.fromCodePoint(...new Uint8Array(arrayBuffer)));
    base64Url = base64Url.replaceAll('+', '-');
    base64Url = base64Url.replaceAll('/', '_');

    console.log('base64Url:', base64Url);
    return base64Url;
}

/**
 * Base64Url string to array buffer
 * - b64u->b64->biStr->byte[]->arrBuff
 * @param base64Url
 * @returns {ArrayBufferLike}
 * @private
 */
static _base64UrlToArrayBuffer(base64Url) {
    console.log('array buffer from base64Url:', base64Url);

    let base64 = base64Url.replaceAll('-', '+');
    base64 = base64.replaceAll('_', '/');
    const binaryString = window.atob(base64);
    const length = binaryString.length;
    const bytes = new Uint8Array(length);
    for (let i = 0; i < length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }

    console.log('array buffer:', bytes.buffer);
    return bytes.buffer;
}
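The conversion above passes the base64Url input to atob right after swapping the characters. Many base64url producers (JWTs, for example) omit the trailing = padding; browsers' atob accepts unpadded input, but a standalone sketch that restores the padding explicitly keeps the intermediate string standard padded base64 (the function name base64UrlToArrayBufferPadded is hypothetical):

```javascript
// Hypothetical helper: base64url (RFC 4648 section 5) to ArrayBuffer,
// restoring any '=' padding the producer omitted.
function base64UrlToArrayBufferPadded(base64Url) {
  let base64 = base64Url.replace(/-/g, "+").replace(/_/g, "/");
  // Pad to a multiple of 4 characters
  base64 += "=".repeat((4 - (base64.length % 4)) % 4);
  const binaryString = atob(base64);
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    bytes[i] = binaryString.charCodeAt(i);
  }
  return bytes.buffer;
}

// "_-8" is the unpadded base64url form of the bytes 0xFF 0xEF
console.log(new Uint8Array(base64UrlToArrayBufferPadded("_-8")).join(", ")); // 255, 239
```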
r j
0

Solution without atob

I've seen many people complaining about using atob and btoa in the replies. There are some issues to take into account when using them.

There's a solution that doesn't use them on the MDN page about Base64. Below is the code, copied from the docs, that converts a base64 string into a Uint8Array.

Note that the function below returns a Uint8Array. To get the ArrayBuffer version you just need to do uintArray.buffer.

function b64ToUint6(nChr) {
  return nChr > 64 && nChr < 91
    ? nChr - 65
    : nChr > 96 && nChr < 123
    ? nChr - 71
    : nChr > 47 && nChr < 58
    ? nChr + 4
    : nChr === 43
    ? 62
    : nChr === 47
    ? 63
    : 0;
}

function base64DecToArr(sBase64, nBlocksSize) {
  const sB64Enc = sBase64.replace(/[^A-Za-z0-9+/]/g, "");
  const nInLen = sB64Enc.length;
  const nOutLen = nBlocksSize
    ? Math.ceil(((nInLen * 3 + 1) >> 2) / nBlocksSize) * nBlocksSize
    : (nInLen * 3 + 1) >> 2;
  const taBytes = new Uint8Array(nOutLen);

  let nMod3;
  let nMod4;
  let nUint24 = 0;
  let nOutIdx = 0;
  for (let nInIdx = 0; nInIdx < nInLen; nInIdx++) {
    nMod4 = nInIdx & 3;
    nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << (6 * (3 - nMod4));
    if (nMod4 === 3 || nInLen - nInIdx === 1) {
      nMod3 = 0;
      while (nMod3 < 3 && nOutIdx < nOutLen) {
        taBytes[nOutIdx] = (nUint24 >>> ((16 >>> nMod3) & 24)) & 255;
        nMod3++;
        nOutIdx++;
      }
      nUint24 = 0;
    }
  }

  return taBytes;
}

If you're interested in the reverse operation, ArrayBuffer to base64, you can find how to do it in the same link.

Daniel Reina