
The error in the title is thrown only in Google Chrome, according to my tests. I'm base64 encoding a big XML file so that it can be downloaded:

this.loader.src = "data:application/x-forcedownload;base64,"+
                  btoa("<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                  +"<"+this.gamesave.tagName+">"
                  +this.xml.firstChild.innerHTML
                  +"</"+this.gamesave.tagName+">");

this.loader is a hidden iframe.

This error is actually an improvement, because Google Chrome would normally crash on the btoa call. Mozilla Firefox has no problems here, so the issue is browser-related. I'm not aware of any strange characters in the file; in fact, I believe it contains no non-ASCII characters.

Q: How do I find the problematic characters and replace them so that Chrome stops complaining?

I have tried to use Downloadify to initiate the download, but it does not work. It's unreliable and throws no errors that would help with debugging.

Tomáš Zato
  • This is an old question but I ran into this issue recently and found this MDN article very useful: https://developer.mozilla.org/en-US/docs/Glossary/Base64#the_unicode_problem – Alex Apr 08 '23 at 23:19

9 Answers


If your string contains Unicode characters outside the Latin1 range, use this (it also works with SVG source):

btoa(unescape(encodeURIComponent(str)))

example:

 var imgsrc = 'data:image/svg+xml;base64,' + btoa(unescape(encodeURIComponent(markup)));
 var img = new Image(1, 1); // width, height values are optional params 
 img.src = imgsrc;

If you need to decode that base64, use this:

var str2 = decodeURIComponent(escape(window.atob(b64)));
console.log(str2);

Example:

var str = "äöüÄÖÜçéèñ";
var b64 = window.btoa(unescape(encodeURIComponent(str)))
console.log(b64);

var str2 = decodeURIComponent(escape(window.atob(b64)));
console.log(str2);
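
To see why the wrapper works, it can help to look at each stage separately. The following is only a minimal sketch (the variable names are made up for illustration): encodeURIComponent percent-encodes the UTF-8 bytes of the string, and unescape turns every %XX sequence back into a single character with that code, so the result contains only code units from 0 to 255, which btoa accepts.

```javascript
var input = "é";                                 // U+00E9, outside ASCII
var percentEncoded = encodeURIComponent(input);  // "%C3%A9": the UTF-8 bytes, percent-encoded
var binaryString = unescape(percentEncoded);     // two characters with codes 0xC3 and 0xA9
var codes = [binaryString.charCodeAt(0), binaryString.charCodeAt(1)];
var encoded = btoa(binaryString);                // "w6k="

// Decoding runs the same steps in reverse order.
var roundTrip = decodeURIComponent(escape(atob(encoded)));

console.log(percentEncoded, codes, encoded, roundTrip);
```

The intermediate "binary string" is not meant to be displayed; it is only a carrier for the UTF-8 bytes.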

Note: if you need to get this to work in Mobile Safari, you might need to strip all the whitespace from the base64 data:

function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(escape(window.atob( str )));
}

2017 Update

This problem has been bugging me again.
The simple truth is that btoa and atob don't handle Unicode strings: they only work on strings whose characters all have codes from 0 to 255 (the Latin1 range).
Also, I wouldn't use bloatware like js-base64.
But webtoolkit does have a small, nice and very maintainable implementation:

/**
*
*  Base64 encode / decode
*  http://www.webtoolkit.info
*
**/
var Base64 = {

    // private property
    _keyStr: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

    // public method for encoding
    , encode: function (input)
    {
        var output = "";
        var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
        var i = 0;

        input = Base64._utf8_encode(input);

        while (i < input.length)
        {
            chr1 = input.charCodeAt(i++);
            chr2 = input.charCodeAt(i++);
            chr3 = input.charCodeAt(i++);

            enc1 = chr1 >> 2;
            enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
            enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
            enc4 = chr3 & 63;

            if (isNaN(chr2))
            {
                enc3 = enc4 = 64;
            }
            else if (isNaN(chr3))
            {
                enc4 = 64;
            }

            output = output +
                this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
                this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
        } // Whend 

        return output;
    } // End Function encode 


    // public method for decoding
    ,decode: function (input)
    {
        var output = "";
        var chr1, chr2, chr3;
        var enc1, enc2, enc3, enc4;
        var i = 0;

        input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
        while (i < input.length)
        {
            enc1 = this._keyStr.indexOf(input.charAt(i++));
            enc2 = this._keyStr.indexOf(input.charAt(i++));
            enc3 = this._keyStr.indexOf(input.charAt(i++));
            enc4 = this._keyStr.indexOf(input.charAt(i++));

            chr1 = (enc1 << 2) | (enc2 >> 4);
            chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
            chr3 = ((enc3 & 3) << 6) | enc4;

            output = output + String.fromCharCode(chr1);

            if (enc3 != 64)
            {
                output = output + String.fromCharCode(chr2);
            }

            if (enc4 != 64)
            {
                output = output + String.fromCharCode(chr3);
            }

        } // Whend 

        output = Base64._utf8_decode(output);

        return output;
    } // End Function decode 


    // private method for UTF-8 encoding
    ,_utf8_encode: function (string)
    {
        var utftext = "";
        string = string.replace(/\r\n/g, "\n");

        for (var n = 0; n < string.length; n++)
        {
            var c = string.charCodeAt(n);

            if (c < 128)
            {
                utftext += String.fromCharCode(c);
            }
            else if ((c > 127) && (c < 2048))
            {
                utftext += String.fromCharCode((c >> 6) | 192);
                utftext += String.fromCharCode((c & 63) | 128);
            }
            else
            {
                utftext += String.fromCharCode((c >> 12) | 224);
                utftext += String.fromCharCode(((c >> 6) & 63) | 128);
                utftext += String.fromCharCode((c & 63) | 128);
            }

        } // Next n 

        return utftext;
    } // End Function _utf8_encode 

    // private method for UTF-8 decoding
    ,_utf8_decode: function (utftext)
    {
        var string = "";
        var i = 0;
        var c, c1, c2, c3;
        c = c1 = c2 = 0;

        while (i < utftext.length)
        {
            c = utftext.charCodeAt(i);

            if (c < 128)
            {
                string += String.fromCharCode(c);
                i++;
            }
            else if ((c > 191) && (c < 224))
            {
                c2 = utftext.charCodeAt(i + 1);
                string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
                i += 2;
            }
            else
            {
                c2 = utftext.charCodeAt(i + 1);
                c3 = utftext.charCodeAt(i + 2);
                string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
                i += 3;
            }

        } // Whend 

        return string;
    } // End Function _utf8_decode 

}

https://www.fileformat.info/info/unicode/utf8.htm

  • For any character equal to or below 127 (hex 0x7F), the UTF-8 representation is one byte. It is just the lowest 7 bits of the full unicode value. This is also the same as the ASCII value.

  • For characters equal to or below 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).

  • For all characters equal to or greater than 2048 but less than or equal to 65535 (0xFFFF), the UTF-8 representation is spread across three bytes.
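
As a worked example of the three rules above, here is a small sketch. The helper names utf8Bytes and toHex are hypothetical and only cover code points up to 0xFFFF, like the rules themselves.

```javascript
// Hypothetical helper: UTF-8 bytes of a single code point up to 0xFFFF.
function utf8Bytes(codePoint) {
    if (codePoint <= 0x7F) {
        return [codePoint];                                  // one byte, same as ASCII
    }
    if (codePoint <= 0x7FF) {
        return [0xC0 | (codePoint >> 6),                     // 110xxxxx
                0x80 | (codePoint & 0x3F)];                  // 10xxxxxx
    }
    return [0xE0 | (codePoint >> 12),                        // 1110xxxx
            0x80 | ((codePoint >> 6) & 0x3F),                // 10xxxxxx
            0x80 | (codePoint & 0x3F)];                      // 10xxxxxx
}

// Hypothetical helper: format bytes as upper-case hex for display.
function toHex(bytes) {
    return bytes.map(function (b) { return b.toString(16).toUpperCase(); }).join(" ");
}

console.log(toHex(utf8Bytes(0x41)));   // "41"        (A)
console.log(toHex(utf8Bytes(0xE9)));   // "C3 A9"     (é)
console.log(toHex(utf8Bytes(0x20AC))); // "E2 82 AC"  (€)
```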

Stefan Steiger
  • Can you explain this a bit more? I'm totally lost. – Muhammad Umer Oct 31 '14 at 21:24
  • I'd just run the code if I were you. `escape` converts the string into one that contains only URL-valid characters. That prevents the errors. – Tomáš Zato Nov 24 '14 at 22:37
  • `escape` and `unescape` were deprecated in JavaScript 1.5 and one should use `encodeURIComponent` or `decodeURIComponent`, respectively, instead. You are using the deprecated and new functions together. Why? See: http://www.w3schools.com/jsref/jsref_escape.asp – Leif Dec 14 '14 at 13:41
  • @Leif: This only works precisely because escape and unescape are buggy (in the same way) ;) – Stefan Steiger Dec 17 '14 at 19:13
  • @Quandary: Thanks, this is very interesting. But is this bug present in all browsers? – Leif Dec 19 '14 at 14:43
  • @Leif: Seems to be that way. All major browsers (Chrome, IE, FF), latest versions at least, at present. – Stefan Steiger Apr 01 '15 at 06:38
  • Anyone else wound up here from using webpack? – Avindra Goolcharan Aug 27 '15 at 19:49
  • I found that in a certain case (handling emojis) the order had to be reversed: `btoa(window.encodeURIComponent(unescape(str)))` worked but `btoa(window.unescape(encodeURIComponent(str)))` did not work – bplittle Dec 15 '16 at 23:25
  • @bplittle I had to reverse the order too: `y = btoa(escape(x))` and `x = unescape(atob(y))`. In fact, I cannot see how it would possibly work in the given order. – Tobia Jun 27 '17 at 14:46
  • @bplittle: Reversing the order does not work for the test-string "✓ à la mode". Also it doesn't work with SVG sources. – Stefan Steiger Jun 28 '17 at 08:06
  • @Tobia: Yes, this is safer than my variant - if you just need to base64-encode some data to send it to the server. However, this unfortunately doesn't work for SVG-sources (data:base64), so I can't use it in my use-case. If you do what you do, you should however use encodeURIComponent and decodeURIComponent, not escape and unescape. Just FYI. However, if you just send an utf8-string to a server, it makes little sense to base64-encode it, because encodeURIComponent would already be enough. I however need the base64 data to display an image. – Stefan Steiger Jun 28 '17 at 08:14
  • The solution from webtoolkit worked flawlessly, thanks – ArthurG Feb 07 '18 at 18:10
  • `unescape` method will be soon deprecated as per MDN https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/unescape – Akansh Oct 30 '18 at 19:06
  • Is the 2017 version still the best solution today for converting arrays to base64 strings? If yes I would move that solution to the top and strikethrough the first stuff you wrote. – David Apr 25 '19 at 23:03
  • @David: Yes, the 2017 version is the one you should use. – Stefan Steiger Apr 26 '19 at 06:24
  • I tested this solution with a 50MB array in IE11 and it seems to be the most RAM-efficient solution so far (I wanted to post a jsfiddle but I'm new and it doesn't appear as public). If you profile a 50MB array using readAsDataURL() from FileReader, you get a RAM peak of 361 and then a drop to 162, from an initial 32MB before adding the file. With this solution it goes up to only 160 and then down to 82MB. I don't know why there is this difference, but I suppose it's good. I will still conduct more tests and try to post my jsfiddle. – David Apr 28 '19 at 20:33
  • Here is my jsfiddle, I hope it works: this one using readAsDataURL from FileReader https://jsfiddle.net/tererecool/ewz8n2xf/ and this one using your 2017 solution: https://jsfiddle.net/tererecool/thw2z7ju/ You need to add a file e.g. of 50MB and run the Memory profiler in IE11 to see how the RAM consumption changes after dragging the file into the page. – David Apr 28 '19 at 20:42
  • @David: With Microsoft Anaheim Edge-Preview out there, I don't think IE11 needs to be supported for much longer. Has only about 10% market share left anyway. – Stefan Steiger Apr 28 '19 at 21:40
  • @StefanSteiger, I know, say that to my PMs :) There are still some companies that rely only on IE11, so it must still be supported for a while. Apart from that, I find it practical that you can watch the RAM consumption live. It should be the same with Chrome & Co.; this behavior is not specific to IE. – David Apr 28 '19 at 23:09
  • Does the webtoolkit solution only accept a string as parameter or also a UTF-8 array? :S – David Apr 28 '19 at 23:49
  • I think I was wrong in my assumption before; I don't know how I could overlook it. Your 2017 solution is only for strings and I was passing an array, thus getting an error, hence the small memory consumption. Now using Base64.encode(event.target.result); after FileReader.readAsText(file) also uses up to 700MB of memory with a 50MB file, so if this is correct, using FileReader.readAsDataURL instead of your method would be more efficient in the end! – David Apr 28 '19 at 23:56
  • Testing this 2017 solution I found out that the decode function decodes everything, even any nonsense that was not created by the encode function. Could something be added to the code that would recognize such nonsense and only decode valid input otherwise report an error? – TomFT Feb 05 '20 at 20:15
  • Thank you for existing and sharing this knowledge <3 – Raf Aug 10 '21 at 21:39
  • For future travelers: to ease usage of this solution I've published it as a package https://www.npmjs.com/package/@frsource/base64 Bon appetit! – FRS Sep 23 '22 at 21:30

Use a library instead

We don't have to reinvent the wheel. Just use a library to save the time and headache.

js-base64

https://github.com/dankogai/js-base64 is good and I confirm it supports unicode very well.

Base64.encode('dankogai');  // ZGFua29nYWk=
Base64.encode('小飼弾');    // 5bCP6aO85by+
Base64.encodeURI('小飼弾'); // 5bCP6aO85by-

Base64.decode('ZGFua29nYWk=');  // dankogai
Base64.decode('5bCP6aO85by+');  // 小飼弾
// note .decodeURI() is unnecessary since it accepts both flavors
Base64.decode('5bCP6aO85by-');  // 小飼弾
Tyler Liu
  • This is a good solution, although it seems like an oversight for btoa to be limited to ASCII (although atob decoding seems to work fine). This worked for me after several of the other answers would not. Thanks! – For the Name Nov 12 '17 at 23:37
  • By far this is the easier solution. I needed to be able to encode song lyrics as base64 strings for certain file formats for a program to read. This keeps characters as they are without any extra encoding that might show up later when these files are read by the program. – Chris Barr May 27 '23 at 02:28

Using btoa with unescape and encodeURIComponent didn't work for me. Replacing all the special characters with XML/HTML entities and then converting to the base64 representation was the only way to solve this issue for me. Some code:

base64 = btoa(str.replace(/[\u00A0-\u2666]/g, function(c) {
    return '&#' + c.charCodeAt(0) + ';';
}));
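
To make the effect of this replacement visible, here is a small sketch. The function name toEntities and the sample string are only illustrative; the regular expression is the one from the snippet above.

```javascript
// Same replacement as above, wrapped in a hypothetical helper.
function toEntities(str) {
    return str.replace(/[\u00A0-\u2666]/g, function (c) {
        return '&#' + c.charCodeAt(0) + ';';
    });
}

var escaped = toEntities('Vogue España');  // 'Vogue Espa&#241;a'
var base64 = btoa(escaped);                // safe: only ASCII is left
var decoded = atob(base64);                // still 'Vogue Espa&#241;a'

console.log(escaped, base64, decoded);
```

Two things to keep in mind: atob gives back the entities, not the original characters, so this only helps when the consumer is an HTML/XML/SVG parser that resolves them. Also, characters above U+2666 (including the surrogate halves of emoji) are not matched by this range and would still make btoa throw.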
xlm
Italo Borssatto
  • Since I posted this question I learned a bit about APIs that are dedicated for what I was doing. If the string you're converting is long, use `Blob` object to handle the conversion. `Blob` can handle any binary data. – Tomáš Zato Oct 15 '15 at 07:51
  • @TomášZato Cool! I never used {Blob} in Javascript. But I see that it isn't compatible with older browsers like IE9, right? – Italo Borssatto Oct 15 '15 at 13:24
  • Not sure about IE9. But my thought is that if you're doing stuff like base64 conversion client-side you're probably making modern web-app that will, sooner or later, need modern features anyway. Also, there's a blob polyfill. – Tomáš Zato Oct 15 '15 at 13:33
  • @Italo Borssatto: Can you provide me the string that didn't work ? I have never encountered any problem. Works even with Chinese characters. BTW, according to some posts, that regex doesn't work for a whole lot of characters, too. Use https://github.com/beatgammit/base64-js/blob/master/lib/b64.js instead. – Stefan Steiger Jul 21 '16 at 06:25
  • @StefanSteiger Sorry. I tried to find where it happened in the code I worked on over the past year, but I couldn't find it. It was probably related to some XLS or image import handling I was doing at that time. But I did a lot of tests before arriving at this code. – Italo Borssatto Jul 21 '16 at 13:31
  • @ItaloBorssatto You're a legend! – codeepic Jun 08 '17 at 16:44
  • Thanks @codeepic Was this the only solution that worked for you? Could you share the string you were dealing with? – Italo Borssatto Jun 08 '17 at 16:57
  • @ItaloBorssatto It was the only solution that worked for me. I needed it in order to grab the d3 svg chart, serialize it using XMLSerializer, pass it into btoa() (this is where I used your solution) to create a base-64 encoded ASCII string, then pass it into image element which is then drawn into canvas and then export it so you can download an image on the front end. Rather convoluted and hacky solution, but one that does not require server-side rendered charts when users want to download some graphics. If you are interested I can send you some code samples. The comment is too short for them – codeepic Jun 08 '17 at 17:22
  • @codeepic Please, could you send just the exact part (string) of the SVG file that was not being converted using just btoa? It's to answer the question from StefanSteiger and to show others that btoa has some bugs. – Italo Borssatto Jun 09 '17 at 07:21
  • @ItaloBorssatto VogueEspana - Vogue España I cut out irrelevant pieces. The culprit is Vogue España --> ñ prevented an image from loading in the browser. – codeepic Jun 09 '17 at 09:12
  • @StefanSteiger Take a look at codeepic comment. ^^^^ – Italo Borssatto Jun 09 '17 at 09:42

I just thought I should share how I actually solved the problem and why I think this is the right solution (provided you don't need to support old browsers).

Converting data to dataURL (data: ...)

var blob = new Blob(
              // I'm using page innerHTML as data
              // note that you can use the array
              // to concatenate many long strings EFFICIENTLY
              [document.body.innerHTML],
              // Mime type is important for data url
              {type : 'text/html'}
); 
// This FileReader works asynchronously, so it doesn't lag
// the web application
var a = new FileReader();
a.onload = function(e) {
     // Capture result here
     console.log(e.target.result);
};
a.readAsDataURL(blob);

Allowing user to save data

Apart from the obvious solution of opening a new window with your dataURL as its URL, you can do two other things.

1. Use fileSaver.js

FileSaver can create an actual file save dialog with a predefined filename. It can also fall back to the normal dataURL approach.

2. Use (experimental) URL.createObjectURL

This is great for reusing the data. It creates a short URL that refers to your blob:

console.log(URL.createObjectURL(blob));
//Prints: blob:http://stackoverflow.com/7c18953f-f5f8-41d2-abf5-e9cbced9bc42

Don't forget to use the whole URL, including the leading blob: prefix. I used document.body again:

[image]

You can use this short URL as an AJAX target, a <script> source or an <a> href location. You're responsible for destroying the URL, though:

URL.revokeObjectURL('blob:http://stackoverflow.com/7c18953f-f5f8-41d2-abf5-e9cbced9bc42')
Tomáš Zato
  • Thanks mate, you saved my day :) – Sandeep Kumar May 25 '18 at 09:42
  • All those ideas seem legit, but none of them work in my tries... I always get a blank square at Chrome. Example, with my SO avatar (compacted as those comments are harsh anyway): `window.location = URL.createObjectURL(new Blob([await fetch('https://www.gravatar.com/avatar/acfb059457d47b1086189cddb2f3857c?s=64&d=identicon&r=PG').then(x => x.text())], {type: 'image/jpg'}))` – igorsantos07 Feb 22 '21 at 07:21

As a complement to Stefan Steiger's answer (as it doesn't look nice as a comment):

Extending String prototype:

String.prototype.b64encode = function() { 
    return btoa(unescape(encodeURIComponent(this))); 
};
String.prototype.b64decode = function() { 
    return decodeURIComponent(escape(atob(this))); 
};

Usage:

var str = "äöüÄÖÜçéèñ";
var encoded = str.b64encode();
console.log( encoded.b64decode() );

NOTE:

As stated in the comments, using unescape is not recommended as it may be removed in the future:

Warning: Although unescape() is not strictly deprecated (as in "removed from the Web standards"), it is defined in Annex B of the ECMA-262 standard, whose introduction states: … All of the language features and behaviours specified in this annex have one or more undesirable characteristics and in the absence of legacy usage would be removed from this specification.

Note: Do not use unescape to decode URIs, use decodeURI or decodeURIComponent instead.

lepe

btoa() only supports characters from String.fromCodePoint(0) up to String.fromCodePoint(255). Characters with a code point of 256 or higher have to be encoded before the Base64 step and decoded again after it.
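
The boundary can be checked directly. This is just a quick sketch; canBtoa is a hypothetical helper name.

```javascript
// Hypothetical helper: reports whether btoa accepts the string without throwing.
function canBtoa(str)
{
    try
    {
        btoa(str);
        return true;
    }
    catch(error)
    {
        return false;
    }
}

console.log(canBtoa(String.fromCodePoint(255))); // true  (ÿ is the last Latin1 character)
console.log(canBtoa(String.fromCodePoint(256))); // false (Ā is outside the Latin1 range)
```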

And this is where it becomes tricky...

Every character is assigned a place in the Unicode table. The table is divided into planes and blocks (scripts, math symbols, and so on). Every character has a unique code point number, which can be far larger than what fits into a single byte.

A computer stores data in bytes (8 bits, hexadecimal 0x00 - 0xff, binary 00000000 - 11111111, decimal 0 - 255). This range is normally used to store basic characters (the Latin1 range).

For characters with a code point higher than 255, different encodings exist. JavaScript uses 16 bits per code unit (UTF-16); such a string is called a DOMString. Unicode can handle code points up to 0x10FFFF, which means there must be a way to spread the bits of one code point over several 16-bit cells.

String.fromCodePoint(0x10000).length == 2

UTF-16 uses surrogate pairs to store 20 bits in two 16-bit cells. The first (high) surrogate has the form 110110xxxxxxxxxx, the second (low) one the form 110111xxxxxxxxxx. Unicode reserves dedicated ranges for them: https://unicode-table.com/de/#high-surrogates
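
As a worked example of the bit layout described above, the two surrogates of U+1F604 can be computed by hand. This is only an illustrative sketch with made-up variable names.

```javascript
let codePoint = 0x1F604;                // smiley, outside the 16-bit range

// Subtracting 0x10000 leaves a 20-bit value that is split into two 10-bit halves.
let offset = codePoint - 0x10000;
let high = 0xD800 | (offset >> 10);     // 110110 + upper 10 bits
let low = 0xDC00 | (offset & 0x3FF);    // 110111 + lower 10 bits

let viaSurrogates = String.fromCharCode(high, low);

console.log(high.toString(16), low.toString(16));               // d83d de04
console.log(viaSurrogates === String.fromCodePoint(codePoint)); // true
```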

The standard way to store characters as bytes (values in the Latin1 range) is UTF-8.

I'm sorry to say it, but I think there is no way around implementing this yourself.

function stringToUTF8(str)
{
    let bytes = [];

    for(let character of str)
    {
        let code = character.codePointAt(0);

        if(code <= 127)
        {
            let byte1 = code;

            bytes.push(byte1);
        }
        else if(code <= 2047)
        {
            let byte1 = 0xC0 | (code >> 6);
            let byte2 = 0x80 | (code & 0x3F);

            bytes.push(byte1, byte2);
        }
        else if(code <= 65535)
        {
            let byte1 = 0xE0 | (code >> 12);
            let byte2 = 0x80 | ((code >> 6) & 0x3F);
            let byte3 = 0x80 | (code & 0x3F);

            bytes.push(byte1, byte2, byte3);
        }
        else if(code <= 2097151)
        {
            let byte1 = 0xF0 | (code >> 18);
            let byte2 = 0x80 | ((code >> 12) & 0x3F);
            let byte3 = 0x80 | ((code >> 6) & 0x3F);
            let byte4 = 0x80 | (code & 0x3F);

            bytes.push(byte1, byte2, byte3, byte4);
        }
    }

    return bytes;
}

function utf8ToString(bytes, fallback)
{
    let valid = undefined;
    let codePoint = undefined;
    let codeBlocks = [0, 0, 0, 0];

    let result = "";

    for(let offset = 0; offset < bytes.length; offset++)
    {
        let byte = bytes[offset];

        if((byte & 0x80) == 0x00)
        {
            codeBlocks[0] = byte & 0x7F;

            codePoint = codeBlocks[0];
        }
        else if((byte & 0xE0) == 0xC0)
        {
            codeBlocks[0] = byte & 0x1F;

            byte = bytes[++offset];
            if(offset >= bytes.length || (byte & 0xC0) != 0x80) { valid = false; break; }

            codeBlocks[1] = byte & 0x3F;

            codePoint = (codeBlocks[0] << 6) + codeBlocks[1];
        }
        else if((byte & 0xF0) == 0xE0)
        {
            codeBlocks[0] = byte & 0xF;

            for(let blockIndex = 1; blockIndex <= 2; blockIndex++)
            {
                byte = bytes[++offset];
                if(offset >= bytes.length || (byte & 0xC0) != 0x80) { valid = false; break; }

                codeBlocks[blockIndex] = byte & 0x3F;
            }
            if(valid === false) { break; }

            codePoint = (codeBlocks[0] << 12) + (codeBlocks[1] << 6) + codeBlocks[2];
        }
        else if((byte & 0xF8) == 0xF0)
        {
            codeBlocks[0] = byte & 0x7;

            for(let blockIndex = 1; blockIndex <= 3; blockIndex++)
            {
                byte = bytes[++offset];
                if(offset >= bytes.length || (byte & 0xC0) != 0x80) { valid = false; break; }

                codeBlocks[blockIndex] = byte & 0x3F;
            }
            if(valid === false) { break; }

            codePoint = (codeBlocks[0] << 18) + (codeBlocks[1] << 12) + (codeBlocks[2] << 6) + (codeBlocks[3]);
        }
        else
        {
            valid = false; break;
        }

        result += String.fromCodePoint(codePoint);
    }

    if(valid === false)
    {
        if(!fallback)
        {
            throw new TypeError("Malformed utf-8 encoding.");
        }

        result = "";

        for(let offset = 0; offset != bytes.length; offset++)
        {
            result += String.fromCharCode(bytes[offset] & 0xFF);
        }
    }

    return result;
}

function decodeBase64(text, binary)
{
    if(/[^0-9a-zA-Z\+\/\=]/.test(text)) { throw new TypeError("The string to be decoded contains characters outside of the valid base64 range."); }

    let codePointA = 'A'.codePointAt(0);
    let codePointZ = 'Z'.codePointAt(0);
    let codePointa = 'a'.codePointAt(0);
    let codePointz = 'z'.codePointAt(0);
    let codePointZero = '0'.codePointAt(0);
    let codePointNine = '9'.codePointAt(0);
    let codePointPlus = '+'.codePointAt(0);
    let codePointSlash = '/'.codePointAt(0);

    function getCodeFromKey(key)
    {
        let keyCode = key.codePointAt(0);

        if(keyCode >= codePointA && keyCode <= codePointZ)
        {
            return keyCode - codePointA;
        }
        else if(keyCode >= codePointa && keyCode <= codePointz)
        {
            return keyCode + 26 - codePointa;
        }
        else if(keyCode >= codePointZero && keyCode <= codePointNine)
        {
            return keyCode + 52 - codePointZero;
        }
        else if(keyCode == codePointPlus)
        {
            return 62;
        }
        else if(keyCode == codePointSlash)
        {
            return 63;
        }

        return undefined;
    }

    let codes = Array.from(text).map(character => getCodeFromKey(character));

    let bytesLength = Math.ceil(codes.length / 4) * 3;

    if(codes[codes.length - 2] == undefined) { bytesLength = bytesLength - 2; } else if(codes[codes.length - 1] == undefined) { bytesLength--; }

    let bytes = new Uint8Array(bytesLength);

    for(let offset = 0, index = 0; offset < bytes.length;)
    {
        let code1 = codes[index++];
        let code2 = codes[index++];
        let code3 = codes[index++];
        let code4 = codes[index++];

        let byte1 = (code1 << 2) | (code2 >> 4);
        let byte2 = ((code2 & 0xf) << 4) | (code3 >> 2);
        let byte3 = ((code3 & 0x3) << 6) | code4;

        bytes[offset++] = byte1;
        bytes[offset++] = byte2;
        bytes[offset++] = byte3;
    }

    if(binary) { return bytes; }

    return utf8ToString(bytes, true);
}

function encodeBase64(bytes) {
    if (bytes === undefined || bytes === null) {
        return '';
    }
    if (bytes instanceof Array) {
        bytes = bytes.filter(item => {
            return Number.isFinite(item) && item >= 0 && item <= 255;
        });
    }

    if (
        !(
            bytes instanceof Uint8Array ||
            bytes instanceof Uint8ClampedArray ||
            bytes instanceof Array
        )
    ) {
        if (typeof bytes === 'string') {
            const str = bytes;
            bytes = Array.from(unescape(encodeURIComponent(str))).map(ch =>
                ch.codePointAt(0)
            );
        } else {
            throw new TypeError('bytes must be of type Uint8Array or String.');
        }
    }

    // The 64 Base64 alphabet characters, indexed by their 6-bit value.
    const keys = Array.from(
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    );
    const fillKey = '=';

    let byte1;
    let byte2;
    let byte3;
    let sign1 = ' ';
    let sign2 = ' ';
    let sign3 = ' ';
    let sign4 = ' ';

    let result = '';

    for (let index = 0; index < bytes.length; ) {
        let fillUpAt = 0;

        // tslint:disable:no-increment-decrement
        byte1 = bytes[index++];
        byte2 = bytes[index++];
        byte3 = bytes[index++];

        if (byte2 === undefined) {
            byte2 = 0;
            fillUpAt = 2;
        }

        if (byte3 === undefined) {
            byte3 = 0;
            if (!fillUpAt) {
                fillUpAt = 3;
            }
        }

        // tslint:disable:no-bitwise
        sign1 = keys[byte1 >> 2];
        sign2 = keys[((byte1 & 0x3) << 4) + (byte2 >> 4)];
        sign3 = keys[((byte2 & 0xf) << 2) + (byte3 >> 6)];
        sign4 = keys[byte3 & 0x3f];

        if (fillUpAt > 0) {
            if (fillUpAt <= 2) {
                sign3 = fillKey;
            }
            if (fillUpAt <= 3) {
                sign4 = fillKey;
            }
        }

        result += sign1 + sign2 + sign3 + sign4;

        if (fillUpAt) {
            break;
        }
    }

    return result;
}

let base64 = encodeBase64("\u{1F604}"); // unicode code point escapes for smiley
let str = decodeBase64(base64);

console.log("base64", base64);
console.log("str", str);

document.body.innerText = str;

how to use it: decodeBase64(encodeBase64("\u{1F604}"))

demo: https://jsfiddle.net/qrLadeb8/

Benjamin Toueg
Martin Wantke

Another solution for browsers that does not use unescape:

function ToBinary(str)
{
    let result="";

    str=encodeURIComponent(str);

    for(let i=0;i<str.length;i++)
        if(str[i]=="%")
        {
            result+=String.fromCharCode(parseInt(str.substring(i+1,i+3),16));
            i+=2;
        }
        else
            result+=str[i];

    return result;
}

btoa(ToBinary("тест"));//0YLQtdGB0YI=
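
The snippet above only covers encoding. A possible counterpart for decoding, again without escape, could look like the following sketch. FromBinary is a hypothetical name that is not part of the original answer; it percent-encodes every byte and lets decodeURIComponent do the UTF-8 decoding.

```javascript
// Hypothetical inverse of ToBinary: binary string (one char per byte) -> Unicode string.
function FromBinary(binary)
{
    let result="";

    for(let i=0;i<binary.length;i++)
        result+="%"+binary.charCodeAt(i).toString(16).padStart(2,"0");

    // Throws a URIError if the bytes are not valid UTF-8.
    return decodeURIComponent(result);
}

console.log(FromBinary(atob("0YLQtdGB0YI=")));//тест
```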
Alexander

A solution that converts the string to UTF-8, which is slightly shorter than the UTF-16 or URL-encoded versions many of the other answers suggest. It's also more compatible with how other languages like Python and PHP would decode the strings:

Encode

function btoa_utf8(value) {
    return btoa(
        String.fromCharCode(
            ...new TextEncoder('utf-8')
                   .encode(value)
        )
    );
}

Decode

function atob_utf8(value) {
    const value_latin1 = atob(value);
    return new TextDecoder('utf-8').decode(
        Uint8Array.from(
            { length: value_latin1.length },
            (element, index) => value_latin1.charCodeAt(index)
        )
    )
}

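TextEncoder always produces UTF-8 (any argument passed to its constructor is ignored), so only the 'utf-8' label of the TextDecoder can be replaced with a different character encoding, and that only makes sense if the decoded bytes really are in that encoding.

Note: This depends on the TextEncoder class. This is supported in most browsers nowadays, but if you need to target older browsers, check whether it's available.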
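
One caveat with the encode function above: spreading a very large byte array into String.fromCharCode can exceed the engine's limit on the number of arguments. A possible workaround is to build the binary string in chunks. This is a sketch under that assumption; the name btoa_utf8_chunked and the chunk size of 0x8000 are arbitrary choices, not part of the original answer.

```javascript
function btoa_utf8_chunked(value, chunkSize = 0x8000) {
    const bytes = new TextEncoder().encode(value);
    let binary = "";
    // Convert at most chunkSize bytes per call to keep the argument list small.
    for (let i = 0; i < bytes.length; i += chunkSize) {
        binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
    }
    return btoa(binary);
}

console.log(btoa_utf8_chunked("тест")); // 0YLQtdGB0YI=
```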

Professor Abronsius
mousetail

I just ran into this problem myself.

First, modify your code slightly:

var download = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                  +"<"+this.gamesave.tagName+">"
                  +this.xml.firstChild.innerHTML
                  +"</"+this.gamesave.tagName+">";

this.loader.src = "data:application/x-forcedownload;base64,"+
                  btoa(download);

Then use your favorite web inspector, put a breakpoint on the line of code that assigns this.loader.src, then execute this code:

for (var i = 0; i < download.length; i++) {
  if (download[i].charCodeAt(0) > 255) {
    console.warn('found character ' + download[i].charCodeAt(0) + ' "' + download[i] + '" at position ' + i);
  }
}

Depending on your application, replacing the characters that are out of range may or may not work, since you'll be modifying the data. See the note on MDN about unicode characters with the btoa method:

https://developer.mozilla.org/en-US/docs/Web/API/window.btoa
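
If you would rather collect the offending characters than read console warnings, the same check can be wrapped in a function. This is only a sketch; findNonLatin1 and the sample string are made up for illustration.

```javascript
// Hypothetical helper: lists every UTF-16 code unit that btoa would reject.
function findNonLatin1(str) {
  var hits = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 255) {
      hits.push({ index: i, code: code, char: str[i] });
    }
  }
  return hits;
}

var hits = findNonLatin1('<save name="✓ ok">');
console.log(hits); // one hit: index 12, code 10003 (U+2713)
```

An empty result means the string can be passed to btoa as it is.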

Mark Salisbury