At the moment I have a large JavaScript string I'm attempting to write to a file, but in a different encoding (ISO-8859-1). I was hoping to use something like downloadify. Downloadify only accepts normal JavaScript strings or base64 encoded strings.

Because of this, I've decided to compress my string using JSZip which generates a nicely base64 encoded string that can be passed to downloadify, and downloaded to my desktop. Huzzah! The issue is that the string I compressed, of course, is still the wrong encoding.

Luckily JSZip can take a Uint8Array as data, instead of a string. So is there any way to convert a JavaScript string into an ISO-8859-1 encoded string and store it in a Uint8Array?

Alternatively, if I'm approaching this all wrong, is there a better solution altogether? Is there a fancy JavaScript string class that can use different internal encodings?

Edit: To clarify, I'm not pushing this string to a webpage so it won't automatically convert it for me. I'm doing something like this:

var zip = new JSZip();
zip.file("genSave.txt", result);

return zip.generate({compression:"DEFLATE"});

And for this to make sense, I would need result to be in the proper encoding (and JSZip only takes strings, arraybuffers, or uint8arrays).

Final Edit (This was -not- a duplicate question because the result wasn't being displayed in the browser or transmitted to a server where the encoding could be changed):

This turned out to be a little more obscure than I had thought, so I ended up rolling my own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like I did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or decode it back into a JavaScript string, interpreting the bytes as windows-1252, using this string encoding library:

var string = new TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
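    if (this.encodeList === undefined) {
        // A constructor called with `new` cannot return undefined, so fail loudly
        throw new Error("Unsupported target encoding: " + target);
    }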

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; // No mapping for this character in the target encoding
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

var encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};
David
  • Wouldn't something like `utfstring = unescape(encodeURIComponent(originalstring));` work? – Joren Sep 18 '13 at 18:44
  • Unfortunately not. My goal is to see 'Île' when viewing the final file as ISO-8859-1. When writing the file normally it writes as UCS-2 which results in 'ÃŽle' when viewed as ISO-8859-1. When using your method, it results in 'Île'. This is not the same issue as the proposed duplicate as I'm not asking the browser to display this, and thus changing the HTML5 meta tag won't solve the issue. – David Sep 18 '13 at 19:05
  • Did you answer your own question? or am I missing something? – Enigmadan Sep 18 '13 at 22:31
  • Yeah, I did. It was incorrectly closed as a duplicate and I didn't want to leave it hanging there unanswered. – David Sep 18 '13 at 23:08
  • @David: If that edit was the answer, please [roll it back](http://stackoverflow.com/posts/18879860/revisions) and post it as a [self-answer](http://stackoverflow.com/help/self-answer) (which you can then accept) – Bergi Sep 18 '13 at 23:55
  • Did you try with the `charset` attribute of the `script` element? http://www.w3.org/TR/html401/interact/scripts.html#h-18.2.1 – Martín Schonaker Sep 27 '13 at 02:42

3 Answers

This turned out to be a little more obscure than [the author] had thought, so [the author] ended up rolling [his] own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like [the author] did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or decode it back into a JavaScript string, interpreting the bytes as windows-1252, using this string encoding library:

var string = new TextDecoder("windows-1252").decode(tenc);
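As a quick sanity check of the decoding direction, here is a minimal sketch. It assumes a runtime whose built-in `TextDecoder` accepts the `windows-1252` label (current browsers do), and the hand-written `sampleBytes` array is only a stand-in for `tenc`:

```javascript
// Bytes for "Île" in windows-1252; a stand-in for the transcoder's output.
var sampleBytes = new Uint8Array([0xCE, 0x6C, 0x65]);

// Note the `new`: TextDecoder is a constructor and throws if called as a plain function.
var roundTripped = new TextDecoder("windows-1252").decode(sampleBytes);

console.log(roundTripped); // the original three-character string, "Île"
```

The decoded value is an ordinary JavaScript string again, so this is mainly useful for verifying the bytes, not for writing them to a file.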

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
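    if (this.encodeList === undefined) {
        // A constructor called with `new` cannot return undefined, so fail loudly
        throw new Error("Unsupported target encoding: " + target);
    }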

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; // No mapping for this character in the target encoding
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

var encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};
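For callers who would rather get a placeholder than `undefined` when a character has no windows-1252 equivalent, a more forgiving variant might look like the sketch below. The names `CP1252_EXTRA` and `encodeWindows1252` are hypothetical and not part of the answer's code, and substituting "?" is just one possible policy:

```javascript
// Hypothetical helper, not part of the original answer.
// Code points that windows-1252 places in the 0x80-0x9F range.
var CP1252_EXTRA = {
    0x20AC: 0x80, 0x201A: 0x82, 0x0192: 0x83, 0x201E: 0x84, 0x2026: 0x85,
    0x2020: 0x86, 0x2021: 0x87, 0x02C6: 0x88, 0x2030: 0x89, 0x0160: 0x8A,
    0x2039: 0x8B, 0x0152: 0x8C, 0x017D: 0x8E, 0x2018: 0x91, 0x2019: 0x92,
    0x201C: 0x93, 0x201D: 0x94, 0x2022: 0x95, 0x2013: 0x96, 0x2014: 0x97,
    0x02DC: 0x98, 0x2122: 0x99, 0x0161: 0x9A, 0x203A: 0x9B, 0x0153: 0x9C,
    0x017E: 0x9E, 0x0178: 0x9F
};

function encodeWindows1252(str) {
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code <= 0x7F || (code >= 0xA0 && code <= 0xFF)) {
            bytes[i] = code;               // same value in Unicode and windows-1252
        } else if (Object.prototype.hasOwnProperty.call(CP1252_EXTRA, code)) {
            bytes[i] = CP1252_EXTRA[code]; // remapped into 0x80-0x9F
        } else {
            bytes[i] = 0x3F;               // "?" for anything unmappable
        }
    }
    return bytes;
}

console.log(encodeWindows1252("\u00CEle")); // bytes CE 6C 65 for "Île"
```

Because the loop works on UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two "?" bytes, one per surrogate.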
Nate
  • Thank you! :) As it happens, this is the first solution that I could find for the problem of encoding the HTTP Status Description in .NET Framework: Response.StatusDescription is encoded in CP1252 and my browser always tries to decode it as CP1251. – ornic Jan 31 '22 at 14:28
  • The https://code.google.com/p/stringencoding/ library linked above no longer exists. But this answer to another post, https://stackoverflow.com/a/54883467/1915920, helps and may relate to its successor: https://github.com/inexorabletash/text-encoding – Andreas Covidiot Sep 28 '22 at 07:13

Test the following script:

<script type="text/javascript" charset="utf-8">
Irfan TahirKheli
user2511140
  • No, this doesn't apply. This was all internal javascript string encoding (not literals and not formatted by the browser). – David Oct 26 '13 at 20:44

The best solution for me was posted here and this is my one-liner:

<!-- Required for non-UTF encodings (quite big) -->
<script src="encoding-indexes.js"></script>

<script src="encoding.js"></script>
...
// windows-1252 is just one typical example encoding/transcoding
let transcodedString = new TextDecoder( 'windows-1252' ).decode( 
                         new TextEncoder().encode( someUtf8String ))

or this if the transcoding has to be applied on multiple inputs reusing the encoder and decoder:

let srcArr = [ ... ]  // some UTF-8 string array
let encoder = new TextEncoder()
let decoder = new TextDecoder( 'windows-1252' )
// map() returns the new array; forEach() would return undefined
let transcodedArr = srcArr.map( s => decoder.decode( encoder.encode( s )) )
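It may help to look at the bytes involved in that round trip. The standard `TextEncoder` always emits UTF-8, so decoding its output as windows-1252 reinterprets UTF-8 bytes one by one instead of producing windows-1252 bytes. A minimal sketch, assuming a runtime whose built-in `TextDecoder` accepts the `windows-1252` label:

```javascript
var text = "\u00CEle"; // "Île"

// TextEncoder has no encoding choice: the result is always UTF-8.
var utf8Bytes = new TextEncoder().encode(text);
console.log(Array.from(utf8Bytes)); // [195, 142, 108, 101], i.e. C3 8E 6C 65

// Each byte becomes one character, so the two-byte "Î" turns into two characters.
var reinterpreted = new TextDecoder("windows-1252").decode(utf8Bytes);
console.log(text.length, reinterpreted.length); // 3 4
```

This is the 'ÃŽle' effect described in the comments under the question, so whether this pattern is what you want depends on whether you need a reinterpreted string or actual windows-1252 bytes.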

(A slightly modified version of the answer from the related question:)

This is what I found after a more specific Google search than just UTF-8 encode/decode. For those who are looking for a library to convert between encodings, here you go:

github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Pasted from the repo README:

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 
windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 
windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 
big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le 
x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)
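The label aliasing matters for the original question: under the Encoding Standard, asking for `iso-8859-1` actually selects `windows-1252`, which differs from strict ISO-8859-1 only in the 0x80-0x9F byte range. A small sketch that inspects the resolved name, assuming a built-in `TextDecoder` with legacy encodings available:

```javascript
// Several labels resolve to the same underlying encoding.
var labels = ["iso-8859-1", "latin1", "ascii", "windows-1252"];

var resolved = labels.map(function (label) {
    // The `encoding` property reports the canonical name, not the label passed in.
    return new TextDecoder(label).encoding;
});

console.log(resolved); // every entry is "windows-1252"
```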

Andreas Covidiot