FileReader - which encodings are supported?

Question

I want to simply read user-input files as text.

I can rely on modern browser usage, so I use FileReader for that (which works like a charm).

reader.readAsText(myfile, encoding);

I know that encoding defaults to UTF-8.

But as my users will upload files from various sources (Windows, Mac, Linux) and various browsers I ask the user to provide the encoding via a select box.

So, for example, for a Western European Windows text file I expect the user to choose windows-1252.

I was not able to find a list of supported encodings for FileReader (I assume it depends at least on the browser).

I am not asking to auto-determine the encoding, I just want to fill my select box in a way like:

<select id="encoding">
   <option value="windows-1252">Windows (Western Latin)</option>
   <option value="utf-8">UTF-8</option>
   <option value="...">...</option>
</select>

So my questions are:

Where do I get a list of supported encodings to fill the option values?
How do I determine the exact spelling of those values (is it 'utf8' or 'UTF-8' or...), and do they depend on the OS / browser?
Does readAsText(myfile, unsupportedEncoding) throw any error which I can catch if encoding is not supported?

I'd prefer not to use any major 3rd party libraries for that.

Bonus Question:

Is there a simple way to get meaningful translations of the values, e.g. cp10029 means Mac (Central European)?

A cursory search of the googles didn't reveal much. Maybe this will help? http://stackoverflow.com/questions/37884928/cant-fit-file-encoding-when-working-with-chrome-file-system-api/37885580 — Dan Wilson, Nov 24 '16 at 15:59
thanks, I googled a lot, that's why I am asking here :-( I checked your recommendation but this refers to a no-real-text-input IMHO but in my case all files are "real text" input only in different encodings. — LBA, Nov 24 '16 at 16:16
The supported code-pages can be found [here](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/TextDecoder#Parameters). I would recommend taking a second look at the link provided by Dan as this is a good way to go about it. This approach also let you detect BOM and features to allow guessing the encoding in advance. — , Nov 26 '16 at 05:43

score 10 · Accepted Answer · edited Feb 05 '20 at 13:17

Encoding standard: https://github.com/whatwg/encoding/ (also available in JSON format at https://github.com/whatwg/encoding/blob/master/encodings.json; use the values from the "labels" fields).

Encoding parameter is not case sensitive.

No, readAsText(myfile, unsupportedEncoding) does not throw any error. The function simply uses the default encoding ("utf8").

window.onload = function() {

    //Check File API support
    if (window.File && window.FileList && window.FileReader) {
        var filesInput = document.getElementById("files");

        filesInput.addEventListener("change", function(event) {

            var files = event.target.files; //FileList object
            var output = document.getElementById("result");

            for (var i = 0; i < files.length; i++) {
                var file = files[i];

                //Only plain text
                if (!file.type.match('plain')) continue;

                var reader = new FileReader();

                //Append the decoded text to the page once reading finishes
                reader.addEventListener("load", function(event) {

                    var textFile = event.target;

                    var div = document.createElement("div");

                    div.innerText = textFile.result;

                    output.insertBefore(div, null);

                });
                //Read the text file; the label is not case sensitive, so "cP1251" is accepted
                reader.readAsText(file, "cP1251");
            }

        });
    }
    else {
        console.log("Your browser does not support File API");
    }
}

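Since readAsText gives no signal for an unsupported label, one possible workaround is to check the label with TextDecoder before reading. This is only a sketch: it assumes FileReader and TextDecoder accept the same labels in the browser at hand (both refer to the WHATWG Encoding Standard), and the helper name resolveEncodingLabel is made up for illustration.

```javascript
// Hypothetical helper (not part of any API): returns the canonical,
// lowercase encoding name for a label, or null if the runtime rejects it.
function resolveEncodingLabel(label) {
    try {
        // The TextDecoder constructor throws a RangeError for unknown labels.
        return new TextDecoder(label).encoding;
    } catch (err) {
        if (err instanceof RangeError) return null;
        throw err; // anything else is unexpected, so let it surface
    }
}

console.log(resolveEncodingLabel("UTF8"));                // "utf-8"
console.log(resolveEncodingLabel("not-a-real-encoding")); // null
```

A select box value could be passed through this check first, and the file read only if the result is not null.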

To get translations of the values, you can use the "heading" and "name" fields of the JSON file (https://github.com/whatwg/encoding/blob/master/encodings.json).
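As a rough sketch of how the select box could be filled from that file: the function below assumes the JSON has already been fetched and parsed, and that it is an array of groups, each with a "heading" and an "encodings" array whose entries carry "name" and "labels". The helper name buildEncodingOptions and the sample data are hypothetical; the sample is hand-written in the assumed shape and is not a copy of the real file.

```javascript
// Hypothetical helper: turns the parsed encodings.json into entries
// that can be rendered as <option value="...">text</option>.
function buildEncodingOptions(encodingGroups) {
    var options = [];
    encodingGroups.forEach(function (group) {
        group.encodings.forEach(function (encoding) {
            options.push({
                // Assumption: the encoding name is itself usable as a label,
                // since labels are matched case-insensitively.
                value: encoding.name,
                text: encoding.name + " (" + group.heading + ")"
            });
        });
    });
    return options;
}

// Hand-written sample in the assumed shape, for illustration only.
var sample = [
    { heading: "The Encoding",
      encodings: [{ name: "UTF-8", labels: ["utf-8", "utf8"] }] },
    { heading: "Legacy single-byte encodings",
      encodings: [{ name: "windows-1252", labels: ["cp1252", "latin1", "windows-1252"] }] }
];
var sampleOptions = buildEncodingOptions(sample);
console.log(sampleOptions[1].text); // "windows-1252 (Legacy single-byte encodings)"
```

The headings are group descriptions rather than per-encoding translations, so friendlier display names such as "Windows (Western Latin)" would still have to be written by hand.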

I am a bit concerned that WHATWG seems to be the only group trying to keep track of the obviously only living standard but your answer responds correctly to all of my questions so I'll accept it. As soon as there might be a better/"official" response I might change that, hope that sounds reasonable. — LBA, Jan 23 '19 at 10:38

score 0 · Answer 2 · answered Aug 26 '23 at 06:30

Names and labels
The table below lists all encodings and their labels user agents must support. User agents must not support any other encodings or labels.

# UTF-8
"unicode-1-1-utf-8"
"unicode11utf8"
"unicode20utf8"
"utf-8"
"utf8"
"x-unicode20utf8"

# IBM866
"866"
"cp866"
"csibm866"
"ibm866"

# ISO-8859-2
"csisolatin2"
"iso-8859-2"
"iso-ir-101"
"iso8859-2"
"iso88592"
"iso_8859-2"
"iso_8859-2:1987"
"l2"
"latin2"

# ISO-8859-3
"csisolatin3"
"iso-8859-3"
"iso-ir-109"
"iso8859-3"
"iso88593"
"iso_8859-3"
"iso_8859-3:1988"
"l3"
"latin3"

# ISO-8859-4
"csisolatin4"
"iso-8859-4"
"iso-ir-110"
"iso8859-4"
"iso88594"
"iso_8859-4"
"iso_8859-4:1988"
"l4"
"latin4"

# ISO-8859-5
"csisolatincyrillic"
"cyrillic"
"iso-8859-5"
"iso-ir-144"
"iso8859-5"
"iso88595"
"iso_8859-5"
"iso_8859-5:1988"

# ISO-8859-6
"arabic"
"asmo-708"
"csiso88596e"
"csiso88596i"
"csisolatinarabic"
"ecma-114"
"iso-8859-6"
"iso-8859-6-e"
"iso-8859-6-i"
"iso-ir-127"
"iso8859-6"
"iso88596"
"iso_8859-6"
"iso_8859-6:1987"

# ISO-8859-7
"csisolatingreek"
"ecma-118"
"elot_928"
"greek"
"greek8"
"iso-8859-7"
"iso-ir-126"
"iso8859-7"
"iso88597"
"iso_8859-7"
"iso_8859-7:1987"
"sun_eu_greek"

# ISO-8859-8
"csiso88598e"
"csisolatinhebrew"
"hebrew"
"iso-8859-8"
"iso-8859-8-e"
"iso-ir-138"
"iso8859-8"
"iso88598"
"iso_8859-8"
"iso_8859-8:1988"
"visual"

# ISO-8859-8-I
"csiso88598i"
"iso-8859-8-i"
"logical"

# ISO-8859-10
"csisolatin6"
"iso-8859-10"
"iso-ir-157"
"iso8859-10"
"iso885910"
"l6"
"latin6"

# ISO-8859-13
"iso-8859-13"
"iso8859-13"
"iso885913"

# ISO-8859-14
"iso-8859-14"
"iso8859-14"
"iso885914"

# ISO-8859-15
"csisolatin9"
"iso-8859-15"
"iso8859-15"
"iso885915"
"iso_8859-15"
"l9"

# ISO-8859-16
"iso-8859-16"

# KOI8-R
"cskoi8r"
"koi"
"koi8"
"koi8-r"
"koi8_r"

# KOI8-U
"koi8-ru"
"koi8-u"

# macintosh
"csmacintosh"
"mac"
"macintosh"
"x-mac-roman"

# windows-874
"dos-874"
"iso-8859-11"
"iso8859-11"
"iso885911"
"tis-620"
"windows-874"

# windows-1250
"cp1250"
"windows-1250"
"x-cp1250"

# windows-1251
"cp1251"
"windows-1251"
"x-cp1251"

# windows-1252
"ansi_x3.4-1968"
"ascii"
"cp1252"
"cp819"
"csisolatin1"
"ibm819"
"iso-8859-1"
"iso-ir-100"
"iso8859-1"
"iso88591"
"iso_8859-1"
"iso_8859-1:1987"
"l1"
"latin1"
"us-ascii"
"windows-1252"
"x-cp1252"

# windows-1253
"cp1253"
"windows-1253"
"x-cp1253"

# windows-1254
"cp1254"
"csisolatin5"
"iso-8859-9"
"iso-ir-148"
"iso8859-9"
"iso88599"
"iso_8859-9"
"iso_8859-9:1989"
"l5"
"latin5"
"windows-1254"
"x-cp1254"

# windows-1255
"cp1255"
"windows-1255"
"x-cp1255"

# windows-1256
"cp1256"
"windows-1256"
"x-cp1256"

# windows-1257
"cp1257"
"windows-1257"
"x-cp1257"

# windows-1258
"cp1258"
"windows-1258"
"x-cp1258"

# x-mac-cyrillic
"x-mac-cyrillic"
"x-mac-ukrainian"

For the remaining encodings, see: https://encoding.spec.whatwg.org/#names-and-labels
