Encoding CharsetNames for Charset.forName(String)

Question

I have a question about Charset.forName(String charsetName). Is there a list of charsetNames I can refer to? For example, for UTF-8, we use "utf8" for the charsetName. What about WINDOWS-1252, GB18030, etc.?

http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html and the latest http://download.java.net/jdk8/docs/technotes/guides/intl/encoding.doc.html — nullpotent, Sep 23 '12 at 23:39
Also there is a good discussion at http://stackoverflow.com/questions/1684040/java-why-charset-names-are-not-constants — Guido Simone, Sep 23 '12 at 23:44

score 7 · Answer 1 · answered Sep 12 '13 at 11:09

Charset         Description

US-ASCII        Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1      ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8           Eight-bit UCS Transformation Format
UTF-16BE        Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE        Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16          Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark

Reference: http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
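The six charsets in the table are also exposed as constants in java.nio.charset.StandardCharsets (added in Java 7). The following is a minimal sketch; the class name RequiredCharsets is hypothetical. It checks that each constant equals what Charset.forName returns for its canonical name. It also shows that passing a Charset object, rather than a String name, avoids the checked UnsupportedEncodingException thrown by the String-name overloads.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequiredCharsets {
    // The six charsets from the table above, which every Java SE
    // implementation must support (StandardCharsets exists since Java 7).
    static Charset[] required() {
        return new Charset[] {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16
        };
    }

    public static void main(String[] args) {
        for (Charset cs : required()) {
            // Charset.equals compares canonical names, so each line ends in "true".
            System.out.println(cs.name() + " " + cs.equals(Charset.forName(cs.name())));
        }
        // Passing a Charset (not a String name) means no UnsupportedEncodingException to handle.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 6: U+00E9 needs two bytes in UTF-8
    }
}
```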

score 3 · Answer 2 · answered Jan 30 '15 at 16:16


The charset names available in Java are platform dependent; the StandardCharsets class has only 6 constants.

To see all charsets, look at the IANA character sets registry. Check the Preferred MIME Name and Aliases columns.
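Charset.forName accepts either a charset's canonical name or any of its aliases, and it ignores case. Whether a given name works depends on the JVM you run on. The sketch below uses a hypothetical helper, canonicalName, to look up a few names and report the canonical name Java uses, or null if the name is unsupported or illegal. windows-1252 and GB18030 are not among the required charsets, so their results may differ between JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetNames {
    // Hypothetical helper: maps a canonical name or alias (matched
    // case-insensitively) to Java's canonical name, or returns null if
    // the name is syntactically illegal or this JVM does not support it.
    static String canonicalName(String nameOrAlias) {
        try {
            return Charset.forName(nameOrAlias).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "WINDOWS-1252" and "GB18030" are optional charsets; results vary by JVM.
        String[] names = {"utf8", "latin1", "WINDOWS-1252", "GB18030", "no-such-charset"};
        for (String n : names) {
            System.out.println(n + " -> " + canonicalName(n));
        }
    }
}
```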

answered Jan 30 '15 at 16:16

telebog

score 2 · Answer 3 · answered Feb 11 '20 at 17:10

To list all character sets installed in your JVM, you can use the following snippet (Java SE 8 or higher):

import java.nio.charset.Charset;
import java.util.SortedMap;
SortedMap<String, Charset> map = Charset.availableCharsets();
map.keySet().forEach(System.out::println); // prints each charset's canonical name

On my system, this lists around 170 character sets.
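The keys of Charset.availableCharsets() are canonical names only. Charset.forName also accepts each charset's aliases, so printing them next to the canonical name gives a fuller list of usable names. The following is a minimal sketch, and the class name ListCharsets is hypothetical. The exact list and count depend on your JVM.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeSet;

public class ListCharsets {
    public static void main(String[] args) {
        SortedMap<String, Charset> map = Charset.availableCharsets();
        for (Map.Entry<String, Charset> e : map.entrySet()) {
            // Key: canonical name. aliases(): other names forName accepts (sorted for readability).
            System.out.println(e.getKey() + "  " + new TreeSet<>(e.getValue().aliases()));
        }
        System.out.println(map.size() + " charsets available");
    }
}
```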

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

The Java Charset library is only required to accept a few basic encodings: ASCII, Latin-1 (ISO-8859-1), and a handful of UTF variants, as listed in the first answer above. That's a fairly useless list for practical purposes unless your scope is limited to Latin-1. In reality, Java classes can handle a large number of encodings, which you can read about on the Supported Encodings page. Quoting from it:

The java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String classes, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of Java SE 8. The class description for java.nio.charset.Charset lists the encodings that any implementation of Java SE 8 is required to support.

JDK 8 for all platforms (Solaris, Linux, and Microsoft Windows) and JRE 8 for Solaris and Linux support all encodings shown on this page. JRE 8 for Microsoft Windows may be installed as a complete international version or as a European languages version. [...]

The rest of the page consists of an extensive table of encoding names and synonyms, which is what the OP was after all those years ago...
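That table gives each encoding two names: a canonical name for the java.nio API and one for the java.io and java.lang APIs (for example windows-1252 and Cp1252). The sketch below assumes an OpenJDK-style JVM where both names resolve to the same charset. The class name LegacyNames is hypothetical. Because windows-1252 is not a required charset, the code checks Charset.isSupported first. The commented outputs apply only when that check passes.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class LegacyNames {
    public static void main(String[] args) throws UnsupportedEncodingException {
        if (!Charset.isSupported("windows-1252")) {
            System.out.println("windows-1252 not available on this JVM");
            return;
        }
        String euro = "\u20ac"; // EURO SIGN, encoded as byte 0x80 in windows-1252

        // java.io / java.lang style name, passed as a String (checked exception).
        byte[] viaLegacyName = euro.getBytes("Cp1252");
        // java.nio style canonical name, resolved to a Charset first.
        byte[] viaNioName = euro.getBytes(Charset.forName("windows-1252"));

        System.out.println(Charset.forName("Cp1252").name());          // windows-1252
        System.out.println(Arrays.equals(viaLegacyName, viaNioName));  // true
        System.out.printf("0x%02X%n", viaNioName[0] & 0xFF);           // 0x80
    }
}
```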
