I have a question about Charset.forName(String charsetName). Is there a list of charsetNames I can refer to? For example, for UTF-8, we use "utf8" for the charsetName. What about WINDOWS-1252, GB18030, etc.?

Jason Ching
  • http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html and the latest http://download.java.net/jdk8/docs/technotes/guides/intl/encoding.doc.html – nullpotent Sep 23 '12 at 23:39
  • Also there is a good discussion at http://stackoverflow.com/questions/1684040/java-why-charset-names-are-not-constants – Guido Simone Sep 23 '12 at 23:44

4 Answers

Charset         Description

US-ASCII        Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1      ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8           Eight-bit UCS Transformation Format
UTF-16BE        Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE        Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16          Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark

Reference: http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
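
As a rough illustration of how these six required charsets can be used, here is a minimal sketch. The class name RequiredCharsets and the helper canonicalName are made up for this example; Charset.forName and the StandardCharsets constants are the standard API. Lookups by name are case-insensitive and also accept aliases, which is why "utf8" works in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class RequiredCharsets {

    // Resolve a name or alias to the canonical charset name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // The constants avoid a string lookup for the six required charsets.
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + bytes.length);

        // Aliases resolve to the same canonical charset.
        System.out.println(canonicalName("utf8"));
        System.out.println(canonicalName("latin1"));
        System.out.println(canonicalName("ascii"));
    }
}
```

For anything outside this list, a lookup by name is still needed, and it may fail on some JVMs.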

iBabur

The charset names available in Java are platform dependent; there are only 6 constants in the StandardCharsets class.

To view all charsets, look at the IANA character set registry. Check the Preferred MIME Name and Aliases columns.
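
To compare the IANA table with what your own JVM accepts, something like the following sketch may help. The class name CharsetAliases and the helper aliasesOf are invented for this example; Charset.aliases() and Charset.isRegistered() are standard methods. The exact alias sets can differ between Java implementations.

```java
import java.nio.charset.Charset;
import java.util.Set;

// Hypothetical demo class, not part of any library.
class CharsetAliases {

    // Return the aliases the running JVM accepts for the given charset.
    static Set<String> aliasesOf(String name) {
        return Charset.forName(name).aliases();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "ISO-8859-1"}) {
            Charset cs = Charset.forName(name);
            // isRegistered() reports whether the charset is in the IANA registry.
            System.out.println(cs.name()
                    + " (IANA registered: " + cs.isRegistered() + ")"
                    + " aliases: " + aliasesOf(name));
        }
    }
}
```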

telebog

To list all character sets installed in your JVM, you can use the following code snippet (Java SE 8 or higher):

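// Needs: import java.nio.charset.Charset; import java.util.SortedMap;
SortedMap<String, Charset> map = Charset.availableCharsets();
map.keySet().forEach(System.out::println); // prints each canonical charset name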

On my system, this lists around 170 character sets.
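
If you only care about a few specific names, such as the WINDOWS-1252 and GB18030 from the question, you could probe for them instead of printing the whole list. This is only a sketch: the class name CharsetProbe and the helper find are made up here, and whether a given name resolves depends on your JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical demo class, not part of any library.
class CharsetProbe {

    // Look up a charset by name, returning empty if the name is
    // syntactically illegal or not supported by this JVM.
    static Optional<Charset> find(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "windows-1252", "GB18030", "no-such-charset"};
        for (String name : names) {
            String result = find(name)
                    .map(cs -> "supported, canonical name " + cs.name())
                    .orElse("not available on this JVM");
            System.out.println(name + ": " + result);
        }
    }
}
```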

GSFK

Java's Charset class is required to accept just a few basic encodings: ASCII, Latin-1 (ISO-8859-1), and a handful of UTF variants that you can see listed in this answer. That's a pretty useless list for any practical purpose, unless your scope is limited to Latin-1. In reality, Java classes can handle a large number of encodings, which you can read about on the Supported Encodings page. Quoting from it:

The java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String classes, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of Java SE 8. The class description for java.nio.charset.Charset lists the encodings that any implementation of Java SE 8 is required to support.

JDK 8 for all platforms (Solaris, Linux, and Microsoft Windows) and JRE 8 for Solaris and Linux support all encodings shown on this page. JRE 8 for Microsoft Windows may be installed as a complete international version or as a European languages version. [...]

The rest of the page consists of an extensive table of encoding names and synonyms, which is what the OP was after all those years ago...
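
To make the quoted passage a bit more concrete, here is a hedged sketch of passing an encoding name to String and to InputStreamReader. The class DecodeByName and both helper methods are invented for this example. ISO-8859-1 is used for the main demonstration because every JVM must support it; the windows-1252 line runs only if the JVM reports that name as supported.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library.
class DecodeByName {

    // Decode raw bytes with a charset looked up by name.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Same decoding, but through InputStreamReader, which takes the name directly.
    static String decodeViaReader(byte[] bytes, String charsetName) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Includes UnsupportedEncodingException for unknown names.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "café" encoded as Latin-1: the last byte 0xE9 is the accented e.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decode(latin1, "ISO-8859-1"));
        System.out.println(decodeViaReader(latin1, "ISO-8859-1"));

        // Not a required charset, so check before using it.
        if (Charset.isSupported("windows-1252")) {
            System.out.println(decode(latin1, "windows-1252"));
        }
    }
}
```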

alexis