573

I'm trying to use a constant instead of a string literal in this piece of code:

new InputStreamReader(new FileInputStream(file), "UTF-8")

"UTF-8" appears in the code rather often, and it would be much better to refer to a static final variable instead. Do you know where I can find such a variable in the JDK?

BTW, on second thought, such constants are bad design: Public Static Literals ... Are Not a Solution for Data Duplication

yegor256
  • 11
    See [this question](http://stackoverflow.com/q/1684040/3009). – highlycaffeinated Jul 14 '11 at 18:51
  • 1
    Note: if you are already on Java 7, use [`Files.newBufferedReader(Path path, Charset cs)`](https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#newBufferedReader(java.nio.file.Path,%20java.nio.charset.Charset)) from NIO. – Franklin Yu Aug 08 '18 at 14:55
  • 3
    That's some really bad advice from your link. He wants you to make a wrapper class for every possible string constant you might use? – Ariel Jun 25 '20 at 20:15
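
The NIO route mentioned in the comments above could look roughly like this, assuming Java 7 or later. The class name `NioReaderExample` and the helpers `firstLine` and `demo` are made up for illustration; only `Files.newBufferedReader(Path, Charset)` and `StandardCharsets.UTF_8` are JDK API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not part of any library.
class NioReaderExample {

    // Returns the first line of a UTF-8 encoded file, or null if the file is empty.
    static String firstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    // Writes a small UTF-8 temp file, reads its first line back, then cleans up.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(tmp, "h\u00e9llo\nworld\n".getBytes(StandardCharsets.UTF_8));
                return firstLine(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

No charset name string appears anywhere, so there is nothing to misspell.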

11 Answers

979

In Java 1.7+, java.nio.charset.StandardCharsets defines constants for Charset including UTF_8.

import java.nio.charset.StandardCharsets;

...

StandardCharsets.UTF_8.name();

For Android: minSdk 19

Jameson
Roger
  • 3
    do you use .toString() on that? – Matt Broekhuis Oct 22 '13 at 17:30
  • 57
    `.toString()` will work but the proper function is `.name()`. 99.9% toString is not the answer. – Roger Feb 19 '14 at 16:25
  • 1
    btw `.displayName()` will also work unless it is overridden for localization as intended. – Roger Feb 19 '14 at 16:35
  • 39
    You don't really need to call `name()` at all. You can directly pass the `Charset` object into the `InputStreamReader` constructor. – Natix Nov 19 '14 at 10:33
  • 1
    Note that on Android, this requires API level 19+. – Hai Zhang Feb 06 '15 at 06:09
  • 7
    And there are other libs out there which do require a `String`, perhaps because of legacy reasons. In such cases, I keep a `Charset` object around, typically derived from `StandardCharsets`, and use `name()` if needed. – Magnilex Mar 02 '15 at 14:32
  • 1
    The result of `name()`, `toString()` and just putting `StandardCharsets.UTF_8` directly is the same, because `Charset.toString()` just calls `Charset.name()`, and if you use `StandardCharsets.UTF_8` in a string concatenation, `Charset.toString()` will be called automatically. – anothernode Jul 13 '17 at 14:40
  • In response to `99.9% toString is not the answer`, is that comment specific to Charset API? For example, in Enum, its just the opposite, right? (https://stackoverflow.com/questions/18031125/what-is-the-difference-between-enum-name-and-enum-tostring) – lmsurprenant Aug 27 '19 at 14:02
  • @Imsurprenant if you read the rest of that answer and comments, you should still be using `.name()` in code. The documentation simply suggests that `.toString()` should have the most human readable form (partially because it can be overwritten to do so). – Roger Aug 28 '19 at 17:42
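
For the legacy case raised in the comments above, where an API only accepts a charset name, a minimal sketch might look like this. The class `CharsetNameExample` and its methods are illustrative names; `URLEncoder.encode(String, String)` is a JDK method that takes the name as a `String`. (Since Java 10 the JDK also offers `URLEncoder.encode(String, Charset)`, so this is only needed on older versions.)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of any library.
class CharsetNameExample {

    // Keep a Charset around and derive the String name only where it is required.
    private static final Charset UTF_8 = StandardCharsets.UTF_8;

    // name() is final and returns the canonical charset name.
    static String canonicalName() {
        return UTF_8.name();
    }

    // A String-only API: pass name() instead of repeating the "UTF-8" literal.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for UTF-8, which every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName());   // UTF-8
        System.out.println(encode("a b&c"));   // a+b%26c
    }
}
```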
145

Now I use org.apache.commons.lang3.CharEncoding.UTF_8 constant from commons-lang.

yegor256
73

The Google Guava library (which I'd highly recommend anyway, if you're doing work in Java) has a Charsets class with static fields like Charsets.UTF_8, Charsets.UTF_16, etc.

Since Java 7 you should just use java.nio.charset.StandardCharsets instead for comparable constants.

Note that these constants aren't strings, they're actual Charset instances. All standard APIs that take a charset name also have an overload that take a Charset object which you should use instead.
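
One practical reason to prefer the `Charset` overloads: the name-based variants, such as `String.getBytes(String)`, declare the checked `UnsupportedEncodingException`, while the `Charset`-based ones do not. A rough sketch of the difference (the class and method names are illustrative only):

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of any library.
class CharsetOverloadExample {

    // Name-based overload: the compiler forces us to handle a checked exception.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Charset-based overload: no checked exception and no string literal.
    static byte[] encodeByCharset(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding has a matching Charset-based constructor.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encodeByCharset("h\u00e9llo");
        System.out.println(bytes.length);
        System.out.println(decode(bytes));
    }
}
```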

JuanMoreno
Daniel Pryden
  • 3
    So, should be Charsets.UTF_8.name()? – AlikElzin-kilaka Mar 25 '13 at 08:31
  • 1
    @kilaka Yeah use name() instead of getDisplayName() since name() is final and getDisplayName() is not – RKumsher Feb 27 '14 at 18:40
  • Bad idea to use third party code that's constantly modified, breaking backwards compatibility, to accomplish something you can do with the standard SDK. – Buffalo Nov 13 '17 at 08:59
  • 3
    @Buffalo: Please read my answer again: it recommends using `java.nio.charset.StandardCharsets` when possible, which is not third party code. Additionally, the Guava Charsets definitions are not "constantly modified" and AFAIK have never broken backwards compatibility, so I don't think your criticism is warranted. – Daniel Pryden Nov 13 '17 at 16:37
  • We've had multiple issues when upgrading the Guava libraries. – Buffalo Nov 14 '17 at 09:33
  • 2
    @Buffalo: That's as it may be, but I doubt your issues had anything to do with the `Charsets` class. If you want to complain about Guava, that's fine, but this is not the place for those complaints. – Daniel Pryden Nov 14 '17 at 13:16
  • 1
    Please do not include a multi-megabyte library to get one string constant. – Jeffrey Blattman Oct 09 '18 at 23:06
  • 1
    "All standard APIs that take a charset name also have an overload that take a Charset object" is not quite true. One example is `java.net.URLEncoder.encode(String, String)`, which does not have an overload taking a `Charset` parameter. – Adam Rosenfield Feb 25 '19 at 06:29
50

In case this page comes up in someone's web search: as of Java 1.7 you can use java.nio.charset.StandardCharsets to get access to constant definitions of the standard charsets.

cosjav
  • I have been trying to use this but it does not seem to work. 'Charset.defaultCharset());' seems to work after including 'java.nio.charset.*' but I can't seem to explicitly refer to UTF8 when I am trying to use 'File.readAllLines'. – Roger Apr 17 '13 at 06:54
  • 1
    @Roger What seems to be the problem? From what I can see you can just call: `Files.readAllLines(Paths.get("path-to-some-file"), StandardCharsets.UTF_8);` – cosjav May 06 '13 at 05:30
  • I don't know what the problem was, but it worked for me after changing something which I can't remember. – Roger May 31 '13 at 18:50
  • 1
    ^^^ You probably had to change the target platform in the IDE. If 1.6 was your latest JDK when you installed the IDE, it probably picked it as the default & kept it as the default long after you'd updated both the IDE and JDK themselves in-place. – Bitbang3r Nov 20 '13 at 19:03
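
The `Files.readAllLines` call from the comments above can be tried end to end with something like the following, assuming the project targets Java 7 or later. `ReadAllLinesExample` and its helper methods are hypothetical names used only for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of any library.
class ReadAllLinesExample {

    // Reads every line of a file, decoding it explicitly as UTF-8.
    static List<String> readUtf8Lines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Writes two lines to a temp file as UTF-8 and reads them back.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("lines-demo", ".txt");
            try {
                Files.write(tmp, Arrays.asList("first", "second"), StandardCharsets.UTF_8);
                return readUtf8Lines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If `StandardCharsets` does not resolve, the compiler's source/target level is the first thing to check, as the last comment suggests.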
10

This constant is also available (along with others such as UTF-16 and US-ASCII) in the class org.apache.commons.codec.CharEncoding.

9

There are none (at least in the standard Java library). Character sets vary from platform to platform so there isn't a standard list of them in Java.

There are some 3rd party libraries which contain these constants though. One of these is Guava (Google core libraries): http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Charsets.html

tskuzzy
  • It took me a second to catch on to this... Guava's Charsets constants are (no surprise) Charsets, not Strings. InputStreamReader has another constructor that takes a Charset rather than a string. If you really need the string, it's e.g. Charsets.UTF_8.name(). – Ed Staub Jul 14 '11 at 19:11
  • 1
    Character sets may vary from platform to platform, but UTF-8 is guaranteed to exist. – tar Mar 05 '14 at 08:27
  • 4
    All charsets defined in `StandardCharsets` are guaranteed to exist in every Java implementation on every platform. – Krzysztof Krasoń Apr 09 '16 at 08:54
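
The guarantee described in the last comment can be checked with a short program. This is only a sketch: `StandardCharsetsExample` and its methods are illustrative names, and the list covers the six constants that `StandardCharsets` has defined since Java 7.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of any library.
class StandardCharsetsExample {

    // The six constants defined by java.nio.charset.StandardCharsets since Java 7.
    static List<Charset> standardCharsets() {
        return Arrays.asList(
                StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1,
                StandardCharsets.UTF_8,
                StandardCharsets.UTF_16BE,
                StandardCharsets.UTF_16LE,
                StandardCharsets.UTF_16);
    }

    // True if the running JVM reports every standard charset as supported.
    static boolean allSupported() {
        for (Charset cs : standardCharsets()) {
            if (!Charset.isSupported(cs.name())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Charset cs : standardCharsets()) {
            System.out.println(cs.name() + " supported: " + Charset.isSupported(cs.name()));
        }
    }
}
```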
8

In Java 1.7+

Do not use the "UTF-8" string; instead, pass a Charset parameter:

import java.nio.charset.StandardCharsets;

...

new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
Mostafa Vatanpour
8

You can use the Charset.defaultCharset() API or the file.encoding system property.

But if you want your own constant, you'll need to define it yourself.

Andrew Tobilko
paulsm4
  • 11
    The default charset is usually determined by the OS and locale settings; I don't think there is any guarantee that it remains the same across multiple Java invocations. So this is no replacement for a constant specifying "utf-8". – Jörn Horstmann Jul 14 '11 at 21:43
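
To make the distinction in the comment above concrete, here is a small sketch of a self-defined constant next to the platform default. The class `Encodings` and its members are hypothetical names. Whether the default turns out to be UTF-8 depends on the JVM and its environment (JDK 18 and later default to UTF-8, older versions usually follow the OS locale), so the example prints it instead of assuming a value.

```java
import java.nio.charset.Charset;

// Hypothetical holder for a hand-rolled constant, as the answer suggests.
final class Encodings {

    // Fixed name that does not change between machines or JVM invocations.
    static final String UTF_8 = "UTF-8";

    private Encodings() {
    }

    // Compares the environment-dependent default with the fixed constant.
    static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(Charset.forName(UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("constant: " + UTF_8);
        System.out.println("platform default: " + Charset.defaultCharset().name());
        System.out.println("default is UTF-8: " + defaultIsUtf8());
    }
}
```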
5

If you are using OkHttp for Java/Android you can use the following constant:

import com.squareup.okhttp.internal.Util;

Util.UTF_8; // Charset
Util.UTF_8.name(); // String
JJD
  • 2
    This constant has been removed from OkHttp. If you need to support Android versions below API 19, use `Charset.forName("UTF-8").name()`; otherwise you can use `StandardCharsets.UTF_8.name()`. – mtrakal Mar 06 '19 at 16:39
4

Constant definitions for the standard charsets. These charsets are guaranteed to be available on every implementation of the Java platform. Since: 1.7

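 import java.nio.charset.Charset;
 import java.nio.charset.StandardCharsets;
 // The constant is already a Charset instance, so no lookup by name is needed.
 Charset utf8 = StandardCharsets.UTF_8;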
Vazgen Torosyan
3

The class org.apache.commons.lang3.CharEncoding (including its UTF_8 constant) is deprecated now that Java 7 has introduced java.nio.charset.StandardCharsets. Its Javadoc says:

  • @see JRE character encoding names
  • @since 2.1
  • @deprecated Java 7 introduced {@link java.nio.charset.StandardCharsets}, which defines these constants as
  • {@link Charset} objects. Use {@link Charset#name()} to get the string values provided in this class.
  • This class will be removed in a future release.
sendon1982