Is UTF-8 the default encoding in Java?
If not, how can I know which encoding is used by default?

-
If a docker image does not have ENV LANG=en_US.UTF-8 you can see *very* confusing behavior where "locale" is POSIX on startup but if you exec into the container it shows UTF-8. Best not to rely on file.encoding, always specify the encoding explicitly when creating a stream. – jamshid Jan 18 '23 at 20:48
7 Answers
The default character set of the JVM is that of the system it's running on. There's no specific value for this and you shouldn't generally depend on the default encoding being any particular value.
It can be accessed at runtime via `Charset.defaultCharset()`, if that's any use to you, though really you should make a point of always specifying the encoding explicitly when you can do so.
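As a minimal sketch of what "specifying the encoding explicitly" can look like, the example below writes and reads a file with `StandardCharsets.UTF_8` passed at every step, so the result should not depend on the JVM default. The class name `ExplicitCharsetExample`, the method `roundTrip`, and the temp-file name are hypothetical and chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of any library.
public class ExplicitCharsetExample {

    // Writes the text as UTF-8 bytes and reads the first line back as UTF-8.
    // The charset is named explicitly on both sides, so the JVM default is never consulted.
    static String roundTrip(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("charset-demo", ".txt");
        try {
            String text = "Elfstr\u00f6m"; // contains a non-ASCII character
            System.out.println(text.equals(roundTrip(file, text)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The same idea applies to `InputStreamReader`, `OutputStreamWriter`, `String.getBytes` and `new String(bytes, ...)`: each has an overload that takes a `Charset`, and using it avoids the default entirely.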

-
If you are correct I find it a bit strange http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#text-representation says that it's always UTF-16. – Jonas Elfström Nov 03 '11 at 16:11
-
UTF-16 is how text is represented internally in the JVM. The default encoding determines how the JVM interprets bytes read from files (using `FileReader`, for example). – JesperE Jan 12 '12 at 12:30
-
This answer is correct, but for reference, on Linux it's usually "UTF-8", and on Windows it's usually "cp1252". – Jeutnarg Jan 22 '16 at 19:31
-
I have just encountered a Linux installation that reports UTF-8 from locale, but Java says US-ASCII. – Gunslinger Jan 26 '17 at 09:02
-
Wrong. Check the `Charset.defaultCharset()` source code. It reads the `file.encoding` property, otherwise uses UTF-8. – Artem Novikov Mar 28 '18 at 12:27
-
@JesperE : "text is represented internally in the JVM" : you mean bytecode ? – Rahul Jun 18 '18 at 16:32
-
@Rahul No, I mean how text is represented *in memory* in the JVM. Not sure what the bytecode spec says, I was referring to how the JVM stores text in memory at runtime. At least I think so, but my comment was made 6 years ago, so I might misremember. – JesperE Jun 21 '18 at 07:35
Note that you can change the default encoding of the JVM using the confusingly-named property `file.encoding`.
If your application is particularly sensitive to encodings (perhaps through usage of APIs implying default encodings), then you should explicitly set this on JVM startup to a consistent (known) value.
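One way to make that expectation visible is a small check at startup that compares `Charset.defaultCharset()` with the value the application assumes. This is only a sketch: the class name `DefaultCharsetGuard` and the method `defaultIs` are hypothetical, and UTF-8 is used as the expected value purely as an example. (As far as I know, JDK 18 and later already default to UTF-8 unless `file.encoding` is overridden, so the check mainly matters on older runtimes.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical startup check; not part of any library.
public class DefaultCharsetGuard {

    // Returns true when the JVM default charset equals the expected one.
    static boolean defaultIs(Charset expected) {
        return Charset.defaultCharset().equals(expected);
    }

    public static void main(String[] args) {
        // UTF-8 is an assumed expectation for this example.
        if (defaultIs(StandardCharsets.UTF_8)) {
            System.out.println("Default charset is UTF-8");
        } else {
            System.err.println("Default charset is " + Charset.defaultCharset()
                    + "; consider starting the JVM with -Dfile.encoding=UTF-8");
        }
    }
}
```

Reporting the mismatch early is usually easier to diagnose than garbled text showing up later in files or logs.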

-
Note that `file.encoding` must be specified on JVM startup (i.e. as the command-line parameter -Dfile.encoding or via the JAVA_TOOL_OPTIONS environment variable); you can set it at runtime, but it will not matter. See http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding – sleske Feb 25 '10 at 12:38
There are three "default" encodings:
- file.encoding: `System.getProperty("file.encoding")`
- java.nio.charset.Charset: `Charset.defaultCharset()`
- And the encoding of the InputStreamReader: `InputStreamReader.getEncoding()`
You can read more about it on this page.

I am sure that this is JVM implementation specific, but I was able to "influence" my JVM's default file.encoding by executing:
export LC_ALL=en_US.UTF-8
(running java version 1.7.0_80 on Ubuntu 12.04)
Also, if you type "locale" from your unix console, you should see more info there.
All the credit goes to http://www.philvarner.com/2009/10/24/unicode-in-java-default-charset-part-4/
-
How did you check it? I can't find any proof that Java pays attention to the encoding in the locale string. Only to the `file.encoding` property. – Artem Novikov Mar 28 '18 at 13:19
-
@ArtemNovikov - yes, but what is the default value of `file.encoding`? It's initialised in `java.lang.System.initProperties` based on the value of `sprops.encoding`, where `sprops` is a structure returned by the native function `GetJavaProperties()`, the implementation of which varies according to platform. In the Windows version, for example, it calls `GetUserDefaultLCID()` and then `GetLocaleInfo (lcid, LOCALE_IDEFAULTANSICODEPAGE, ...)` to find the user's default ANSI code page and uses that. On Unix platforms, it parses the return of `setlocale(LC_CTYPE, NULL)`. – Jules May 25 '18 at 19:01
-
... See http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/native/java/lang/System.c#l169 and http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/solaris/native/java/lang/java_props_md.c#l427 for details. – Jules May 25 '18 at 19:05
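You can use this to print out the JVM defaults:
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    public class PrintCharSets {
        public static void main(String[] args) throws Exception {
            // The system property, normally fixed at JVM startup
            System.out.println("file.encoding=" + System.getProperty("file.encoding"));
            // The charset used by APIs that are not given an explicit charset
            System.out.println("Charset.defaultCharset=" + Charset.defaultCharset());
            // The charset a reader picks when none is given; the reader is closed afterwards
            try (InputStreamReader reader = new InputStreamReader(new FileInputStream("./PrintCharSets.java"))) {
                System.out.println("InputStreamReader.getEncoding=" + reader.getEncoding());
            }
        }
    }
Compile and run from the directory containing the source file:
    javac PrintCharSets.java && java PrintCharSets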

It's going to be locale-dependent. Different locale, different default encoding.
