Finding the charset of System.out is tricky. (See "Logback System.err output uses wrong encoding" for discussion and implications with Logback.) Here's what the System.out API documentation says:
The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.
On my Windows 10 Command Prompt using Java 17, Charset.defaultCharset() returns windows-1252, while System.console().charset() returns IBM437. If I create a new OutputStreamWriter(System.out, System.console().charset()) and write the string "é", it produces é as expected. But if I use new OutputStreamWriter(System.out, Charset.defaultCharset()) and write "é", it produces Θ! That's why it is imperative that I use new OutputStreamWriter(System.out, System.console().charset()).
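The Θ is ordinary mojibake: the writer encodes é as a windows-1252 byte, and the console decodes that same byte as IBM437. The sketch below reproduces the mix-up without needing a console, by encoding with one charset and decoding with the other. The class and method names are hypothetical, and it assumes the JDK includes the extended charsets (IBM437 is not one of the standard charsets every JVM must support).

```java
import java.nio.charset.Charset;

// Hypothetical demo: what happens when text is encoded with one charset
// but the receiving console decodes the bytes with another.
public class MojibakeDemo {

    // Encode with writerCharset, then decode with consoleCharset, mimicking an
    // OutputStreamWriter whose charset differs from the console's code page.
    public static String reinterpret(String text, String writerCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writerCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é, escaped so the source file's encoding doesn't matter
        // windows-1252 encodes é as 0xE9, and IBM437 maps 0xE9 to Θ (U+0398)
        System.out.println(reinterpret(eAcute, "windows-1252", "IBM437").equals("\u0398"));
        // When both sides agree on the charset, the character round-trips
        System.out.println(reinterpret(eAcute, "IBM437", "IBM437").equals(eAcute));
    }
}
```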
There's a wrinkle: as Baeldung explains, if stdout is being redirected (e.g. … > temp.out on the command line), then System.console() will be null and Charset.defaultCharset() should be used instead. Thus we can determine the charset used for System.out like this (which isn't the most efficient code, since it calls System.console() twice, but it illustrates the point):
```java
// Console.charset() requires Java 17+
final Charset systemOutCharset = System.console() != null
    ? System.console().charset() : Charset.defaultCharset();
```
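Put together with the OutputStreamWriter approach from earlier, a complete program might look roughly like this. It is a sketch assuming Java 17+ (for Console.charset()); the class and method names are hypothetical, and it looks up System.console() only once.

```java
import java.io.Console;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Sketch (Java 17+): pick the charset System.out is documented to use,
// and write through a Writer that matches it.
public class StdoutCharsetSketch {

    // The documented rule: Console.charset() if a Console exists,
    // otherwise Charset.defaultCharset().
    public static Charset systemOutCharset() {
        Console console = System.console();
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Autoflush on println; deliberately not closed, since that would close System.out
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(System.out, systemOutCharset()), true);
        out.println("\u00e9"); // é
    }
}
```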
But here is the huge problem: according to its API documentation, System.err supposedly plays by the same rules: if there is a Console, then System.console().charset() is used as the System.err charset, otherwise Charset.defaultCharset():
The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.
As we saw, if we redirect stdout, then there is no Console. What if we redirect stdout but not stderr? On my system, System.out will be writing (redirected) using the charset windows-1252, while System.err will still be printing to the console using the charset IBM437. This seems to directly contradict its API contract: in the absence of a Console, System.err is not using Charset.defaultCharset() but some other charset (whatever Console.charset() would have returned had a Console been present). Moreover, there is no longer any way to access the charset of System.err, because System.console() returns null whenever stdout is being redirected!
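A small diagnostic program along these lines could be used to reproduce the scenario: run it once normally and once with only stdout redirected (e.g. java ConsoleCharsetRepro > temp.out), then compare what stderr shows on the console. The class name is hypothetical, and Console.charset() requires Java 17.

```java
import java.io.Console;
import java.nio.charset.Charset;

// Hypothetical repro: report the values the API docs say determine the
// stream charsets, then write é to both stdout and stderr.
// Run as `java ConsoleCharsetRepro` and as `java ConsoleCharsetRepro > temp.out`.
public class ConsoleCharsetRepro {

    public static String describe() {
        Console console = System.console();
        return "console=" + (console != null ? "present" : "null")
                + ", console.charset=" + (console != null ? console.charset() : "n/a")
                + ", defaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Diagnostics go to stderr so they stay on screen when stdout is redirected
        System.err.println(describe());
        System.out.println("stdout: \u00e9");
        System.err.println("stderr: \u00e9");
    }
}
```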
How can I discover the correct charset to use for System.err if stdout is being redirected? And why isn't System.err adhering to its API contract in this scenario?
I can only assume this was an oversight in the Java API, and that there should be a System.getConsoleCharset() method that would return the correct value whether or not System.console() is present.
This has larger implications than you might think. A logging system such as Logback (see LOGBACK-1642) is typically configured to send log output to stderr (see e.g. https://unix.stackexchange.com/q/331611), and logging packages are not going to require Java 18 (mentioned in an answer below) for years to come, as they implement cross-cutting functionality that must work with the lowest supported Java versions. Because of this Java bug (and it does seem to be a bug coupled with an API blind spot), there is no way for a logging system to know for certain which charset to use for its output when writing to stderr, which is arguably best practice!
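For a library that must still run on older Java versions, one possible best-effort approach is to call PrintStream.charset() reflectively (that method was added in Java 18) and fall back to the documented Console/default rule otherwise. This is only a sketch under those assumptions, not a fix: on Java 17 and earlier, the fallback still has exactly the blind spot described above when only stdout is redirected. The class and method names are hypothetical.

```java
import java.io.Console;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.nio.charset.Charset;

// Hypothetical helper for code that must run on old JVMs but can use newer
// APIs when present. Best effort only.
public final class StreamCharsets {

    private StreamCharsets() {}

    // Reflectively call a public no-arg charset() method declared on `type`;
    // returns null if the method is absent on this JVM or the call fails.
    private static Charset invokeCharset(Object target, Class<?> type) {
        try {
            Method method = type.getMethod("charset");
            return (Charset) method.invoke(target);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null;
        }
    }

    public static Charset charsetOf(PrintStream stream) {
        // Java 18+: PrintStream.charset() reports the charset the stream actually uses
        Charset charset = invokeCharset(stream, PrintStream.class);
        if (charset != null) {
            return charset;
        }
        // Java 17: the documented rule, which can be wrong for System.err
        // when only stdout is redirected
        Console console = System.console();
        if (console != null) {
            charset = invokeCharset(console, Console.class);
            if (charset != null) {
                return charset;
            }
        }
        // Java 16 and earlier, or no Console: the documented default
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("System.err charset: " + charsetOf(System.err));
    }
}
```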