
Finding the charset of System.out is tricky. (See "Logback System.err output uses wrong encoding" for discussion and implications with Logback.) Here's what the System.out API documentation says.

The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.

On my Windows 10 Command Prompt using Java 17, Charset.defaultCharset() returns windows-1252, while System.console().charset() returns IBM437. If I create a new OutputStreamWriter(System.out, System.console().charset()) and write the string "é", it produces é as expected. But if I use new OutputStreamWriter(System.out, Charset.defaultCharset()) and write "é", it produces Θ! That's why it is imperative that I use new OutputStreamWriter(System.out, System.console().charset()).
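To see where the Θ comes from, here is a minimal sketch that encodes "é" with one charset and decodes the resulting byte with the other. The class and method names are made up for illustration, and it assumes the JDK in use ships the IBM437 charset (a full JDK normally does). It prints hex values rather than the characters themselves so the result does not depend on the console encoding.

```java
import java.nio.charset.Charset;

public final class CharsetMismatchDemo {

    private CharsetMismatchDemo() {}

    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset displayedAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, displayedAs);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252"); // Charset.defaultCharset() in the question
        Charset oem = Charset.forName("IBM437");        // Console.charset() in the question
        String eAcute = "\u00e9";                       // "é", escaped to avoid source-encoding issues

        // The same character becomes a different byte in each charset.
        System.out.println(Integer.toHexString(eAcute.getBytes(ansi)[0] & 0xFF)); // e9
        System.out.println(Integer.toHexString(eAcute.getBytes(oem)[0] & 0xFF));  // 82

        // Byte 0xE9 written as windows-1252 but displayed as IBM437 is U+0398 (Θ).
        String shown = reinterpret(eAcute, ansi, oem);
        System.out.println(Integer.toHexString(shown.codePointAt(0)));            // 398
    }
}
```

So the writer emits byte 0xE9, and a console that interprets bytes as IBM437 renders that byte as Θ.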

There's a wrinkle: as Baeldung explains, if stdout is being redirected (e.g. … > temp.out on the command line), then System.console() will be null and Charset.defaultCharset() should be used instead. Thus we can determine the charset used for System.out like this (which isn't the most efficient code, but it illustrates the point):

final Charset systemOutCharset = System.console() != null
    ? System.console().charset() : Charset.defaultCharset();

But here is the huge problem: System.err supposedly plays by the same rules: if there is a Console, then System.console().charset() is used as the System.err charset, otherwise Charset.defaultCharset().

The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.

As we saw, if we redirect stdout, then there is no Console. What if we redirect stdout but not stderr? On my system, System.out will then write (redirected) using the charset windows-1252, while System.err will still print to the console using the charset IBM437. This seems to directly contradict the API contract: in the absence of a Console, System.err is using not Charset.defaultCharset() but some other charset (the one Console.charset() would have returned if a Console were present). Moreover, there is no longer any way to access the charset of System.err, because System.console() is null while stdout is being redirected!
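A small diagnostic along these lines can be used to check the scenario on a given machine. It is only a sketch: the class name is hypothetical, and it reports just the two things the JVM exposes through public API before Java 18 (whether a Console exists and what the default charset is). It writes to stderr so the report stays visible when only stdout is redirected.

```java
import java.nio.charset.Charset;

public final class RedirectReport {

    private RedirectReport() {}

    /** Summarizes the console presence and the default charset as seen by this JVM. */
    static String describe() {
        boolean hasConsole = System.console() != null;
        return "console present: " + hasConsole
                + ", default charset: " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Printed to stderr on purpose: stdout may be redirected to a file.
        System.err.println(describe());
    }
}
```

Running it once normally and once as `java RedirectReport.java > temp.out` (single-file launch, Java 11+) lets you compare the two reports. On the Java 17 setup described above, the redirected run would be expected to report no console, even though stderr is still attached to one.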

How can I discover the correct charset to use for System.err if stdout is being redirected? And why isn't System.err adhering to its API contract in this scenario?

I can only assume this was an oversight of the Java API and there should be a System.getConsoleCharset() method which would return the correct value whether or not System.console() is present.

This has larger implications than you might think. A logging system such as Logback (see LOGBACK-1642) is typically configured to send log output to stderr (see e.g. https://unix.stackexchange.com/q/331611). Logging packages are not going to require Java 18 (mentioned in an answer below) for years to come, as they implement cross-cutting functionality that must work with the lowest supported Java versions. Because of this Java bug (and it does seem to be a bug coupled with an API blind spot), there is no way for a logging system to know for certain which charset to use for its output when writing to stderr, which is arguably best practice!

Garret Wilson
  • https://stackoverflow.com/questions/6172972/how-to-get-console-charset Or make it a configuration option till java 18. Or use ProcessBuilder with a Windows command. – Joop Eggen May 30 '22 at 14:33

1 Answer


Java 18 has exactly what you need:

Class PrintStream

[...]


public Charset charset()
Returns the charset used in this PrintStream instance.

Returns:
the charset used in this PrintStream instance
Since:
18

Unfortunately, this also means that only hacks are available for earlier Java versions; a method like this is usually added precisely because there was no other way to get at the information.

With that, the charset to use is simply

System.err.charset()
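As a usage sketch (Java 18 or later only, since it calls PrintStream.charset() directly), a Writer for stderr could be built like this. The class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;

public final class StderrWriterDemo {

    private StderrWriterDemo() {}

    /** Wraps a PrintStream in a Writer that encodes with the stream's own charset (Java 18+). */
    static Writer writerFor(PrintStream ps) {
        return new OutputStreamWriter(ps, ps.charset());
    }

    public static void main(String[] args) throws IOException {
        Writer err = writerFor(System.err);
        err.write("caf\u00e9\n"); // "café", escaped to avoid source-encoding issues
        err.flush();              // flush only; closing the writer would also close System.err
    }
}
```

Because the charset is taken from the stream itself, this does not depend on whether System.console() is present.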

If you need to support earlier Java versions, you can use code like this:

import static java.lang.invoke.MethodType.methodType;

import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.UndeclaredThrowableException;
import java.nio.charset.Charset;

public final class PrintStreamCharset {
    
    private PrintStreamCharset() {}
    
    private static final MethodHandle PS_CHARSET;
    
    public static Charset charset(PrintStream ps) {
        try {
            return (Charset) PS_CHARSET.invokeExact(ps);
        } catch (Error | RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new UndeclaredThrowableException(t);
        }
    }
    
    static {
        MethodHandle mh;
        Lookup l = MethodHandles.lookup();
        try {
            try {
                mh = l.findVirtual(PrintStream.class, "charset", methodType(Charset.class));
            } catch (NoSuchMethodException ignored) {
                mh = makeMHPreJava18(l);
            }
        } catch (IllegalAccessException e) {
            throw new ExceptionInInitializerError(e);
        }
        PS_CHARSET = mh;
    }
    
    private static MethodHandle makeMHPreJava18(Lookup l) throws IllegalAccessException {
        Lookup pl = privateLookup(l);
        try {
            MethodHandle getOSW = pl.findGetter(PrintStream.class, "charOut",
                    OutputStreamWriter.class);
            MethodHandle getEncodingName = l.findVirtual(OutputStreamWriter.class, "getEncoding",
                    methodType(String.class));
            MethodHandle findCharset = l.findStatic(Charset.class, "forName",
                    methodType(Charset.class, String.class));
            return MethodHandles.filterReturnValue(
                    MethodHandles.filterReturnValue(getOSW, getEncodingName), findCharset);
        } catch (NoSuchFieldException | NoSuchMethodException e) {
            throw new Error(e);
        }
    }
    
    private static Lookup privateLookup(Lookup origin) throws IllegalAccessException {
        try {
            return (Lookup) MethodHandles.class
                    .getMethod("privateLookupIn", Class.class, Lookup.class)
                    .invoke(null, PrintStream.class, origin);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Error) {
                throw (Error) cause;
            } else if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            } else if (cause instanceof IllegalAccessException) {
                throw (IllegalAccessException) cause;
            } else {
                throw new UndeclaredThrowableException(cause);
            }
        } catch (NoSuchMethodException nsme) {
            // Java 8 - get MethodHandles.Lookup.IMPL_LOOKUP
            Field implLookupField;
            try {
                implLookupField = Lookup.class.getDeclaredField("IMPL_LOOKUP");
            } catch (NoSuchFieldException nsfe) {
                nsme.addSuppressed(nsfe);
                throw new ExceptionInInitializerError(nsme);
            }
            implLookupField.setAccessible(true);
            Lookup implLookup = (Lookup) implLookupField.get(null);
            return implLookup.in(PrintStream.class);
        }
    }
}

This requires you to pass --add-opens java.base/java.io=<YOUR-MODULE-NAME> on the command line for Java 9 to Java 17 (use ALL-UNNAMED as the module name if your code runs from the class path).

With Java 18+ this will simply call PrintStream.charset().
With earlier Java versions, it will do something equivalent to Charset.forName(ps.charOut.getEncoding()).

I did test this with Java 8, Java 11, Java 17 and Java 18.
Because it uses an official API on Java 18+, and Java 8 through Java 17 now only receive updates, this is unlikely to break in the future.
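If passing --add-opens is not an option (for example in a library), a more conservative variant is possible. The following is only a sketch with hypothetical names, not a replacement for the class above: it calls PrintStream.charset() through plain reflection when it exists, and otherwise falls back to the Console-or-default rule from the question. That means it compiles on Java 8 and needs no private access, but on Java 17 and earlier it does not solve the case where stdout is redirected and stderr is not.

```java
import java.io.Console;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.nio.charset.Charset;

public final class BestEffortStreamCharset {

    private BestEffortStreamCharset() {}

    /** Returns the stream's charset where public API exposes it, otherwise a best guess. */
    public static Charset charsetOf(PrintStream ps) {
        // Java 18+: PrintStream.charset() is public, so no --add-opens is needed.
        Charset cs = invokeCharset(PrintStream.class, ps);
        if (cs != null) {
            return cs;
        }
        // Java 17: Console.charset() exists, but only helps if a Console is present.
        Console console = System.console();
        if (console != null) {
            cs = invokeCharset(Console.class, console);
            if (cs != null) {
                return cs;
            }
        }
        // Older versions, or no Console: this is a guess and may be wrong for stderr.
        return Charset.defaultCharset();
    }

    /** Calls a public no-arg charset() method reflectively; returns null if unavailable. */
    private static Charset invokeCharset(Class<?> type, Object target) {
        try {
            Method m = type.getMethod("charset");
            return (Charset) m.invoke(target);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null;
        }
    }
}
```

A logging library could call BestEffortStreamCharset.charsetOf(System.err) and still offer a configuration option to override the result on older Java versions.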

Johannes Kuhn
  • Well, that's nice to know. Too bad it won't be in an LTS until late next year. What can I use for my other projects still using Java 11? Or my libraries using Java 8 to appeal to a wider audience? – Garret Wilson May 30 '22 at 14:24
  • `Console.charset()` was added in Java 17. Before that? Hack into the `PrintStream.charset` field. – Johannes Kuhn May 30 '22 at 14:40
  • If the charset hasn't been changed, it will be the [same as this](https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/nio/charset/Charset.html#defaultCharset()) – g00se May 30 '22 at 14:49
  • Johannes, I hadn't realized `Console.charset()` was only added recently. So how do you propose to "hack into the `PrintStream.charset` field"? Through some reflection magic (which would surely break with modularization)? – Garret Wilson May 30 '22 at 14:51
  • Yes, reflection. Won't break for Java 8-17 (very unlikely that the field will change in those releases now), and then add `--add-opens java.base/java.io=` to the command line. – Johannes Kuhn May 30 '22 at 15:26