18

I have a console program whose umlauts and other special characters are printed as ?'s on Macs. Here's a simple test program:

public static void main( String[] args ) {
    System.out.println("höhößüä");
    System.console().printf( "höhößüä" );
}

On a default Mac console (with default UTF-8 encoding), this prints:

 h?h????
 h?h????

But after manually setting the Mac terminal's encoding to "Mac OS Roman", it correctly printed

 höhößüä
 höhößüä

Note that on Windows systems, the System.console() line prints correctly while the System.out line does not:

 h÷h÷▀³õ
 höhößüä

So how do I make my program...rolleyes..."run everywhere"?

Epaga

2 Answers

13

Try the following command-line argument when starting your application:

-Dfile.encoding=utf-8

This changes the default encoding of the JVM for I/O operations.

You can also try:

System.setOut(new PrintStream(System.out, true, "utf-8"));
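
Since the program ships as a jar, the second option can be applied at the very start of main so that users need no extra command-line arguments. Below is a minimal sketch of that idea, assuming the terminal really does expect UTF-8. The class name `Utf8Console` and the helper `utf8` are made up for illustration, and the test string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Hypothetical helper: wraps a raw stream in an auto-flushing PrintStream
    // that encodes characters as UTF-8.
    static PrintStream utf8(OutputStream raw) {
        try {
            return new PrintStream(raw, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8,
            // so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Replace System.out before anything is printed.
        System.setOut(utf8(System.out));

        // "höhößüä" written with Unicode escapes.
        System.out.println("h\u00f6h\u00f6\u00df\u00fc\u00e4");
    }
}
```

This only helps when the terminal decodes UTF-8. On a console that uses another encoding (such as the Windows console in the question), hard-coding UTF-8 would presumably produce garbage again.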
Bozho
  • 1
    problem with this is that I'm shipping this console program as a jar, so I'm looking for a solution that doesn't involve the user having to add his own command line arguments. – Epaga Mar 10 '10 at 09:26
  • 1
    @Epaga - use a bat file or some wrapper for the jar? – Bozho Mar 10 '10 at 09:42
  • 1
    -Dfile.encoding=utf-8 is not a standard option and should be avoided. – n0rm1e Nov 18 '11 at 04:10
  • @Bozho The second option helped me print UTF-8 (Hebrew) to my console. However, my ultimate problem is the mapping of servlet URL patterns. When I use Hebrew URLs in the URL patterns, the mappings are displayed (I checked in the jconsole MBeans) as gibberish. I thought that fixing the console printing would also fix the mappings, but it doesn't. Do you have any other suggestions? – theyuv Mar 09 '16 at 13:04
10

Epaga: have a look right here. You can set the output encoding on a PrintStream; you just have to determine, or be absolutely sure about, which encoding the console uses.

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Test {
    public static void main(String[] argv) throws UnsupportedEncodingException {
        // "皆さん、こんにちは" written as Unicode escapes
        String unicodeMessage =
            "\u7686\u3055\u3093\u3001\u3053\u3093\u306b\u3061\u306f";

        // Wrap System.out so that characters are encoded as UTF-8
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(unicodeMessage);
    }
}

To determine the console encoding you could use the system command "locale" and parse its output, which on a German UTF-8 system looks like:

LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL=
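
Parsing that output could look roughly like the sketch below. It is only an illustration under a few assumptions: the `locale` command exists (so Unix-like systems only), the `LC_CTYPE` line carries the codeset after a dot as in the listing above, and the child process sees the same locale variables as the terminal. The class and method names (`LocaleCharset`, `codesetFromLocaleOutput`, `runLocale`) are invented here, and the UTF-8 fallback is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

public class LocaleCharset {

    // Extracts the codeset from `locale` output, e.g. LC_CTYPE="de_DE.UTF-8" -> "UTF-8".
    // Returns null if there is no LC_CTYPE line or it names no codeset (e.g. "C").
    static String codesetFromLocaleOutput(List<String> lines) {
        for (String line : lines) {
            if (!line.startsWith("LC_CTYPE=")) {
                continue;
            }
            String value = line.substring("LC_CTYPE=".length()).replace("\"", "").trim();
            int dot = value.indexOf('.');
            if (dot < 0) {
                return null;
            }
            String codeset = value.substring(dot + 1);
            int at = codeset.indexOf('@'); // strip a modifier such as "@euro"
            if (at >= 0) {
                codeset = codeset.substring(0, at);
            }
            return codeset.isEmpty() ? null : codeset;
        }
        return null;
    }

    // Runs the `locale` command and collects its output lines.
    static List<String> runLocale() throws IOException {
        Process p = new ProcessBuilder("locale").redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // True only if the name is a legal charset name that this JVM supports.
    static boolean isUsable(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String codeset = codesetFromLocaleOutput(runLocale());
        String encoding = isUsable(codeset) ? codeset : "UTF-8"; // fallback is an assumption
        PrintStream out = new PrintStream(System.out, true, encoding);
        out.println("h\u00f6h\u00f6\u00df\u00fc\u00e4");
    }
}
```

Keeping the parsing in a separate method from the process call means it can be checked against sample output such as the listing above without running `locale` at all.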
gamma
  • 2 problems: 1) how do I know which encoding is set in the console? 2) is there any way to still work with System.console() ? – Epaga Mar 10 '10 at 09:27
  • No, you cannot retain your code using `System.console()`, and I don't perceive it critical. `PrintStream` also has everything you want and it *does* work properly. +1 for the answer from me. – dimitarvp Mar 10 '10 at 09:35
  • 1) You could use the system command "locale" and parse its output. – gamma Mar 10 '10 at 09:36