6

I have a problem printing a Unicode symbol in the Windows console.

Here's the Java code that prints the Unicode symbol:

System.out.print("\u22A2 ");

The problem doesn't occur when I run the program in Eclipse with the encoding set to UTF-8; in the Windows console, however, the symbol is replaced by a question mark.

I tried the following to overcome the problem, with no success:

  • Changed the font of the Windows console to Lucida Console.

  • Changed the console encoding with chcp 65001 every time I open the Windows console.

As an extra step, I have also tried running the program with a JVM argument, i.e. java -Dfile.encoding=UTF-8 Filter (where "Filter" is the name of the class).

Erwin Bolwidt
Adrian
  • Are you sure the console is running in Unicode? It could be win-1252 or something. – Marc B Dec 04 '13 at 21:28
  • I'm guessing you've already read this. http://stackoverflow.com/questions/8669056/unicode-input-in-a-console-application-in-java – Georgian Dec 04 '13 at 21:30
  • I have no idea how I would check that. I've seen a screenshot of somebody's console where the Options showed which encoding was in use, but mine does not show it. – Adrian Dec 04 '13 at 21:31
  • @GGrec No, I hadn't come across it, since it is about input. – Adrian Dec 04 '13 at 21:38
  • The MS C runtime doesn't support UTF-8; even if you chcp to 65001 in the console you will likely hit app-breaking bugs. There is no reliable way to get Unicode stdout to the Windows console. If you absolutely must, there is the Win32 API `WriteConsoleW`, but it obviously only works on Windows, it needs careful handling of detecting whether you're actually talking to the Windows console, some other console, or a file or pipe, and you can't call it in pure Java (you need [JNA](http://illegalargumentexception.blogspot.co.uk/2009/04/java-unicode-on-windows-command-line.html)). – bobince Dec 08 '13 at 13:43

3 Answers

9

By default, the code page used by the Windows CMD is 437 on many systems (for example US English installations; other locales use other OEM code pages). You can check it by running this command at the prompt:

C:\>chcp
Active code page: 437

This code page prevents Unicode characters from being displayed properly. You have to change the code page to 65001 AND pass -Dfile.encoding=UTF-8 to Java:

C:\>chcp 65001
Active code page: 65001
C:\>java -Dfile.encoding=UTF-8 -jar path/to/your/runnable/jar
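
To see which encoding the JVM actually picked up after these steps, a small diagnostic can help. This is only a sketch: the class name EncodingCheck and the method describe are made up for illustration, and the values printed depend on the JDK version, the -Dfile.encoding argument, and the system locale.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original answer.
public class EncodingCheck {

    // Builds a one-line summary of the encoding settings the JVM is using.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once with and once without -Dfile.encoding=UTF-8 and compare.
        System.out.println(describe());
    }
}
```

Comparing the output of java EncodingCheck with that of java -Dfile.encoding=UTF-8 EncodingCheck shows whether the argument had any effect on the default charset.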
spider
6

In addition to the steps you have taken, you also need a PrintStream/PrintWriter that encodes the printed characters as UTF-8.

Unfortunately, the Java designers chose to open the standard streams with the so-called "default" encoding, which is almost always unusable*) under Windows. Hence, using System.out and System.err naively makes your program's output appear differently depending on where you run it. This goes directly against the goal of "compile once, run anywhere".

*) It will be some non-standard "code page" that nobody on this planet except Microsoft recognizes. And AFAIK, if for example you have a German keyboard and a "German" OEM Windows, and you want the date and time in your home time zone, there is just no way to say: "But I want UTF-8 input/output in my CMD window." This is one reason why I have my dual-boot Ubuntu running most of the time, where it goes without saying that the terminal does UTF-8.
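
To illustrate where the question mark comes from, here is a minimal sketch that encodes the symbol with two charsets. US-ASCII is used only as a stand-in for a code page that lacks the character (an assumption made for portability; the actual console code page will differ), and the names QuestionMarkDemo and hex are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class QuestionMarkDemo {

    // Encodes the text with the given charset and returns the bytes as hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String symbol = "\u22A2";
        // UTF-8 can represent the symbol; it takes three bytes.
        System.out.println("UTF-8:    " + hex(symbol, StandardCharsets.UTF_8));
        // US-ASCII cannot; the encoder substitutes '?' (0x3F).
        System.out.println("US-ASCII: " + hex(symbol, StandardCharsets.US_ASCII));
    }
}
```

When the charset behind System.out cannot represent a character, the encoder substitutes its replacement byte, which is the question mark seen in the console.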

The following usually works for me in JDK7:

public static PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
    true);

For ancient Java versions, I replace StandardCharsets.UTF_8 with Charset.forName("UTF-8").
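
For completeness, the snippet above can be wrapped into a runnable class along these lines. This is a sketch, not a guaranteed fix: the class name Utf8Stdout and the helper utf8Writer are hypothetical, and whether the glyph actually shows up still depends on the console's code page and font.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class around the PrintWriter shown above.
public class Utf8Stdout {

    // Wraps any byte stream in a PrintWriter that always encodes as UTF-8.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8),
                true);
    }

    public static void main(String[] args) {
        PrintWriter stdout = utf8Writer(System.out);
        stdout.print("\u22A2 ");
        // Auto-flush only triggers on println/printf/format, so flush after print.
        stdout.flush();
    }
}
```

Taking the OutputStream as a parameter keeps the helper usable with a file or an in-memory buffer, which makes it easy to inspect the exact bytes that would be sent to the console.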

Erwin Bolwidt
Ingo
  • Thanks for your reply. I gave it a go with a PrintStream, but it doesn't seem to solve the problem. Perhaps I'm doing something wrong, but here's what I've done: PrintStream sysout = new PrintStream(System.out, true, "UTF-8"); sysout.print("\u22A2 "); Once again it works fine in Eclipse, but not in the Windows console. – Adrian Dec 04 '13 at 22:36
  • Look at my edit, @Adrian. It should then print your string as ⊢ – Ingo Dec 04 '13 at 22:50
  • I really appreciate your help, but I have no idea why it still isn't working. I have followed your solutions very closely and tried both charsets, and it just isn't working. I have tried every option I could think of; I even compiled the Java files with an explicit encoding, using javac -encoding utf8 *.java – Adrian Dec 04 '13 at 23:14
  • Last time I [tried this solution](http://illegalargumentexception.blogspot.co.uk/2009/04/i18n-unicode-at-windows-command-prompt.html#charsets_javaconsole) too many characters were printed on the console (Windows XP.) The best luck I've had is [piping STDOUT through another application](http://illegalargumentexception.blogspot.co.uk/2013/06/go-unicode-on-windows-command-prompt.html). – McDowell Dec 13 '13 at 12:39
-2

For the Arabic language I used the following code:

PrintWriter stdout = new PrintWriter(
    new OutputStreamWriter(System.out, StandardCharsets.ISO_8859_1),
    true);
Tim Visée
Hatem Badawi