0

What I do:

public class Main {
    public static void main(String[] args) {
        char i = 0x25A0; // U+25A0 BLACK SQUARE
        System.out.println(i);
        i = 0x2612; // U+2612 BALLOT BOX WITH X
        System.out.println(i);
        i = 0x2610; // U+2610 BALLOT BOX
        System.out.println(i);
    }
}

What I get in IDE: [screenshot]

What I get in Windows console: [screenshot]

I have Windows 10 (Russian locale). The console uses Cp866 as its default encoding and the IDE uses UTF-8. How can I make the characters display correctly in the console?

user0856

3 Answers

3

Two problems here, actually:

  1. Java converts output to its default encoding, which usually has nothing to do with the console encoding. Apparently this can only be overridden at VM startup, e.g. with:

    java -Dfile.encoding=UTF-8 MyClass
    
  2. The console window has to use a TrueType font in order to display Unicode. However, neither Consolas nor Lucida Console has ☐ or ☒, so they show up as boxes with Lucida Console and as boxes with a question mark with Consolas (i.e. the missing-glyph glyph). The output itself is still fine and you can copy and paste it easily; it just doesn't look right. Since the Windows console doesn't use font substitution (hard to do with a character grid anyway), there's little you can do to make them show up.
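
For the first point, if changing the VM-wide default is not convenient, one alternative for `System.out` alone is to wrap it in a `PrintStream` with an explicit charset. A minimal sketch, assuming the console has already been switched to UTF-8 (for example with `chcp 65001`); the class and method names are made up for illustration, and this does nothing about the font problem in the second point:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Out {
    // Wraps a stream in a PrintStream that encodes text as UTF-8,
    // regardless of the JVM's default encoding.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // The three characters from the question, written as escapes so the
        // source file's own encoding does not matter.
        out.println("\u25A0 \u2612 \u2610");
    }
}
```

Whether the result looks right still depends on the console's code page and font, so treat this as something to experiment with rather than a fix.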

I'd probably just use [█], [ ], and [X] instead.
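
A rough sketch of that fallback, assuming the output charset is known (passed as the first argument here, otherwise the JVM default is used as a stand-in, which may not match the console). The class and method names are hypothetical. Note that `canEncode` only tells you whether the charset can represent a character; it cannot tell you whether the console font has a glyph for it:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class BoxSymbols {
    // Returns the Unicode symbol if the charset can encode it,
    // otherwise the plain-text fallback.
    static String pick(Charset cs, String symbol, String fallback) {
        CharsetEncoder encoder = cs.newEncoder();
        return encoder.canEncode(symbol) ? symbol : fallback;
    }

    public static void main(String[] args) {
        // Assumption: the caller knows the console charset, e.g. "IBM866".
        Charset cs = args.length > 0
                ? Charset.forName(args[0])
                : Charset.defaultCharset();

        System.out.println(pick(cs, "\u25A0", "[#]")); // filled box
        System.out.println(pick(cs, "\u2612", "[X]")); // crossed box
        System.out.println(pick(cs, "\u2610", "[ ]")); // empty box
    }
}
```

The filled-box fallback uses `[#]` here only to keep the fallbacks pure ASCII; `[█]` works as well wherever the code page contains U+2588.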

Joey
  • I tried -Dfile.encoding=UTF-8 and -Dfile.encoding=Cp866 before; neither helped. Thanks for the ideas. – user0856 Jan 05 '17 at 10:03
  • Well, Cp866 would not do any good, since the characters are not in that codepage – Joey Jan 05 '17 at 10:20
  • OK, the main problem seems to be the absence of the characters in the fonts. There is still the question of why it's OK in the IDE. My guess is that Windows uses different encodings for graphical and console output, but I'm not sure. – user0856 Jan 05 '17 at 11:09
  • Your IDE probably doesn't treat console output as a grid of character cells, but rather a text stream (as is usual in Unix, but not Windows) and displays it accordingly. Thus it probably can simply use normal text output and thus font substitution. – Joey Jan 05 '17 at 11:43
1

Cp866 default coding in console

Well, yeah. Code page 866 doesn't include the characters U+25A0, U+2610 or U+2612. So even if Java were using the correct encoding for the console (either because you set something like -Dfile.encoding=cp866, or because it guessed the right encoding, which it almost never manages), you couldn't get the characters out.
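
To see what that means at the byte level: `String.getBytes` replaces any character the charset cannot represent with the charset's replacement byte, which is normally `?`. A small sketch, assuming Java's `IBM866` charset corresponds to the console's code page 866 (it may be absent on stripped-down runtimes, hence the `isSupported` check); the class name is made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableDemo {
    // Encodes the text with the given charset and renders the bytes as hex.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String boxes = "\u25A0\u2612\u2610";
        // UTF-8 needs three bytes for each of these characters.
        System.out.println("UTF-8:  " + hex(boxes, StandardCharsets.UTF_8));
        // A charset without these characters emits its replacement byte instead.
        if (Charset.isSupported("IBM866")) {
            System.out.println("IBM866: " + hex(boxes, Charset.forName("IBM866")));
        }
    }
}
```

If the second line shows one replacement byte per character, the information is already lost before the console or its font gets involved.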

How to make characters in console look correct?

You can't.

In theory you could use -Dfile.encoding=utf-8, and set the console encoding to UTF-8 (or near enough, code page 65001). Unfortunately the Windows console is broken for multi-byte encodings (other than the legacy locale-default supported ones, which UTF-8 isn't); you'll get garbled output and hangs on input. This approach is normally unworkable.

The only reliable way to get Unicode to the Windows console is to skip the byte-based C-standard-library I/O functions that Java uses and go straight to the Win32 native WriteConsoleW interface, which accepts Unicode characters (well, UTF-16 code units, same as Java strings) and so avoids the console bugs in byte conversion. You can use JNA to access this API; see the example code in the question "Java, UTF-8, and Windows console". It takes some extra tedious work, though, if you want it to switch between console character output and regular byte output for command piping.

And then you have to hope the user has non-raster fonts (as @Joey mentioned), and then you have to hope the font has glyphs for the characters you want (Consolas doesn't for U+2610 or U+2612). Unless you really, really have to, getting the Windows console to do Unicode is largely a waste of your time.

bobince
  • Good point, I actually forgot that Java uses the non-wide C API underneath which is in itself a fun source of problems ;-) – Joey Jan 05 '17 at 10:43
  • @Joey: yes, and it's worse than that: even using the wide C stdlib API is broken by default. You can sort-of fix it up with `_setmode(stream, _O_U8TEXT)`, but then you can't ever use the narrow stream APIs as they crash. I imagine Java would probably not be happy. :-) this is all such a horrendous disaster – bobince Jan 05 '17 at 10:58
0

Are you sure that the font you use has glyphs for these Unicode characters? No font supports every possible Unicode character. U+2610, U+25A0 and U+2612 (decimal 9744, 9632 and 9746) are not supported by, e.g., the Arial font. You can change the font of your IDE console and of your Windows console too.

GAlexMES
  • I am not completely sure that the fonts have the characters, but I used the same fonts, "Consolas" and "Lucida Console", in the IDE and in the console, and every time the output was correct in the IDE and incorrect in the console. So I suppose the problem is the encoding. – user0856 Jan 05 '17 at 08:54
  • Please correct your code example. You are using different Unicode characters than in the pictures you attached. Back to the topic: did you try to change the code page of your console? Use the command **chcp 65001** to change your code page to 65001. Then try to type ALT + 02610 (on the numpad). Is the icon visible? – GAlexMES Jan 05 '17 at 09:18
  • Updated the code example, same results in IDE and console. Tried chcp 65001 and ALT + 02610; no icon is shown in the console, just the number "2". – user0856 Jan 05 '17 at 09:38
  • Since they get question marks this is (at first at least) *not* a font problem but rather an encoding conversion mangling the output. – Joey Jan 05 '17 at 09:39
  • I did a bit of research. It seems like Java is resetting the console code page. Have a look at this question: http://stackoverflow.com/questions/8669056/unicode-input-in-a-console-application-in-java – GAlexMES Jan 05 '17 at 09:43