My code:

class Test{
    public static void main(String[]args){
        char c = 'П';
        System.out.print(c);
    }
}

All I want is to display that character in the Windows cmd console. This seems to be surprisingly complicated: I have tried several approaches and none of them worked.

  1. I tried the straightforward way: javac Test.java. The compiler throws this:

    Test.java:3: error: unclosed character literal char c = 'П';

  2. I tried javac -encoding UTF-8 Test.java. It compiles but the character does not appear in cmd.

  3. I saved Test.java with the Unicode encoding and ran javac -encoding UTF-16 Test.java, but the character still does not appear.

Also, I can only use plain Windows Notepad and cmd. Please help, I have been struggling with this issue for 2 days :(

  • I believe you need to run java with `-Dfile.encoding=UTF-8` and change the code page of `cmd` to Unicode with `chcp 65001`. See https://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using – David Conrad Jul 30 '19 at 18:45
  • It's a bad idea to change the console's input and output codepages to UTF-8 via `chcp.com 65001`. It's buggy for output prior to Windows 8, and breaks non-ASCII input even in Windows 10. Plus child processes may misbehave if they expect the console to use the OEM codepage. It's best to just leave it at whatever the user configures and use the console's Unicode API. I haven't seriously used Java in many years, but I'd be surprised if it doesn't have a way to call the console's `WriteConsoleW` and `ReadConsoleW` functions that operate on UTF-16LE strings. – Eryk Sun Jul 30 '19 at 18:56
  • @DavidConrad Oh my God, it worked! Thank you very much! Add your comment as an answer and I will mark it as correct. – Mukhamedali Zhadigerov Jul 30 '19 at 19:07
  • @eryksun It is possible to call WriteConsoleW, but only through JNI: https://stackoverflow.com/a/8921509/636009 – David Conrad Jul 30 '19 at 19:09

1 Answer

Change the encoding cmd uses to UTF-8 with chcp 65001 and then run your Java program with Java's file encoding set to UTF-8:

java -Dfile.encoding=UTF-8 Test
David Conrad
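
As a variation on the answer above, here is a minimal sketch that avoids both the -encoding flag and -Dfile.encoding. It assumes the console code page has already been switched with chcp 65001. The class name Utf8Print and the helper utf8Stream are hypothetical names chosen for illustration; they do not come from the question or the answer. The character is written as the Unicode escape \u041F so the source file stays pure ASCII, and the output stream is wrapped in a PrintStream that encodes as UTF-8 explicitly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example class, not part of the original question or answer.
class Utf8Print {
    // '\u041F' is the Unicode escape for 'П'. With the escape, the source
    // file contains only ASCII, so javac needs no -encoding flag.
    static final char PE = '\u041F';

    // Wraps a stream in a PrintStream that always encodes text as UTF-8,
    // regardless of the JVM's default (file.encoding) charset.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.print(PE);
        out.flush();
    }
}
```

With this sketch, javac Utf8Print.java followed by chcp 65001 and java Utf8Print should be enough, provided the console font has a glyph for the character. The caveats about code page 65001 raised in the comments still apply.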
  • Note that chcp.com changes both the console input and output codepages. (CMD is just a shell that uses a console, and uses its Unicode API. Console applications in general have nothing to do with CMD.) Because it changes the input codepage to 65001, all applications attached to the console that read input via `ReadFile` or `ReadConsoleA` are limited to 7-bit ASCII due to a bug in the console host (conhost.exe), which assumes only 1 byte per character when encoding the console's internal UTF-16 input buffer for a non-Unicode client, but UTF-8 is 1-4 bytes and 1 byte only for ASCII. – Eryk Sun Jul 30 '19 at 19:45
  • Also, prior to Windows 8, applications that write via `WriteFile` or `WriteConsoleA` are returned the number of UTF-16 codes (1-2 codes per character) written instead of UTF-8 bytes written (1-4 bytes per character), which causes buffered writers to retry the part of the write that appears to have not been written. This looks like a sequence of garbage characters printed after writes that have non-ASCII characters. – Eryk Sun Jul 30 '19 at 19:46
  • @eryksun This is why I prefer to use mintty and a bash shell even on Windows. Cygwin and mintty don't have these problems. Maybe Windows 11 will be production ready. :) – David Conrad Jul 30 '19 at 21:34
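
The mismatch between UTF-16 code units and UTF-8 bytes that the comments above describe can be illustrated with a small sketch. It only counts units in plain Java and does not call any console API. The class name CodeUnitCount and its two helper methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the original thread.
class CodeUnitCount {
    // Number of UTF-16 code units; Java strings are stored as UTF-16.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String pe = "\u041F"; // 'П'
        // Prints: 1 UTF-16 code unit(s), 2 UTF-8 byte(s)
        System.out.println(utf16Units(pe) + " UTF-16 code unit(s), "
                + utf8Bytes(pe) + " UTF-8 byte(s)");
    }
}
```

For 'П' the two counts already differ (1 versus 2), so a writer that is told "1 written" after sending 2 bytes would conclude that part of its buffer is still pending, which matches the retry behaviour described for Windows versions before Windows 8.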