Cyrillic in the Windows console (Java): System.out.println()

Question

When I print Cyrillic text with System.out.println("Русский язык"), the Windows console outputs ╨єёёъшщ ч√ъ. How can this be fixed? The source file is encoded as UTF-8, but that doesn't seem to matter: when it was saved as ANSI or windows-1251, the output was the same.

I don't believe the Windows console supports Unicode output... — aardvarkk, Apr 13 '12 at 15:44
Depending on how you execute your Java code, this article might help: http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how — VirtualTroll, Apr 13 '12 at 15:47

score 11 · Accepted Answer · answered Apr 13 '12 at 22:13

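import java.io.PrintStream;

class Kyrill {
    public static void main(String[] args)
        throws java.io.UnsupportedEncodingException
    {
        String ru = "Русский язык";
        // Wrap System.out in a PrintStream that encodes its output as UTF-8.
        PrintStream ps = new PrintStream(System.out, true, "UTF-8");
        System.out.println(ru.length()); // number of chars in the string
        System.out.println(ru);          // encoded with the platform encoding
        ps.println(ru);                  // encoded as UTF-8
    }
}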

D:\Temp :: chcp 65001
Aktive Codepage: 65001.

D:\Temp :: javac -encoding utf-8 Kyrill.java && java Kyrill
12
??????? ????
Русский языкй язык

Note that you might see some trailing junk in the output (I do) but if you redirect the output to a file you'll see that this is just a display artefact.

So you can make it work by using a PrintStream with an explicit encoding. System.out uses the platform encoding (cp1252 for me), which has no Cyrillic characters.
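A minimal sketch of why the second output line shows question marks, assuming the platform encoding is windows-1252 as it is for me. The class `EncodeDemo` and its two helpers are names made up for this illustration. It only simulates the encoding step in memory and does not involve the console at all. Compile it with `javac -encoding utf-8`, like the example above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Encode the text with the given charset and decode the bytes again
    // with the same charset. Characters the charset cannot represent are
    // replaced by the charset's replacement byte during encoding.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // True if every character of the text can be encoded in the charset.
    static boolean canEncode(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ru = "Русский язык";
        // Assumption: windows-1252 stands in for the platform encoding.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("cp1252 can encode: " + canEncode(ru, cp1252));
        System.out.println("via cp1252: " + roundTrip(ru, cp1252));
        System.out.println("UTF-8 round trip intact: "
                + roundTrip(ru, StandardCharsets.UTF_8).equals(ru));
    }
}
```

The cp1252 round trip loses every Cyrillic letter, which matches the `??????? ????` line in the transcript, while the UTF-8 round trip keeps the string intact.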

Additional note for you to grok the encoding business:

D:\Temp :: chcp 1251
Aktive Codepage: 1251.
:: This is another codepage (8 bits only) that maps bytes to cyrillic characters.
:: Edit the source file to have:
::      PrintStream ps = new PrintStream(System.out, true, "Windows-1251");
:: We intend to match the console output; else we won't get the expected result.
D:\Temp :: javac -encoding utf-8 Kyrill.java && java Kyrill
12
??????? ????
Русский язык

So you can see that contrary to what some people believe, the Windows console does grok Unicode in the casual sense that it can print Greek and Russian.
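Both runs above follow the same rule: the PrintStream encoding has to match the active console code page. A rough sketch of that idea follows. `ConsoleOut` and `charsetForCodePage` are hypothetical names, and the code page number is passed in by hand (Java is not asked to detect it). The sketch assumes the Java runtime knows the `cpNNN` alias for the code page in question, which holds for cp1251 and cp866 on a full JDK as far as I know.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleOut {
    // Hypothetical helper: map the number reported by chcp to a Java charset.
    // 65001 is mapped to UTF-8 explicitly; other code pages rely on the
    // "cpNNN" charset aliases and throw if the runtime does not know them.
    static Charset charsetForCodePage(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName("cp" + codePage);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Pass the active code page as the first argument, e.g. 65001 or 1251.
        int codePage = args.length > 0 ? Integer.parseInt(args[0]) : 65001;
        Charset cs = charsetForCodePage(codePage);

        // Same constructor as in Kyrill above, with the charset chosen at run time.
        PrintStream ps = new PrintStream(System.out, true, cs.name());
        ps.println("Русский язык");
    }
}
```

Usage would be `chcp 1251` followed by `java ConsoleOut 1251`, or `chcp 65001` followed by `java ConsoleOut 65001`.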

I copied the first part of your code and did exactly as you said, but the output is still a mess. So what you saw does not seem to hold for all Windows consoles. Note: I am using Windows 7 Professional, 64-bit, English version. — dragon66, Apr 13 '12 at 22:47
@dragon66 - Did you use UTF-8 as the source file encoding? If not, that's the reason for your mess. Else, define "mess". — Lumi, Apr 13 '12 at 22:58
Yes, I did, and this is the mess I saw: C:\work-temp>javac -encoding utf-8 Kyrill.java C:\work-temp>javac -encoding utf-8 Kyrill.java && java Kyrill 12 ??????? ???? ╨á╤â╤ü╤ü╨║╨╕╨╣ ╤Å╨╖╤ï╨║ — dragon66, Apr 13 '12 at 23:36
It looks like you missed the `chcp 65001` command to switch your console window to UTF-8. — Lumi, Apr 14 '12 at 08:43

score 2 · Answer 2 · answered Apr 13 '12 at 19:24

Although you can switch the Windows console to UTF-8 with chcp 65001, you may still not be able to view UTF-8 output properly. This may not be what you want, but it is at least an option: redirect your standard output to a file. Save your source file as UTF-8 and compile it with UTF-8 as the source encoding. The redirected output file can then be viewed with a UTF-8 aware text editor.

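import java.io.FileOutputStream;
import java.io.PrintStream;
// Inside a method that declares `throws java.io.IOException`:
String s = "Русский язык";
System.setOut(new PrintStream(new FileOutputStream("out.txt"), true, "UTF-8"));
System.out.println(s);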

score 1 · Answer 3 · answered Apr 13 '12 at 19:32


The Windows console uses the CP866 encoding for Cyrillic, for historical reasons (remember DOS?). The Windows console is definitely not Unicode-capable.

(Alas, I have no Windows machine around to provide a tested code snippet.)

answered Apr 13 '12 at 19:32 by 9000

This is wrong. First, `cmd.exe` is not "DOS", and implying it is related is misleading. Second, it is "Unicode-capable" in the way in which the OP is trying to make it work. – Lumi Apr 13 '12 at 22:15
@Lumi: (1) DOS in the old 16-bit days used the encoding now known as CP866. Modern 32- and 64-bit Windows have to follow suit in the console, despite using another 8-bit Cyrillic encoding (CP1251) in the GUI part and proper Unicode in the Unicode API. (2) 'Unicode-capable' would mean capable of at least displaying all Base Plane characters; the Windows console only supported 8-bit encodings last time I checked. If this has changed, it's welcome news. – 9000 Apr 13 '12 at 23:37
This sounds correct. The welcome news is simply ["65001 utf-8 Unicode (UTF-8)"](http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx). Full BMP support means many, many languages, also right-to-left ones like Arabic, and that might require a different version of `cmd.exe`, and of course you need to have an appropriate console font installed, and I'm not sure it even exists, probably yes, but I've never seen it. – Lumi Apr 14 '12 at 08:41
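The CP866 point can be checked without a Windows machine by simulating the mismatch in memory. The sketch below encodes the string one way and decodes the bytes another way, which is roughly what happens when a program writes in one encoding and the console displays another. `Mojibake` and `misread` are made-up names, and the pairing of encodings is my guess at what happened in each case, not something confirmed in this thread.

```java
import java.nio.charset.Charset;

public class Mojibake {
    // Encode the text as the program would write it, then decode the bytes
    // the way a console set to a different code page would display them.
    static String misread(String text, String writtenAs, String shownAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(shownAs));
    }

    public static void main(String[] args) {
        String ru = "Русский язык";

        // Guess for the question: windows-1251 bytes shown by a CP866 console.
        System.out.println(misread(ru, "windows-1251", "IBM866"));

        // Guess for dragon66's comment: UTF-8 bytes shown by a CP437 console
        // (code page 437 is assumed here for an English Windows install).
        System.out.println(misread(ru, "UTF-8", "IBM437"));
    }
}
```

If the guesses are right, the first line should resemble the garbled text in the question (apart from a non-breaking space that the page may have dropped) and the second should resemble the last line quoted in dragon66's comment. Redirect the output to a file and open it in a UTF-8 aware editor if the console itself gets in the way.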
