Here is my code:

class Hello{
    public static void main(String[] arg)throws Exception{
        System.out.println("Hello");
        String str = "سلام";
        System.out.println(new String(str.getBytes("UTF-8")));
    }
}

I compile it with: javac -encoding UTF8 Hello.java

The output is:

C:\Users\Windows\Desktop>java Hello
Hello
ط³ظ„ط§ظ…

chcp shows: Active code page: 65001

How can I display it accurately?

Best Regards

  • See if [this](https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window) helps. – Federico klez Culloca Aug 11 '23 at 10:32
  • I can't reproduce it, I tried your code and it can be displayed correctly – 时间只会一直走 Aug 11 '23 at 10:33
  • Why are you taking a string, converting it to its UTF-8 bytes and back into a string (in potentially a different character encoding)? Why not just `System.out.println(str);`? – Andy Turner Aug 11 '23 at 10:38
  • @AndyTurner When I print only `str`, it compiles but nothing is displayed – user987376746090 Aug 11 '23 at 10:41
  • @时间只会一直走 Have you tried it in the Windows command line? – user987376746090 Aug 11 '23 at 10:42
  • `new String(str.getBytes("UTF-8"))` will take your string, convert it to bytes using UTF-8, and then convert it back to a string using either UTF-8 (on modern JVMs, JDK 18+, where UTF-8 is the default charset), or using your platform's local encoding. __If__ that is UTF-8, this is meaningless and produces the same thing as just `str`. If it's not UTF-8, you lose the game here, and your string is irreparably gobbledygook. No matter what is happening, __do not do this part__. Your string is then converted to a byte[] again when it is sent to standard out. Your OS then converts those bytes back to text and prints it. – rzwitserloot Aug 11 '23 at 11:13
  • Just print `str`. If that prints nothing, go debug it. What does `System.out.println(System.getProperty("native.encoding"));` print? – rzwitserloot Aug 11 '23 at 11:15
  • @rzwitserloot it prints `null` – user987376746090 Aug 11 '23 at 11:36
  • @user then you're on an old JDK. Okay, the old version of that is `System.out.println(Charset.defaultCharset())`. – rzwitserloot Aug 11 '23 at 16:04
  • @rzwitserloot output is: `windows-1256` – user987376746090 Aug 11 '23 at 16:13
  • I don't think that can handle Persian characters at all. No Java code could possibly make it work. – rzwitserloot Aug 11 '23 at 16:57
  • @rzwitserloot Then what should I do? Should I upgrade my JDK to a newer version, or do something else? – user987376746090 Aug 11 '23 at 17:30
  • You face a [mojibake](https://en.wikipedia.org/wiki/Mojibake) case (*example in Python for its universal intelligibility*): `"سلام".encode('utf-8').decode('cp1256')` returns `'ط³ظ„ط§ظ…'`. I'd guess that `System.out.println("سلام");` or `System.out.println(str);` should work instead of `System.out.println(new String(str.getBytes("UTF-8")));`… – JosefZ Aug 11 '23 at 19:39
  • @JosefZ I tried `System.out.println("سلام");` but strange characters are displayed – user987376746090 Aug 12 '23 at 06:09
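
The mojibake described in the comments can be reproduced in Java as well. The following is only a minimal sketch: the class and method names (`MojibakeDemo`, `misdecode`) are made up for illustration, and it assumes the `windows-1256` charset is available in the JDK in use (it normally is in a full JDK). It encodes the word as UTF-8 and decodes the bytes with a different charset, which is what `new String(str.getBytes("UTF-8"))` does when the default charset is `windows-1256`. The text is written with Unicode escapes so the result does not depend on the `-encoding` option passed to `javac`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: encode the text as UTF-8, then decode those
    // bytes with another charset. If that charset is not UTF-8, the
    // result is garbled.
    static String misdecode(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // The word from the question, written as Unicode escapes.
        String salam = "\u0633\u0644\u0627\u0645";

        // The charset that new String(byte[]) uses when none is given.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Assumption: windows-1256 is supported by this JDK.
        String garbled = misdecode(salam, Charset.forName("windows-1256"));

        // Print code points rather than glyphs, so the result can be read
        // no matter how the console renders text.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Decoding with `StandardCharsets.UTF_8` instead returns the original string unchanged, which is why the round trip is redundant at best and harmful otherwise.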

1 Answer

Your code is fine, but you have two problems:

  • You are using a font in the Command Prompt window which does not properly support Farsi characters.
  • Farsi is written from right to left, but the Command Prompt window is rendering text from left to right, so even if you use an appropriate font the Farsi text will be rendered in reverse.

Using your sample application, this is how the output was rendered in my Command Prompt window using the Courier New font, which supports Farsi characters:

[Screenshot: Farsi output in the Command Prompt window]

Note that the characters were rendered correctly, but in reverse order.

There may be an easy workaround to get a Command Prompt window to render right-to-left text correctly, but a better approach in any case is to use the Command Prompt implementation provided by Microsoft's Windows Terminal.

If you do that then your application's output is rendered correctly without needing to make any configuration changes:

[Screenshot: output in Windows Terminal]

Notes:

  • My environment is Windows 10 using United States locale with a default code page of 65001. I tested using JDK 20, but any JDK version should be fine.
  • Unless you must use the traditional Command Prompt window for compatibility reasons, consider using the Command Prompt implementation provided by Terminal instead. All kinds of text rendering problems just disappear because some longstanding bugs/limitations with the traditional Command Prompt window have been resolved.
  • If you are not familiar with Terminal, this is Microsoft's overview description:

Windows Terminal is a modern host application for the command-line shells you already love, like Command Prompt, PowerShell, and bash (via Windows Subsystem for Linux (WSL)). Its main features include multiple tabs, panes, Unicode and UTF-8 character support, a GPU accelerated text rendering engine, and the ability to create your own themes and customize text, colors, backgrounds, and shortcuts.
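
Separately from the choice of terminal, the Java side can be made explicit about the encoding it writes. This is only a sketch under assumptions, not something tested in the environment above: it assumes the console code page is 65001 (as `chcp` reports in the question), and the names `Utf8Console` and `utf8Stream` are hypothetical. It wraps an output stream in a `PrintStream` that always encodes as UTF-8, so the bytes sent to the console do not depend on `Charset.defaultCharset()`. It does nothing about fonts or right-to-left rendering, which remain the terminal's job.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Hypothetical helper: a PrintStream that encodes everything it
    // prints as UTF-8, regardless of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            // The (OutputStream, boolean, String) constructor also exists
            // on old JDKs; "true" turns on auto-flush.
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // The word from the question, written as Unicode escapes so the
        // source file encoding does not matter.
        out.println("\u0633\u0644\u0627\u0645");
    }
}
```

On JDK 18 and later the default charset is already UTF-8, so this mainly matters on older JDKs such as the one in the question, where the default was reported as `windows-1256`.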

skomisa