
I am trying to read Chinese characters with Scanner.

I am running in cmd.exe on Windows 10.

I've already run CHCP 65001.


Here is the code:

import java.util.Scanner;

public class JavaScannerCN
{
  public static void main (String[] args)
  {
    Scanner scanner = new Scanner(System.in);
    System.out.print("中文 user name: ");
    String username = scanner.next();
    scanner.close();
    System.out.println(String.format("Hello, %s", username));
    System.out.println((int)'中');
    System.out.println((int)username.charAt(0));
  }
}

Thanks to @ParkerHalo's reminder: it seems the character is already corrupted by the time it is received, since the output is

Hello, ��
20013
96

The code prints the Chinese characters hard-coded in the source correctly, but prints the Chinese characters read through Scanner as junk.

How can I fix that?

yigre20cn

1 Answer


There are a few things that could cause this problem. One of them is the charset configured in your Windows environment. I would suggest a diagnostic tool that may help you pin down the problem, which in turn should lead you to an appropriate solution. There is an open-source Java library, MgntUtils, that has a utility which converts Strings to Unicode sequences and vice versa:

String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);

The output of this code is:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World

The library can be found on Maven Central or on GitHub. It comes as a Maven artifact, with sources and JavaDoc. See the JavaDoc for the class StringUnicodeEncoderDecoder.

I suggest that you convert the input string you receive from the Scanner into Unicode sequences. That will tell you whether the data was already damaged when you received it or whether you only have a display problem.
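If you would rather not add a dependency just for this check, the same kind of dump can be sketched in plain Java. This is only an illustration: `UnicodeDump` and `toUnicodeSequence` are hypothetical names, not part of MgntUtils, and the helper simply prints each UTF-16 code unit in hex.

```java
public class UnicodeDump {
    // Hypothetical helper (not from MgntUtils): renders every UTF-16 code unit
    // of the string as a backslash, the letter u, and four hex digits.
    static String toUnicodeSequence(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escapes are used instead of literal characters so the result does not
        // depend on the encoding the source file is compiled with.
        System.out.println(toUnicodeSequence("Hello"));        // \u0048\u0065\u006c\u006c\u006f
        System.out.println(toUnicodeSequence("\u4e2d\u6587")); // \u4e2d\u6587
    }
}
```

Calling `toUnicodeSequence(username)` right after `scanner.next()` should show whether the string holds the expected code units (for example `\u4e2d` for 中) or something else, which would mean the input was damaged before it reached your code.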

Michael Gantman
  • Thanks for your approach! It seems what I received has already been damaged. How do I fix that? – yigre20cn Dec 10 '19 at 08:50
  • I found this question that may help: https://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using – Michael Gantman Dec 10 '19 at 09:07
  • Try running the command "chcp 65001" in your console before you run your program. That switches the console to the UTF-8 code page, which should resolve your issue – Michael Gantman Dec 10 '19 at 09:11
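
Since the console code page and the charset the Scanner decodes with are separate settings, it may also be worth checking the second one. The sketch below is not a confirmed fix for cmd.exe. It only illustrates, with an in-memory byte array standing in for console input, that `Scanner` recovers the characters when the charset name passed to its constructor matches the bytes it is given. `ScannerCharsetDemo` and `readToken` are hypothetical names, and whether the console really delivers UTF-8 bytes after `chcp 65001` is an assumption you would have to verify.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerCharsetDemo {
    // Hypothetical helper: reads the first whitespace-delimited token from the
    // raw bytes, decoding them with the named charset.
    static String readToken(byte[] rawInput, String charsetName) {
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(rawInput), charsetName)) {
            return scanner.next();
        }
    }

    public static void main(String[] args) {
        // Simulated console input: two Chinese characters encoded as UTF-8.
        byte[] utf8Bytes = "\u4e2d\u6587\n".getBytes(StandardCharsets.UTF_8);

        String matching = readToken(utf8Bytes, "UTF-8");
        String mismatched = readToken(utf8Bytes, "ISO-8859-1");

        System.out.println((int) matching.charAt(0));           // 20013
        System.out.println(matching.equals("\u4e2d\u6587"));    // true
        System.out.println(mismatched.equals("\u4e2d\u6587"));  // false
    }
}
```

In the program from the question, the equivalent change would be `new Scanner(System.in, "UTF-8")`, assuming the bytes arriving on `System.in` really are UTF-8.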