2

I'd like to know whether I can read in and print out a string containing Japanese characters. This came up in a mini-project that I shelved because it was, at first, out of my league. As my skills and my curiosity about high-level languages improved, I came back to the old project, and even after breaks from coding I still wonder whether it's possible. The code below isn't the project itself by any stretch, just a minimal example (in fact, if the example turns out to be non-applicable to programming, I'll feel stupid for the mere attempt).

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        // Example:
        System.out.println("Input English String Here... ");
        Scanner english = new Scanner(System.in);
        String English = english.next();
        System.out.println("今、漢字に入ります。 ");
        // A second Scanner on the same stream, as in my original attempt.
        Scanner japanese = new Scanner(System.in);
        String Japanese = japanese.next();
        System.out.println("Did it work...? ");
        System.out.println(English);
        System.out.println(Japanese);
    }
}

run:

Input English String Here...
Good
今、漢字に入ります。
いい
Did it work...? 
Good
??

I expect to see いい on the last line of output.

user2864740
user130110
  • Probably a console problem. What are you running it in? The Windows Command Prompt is utterly broken at Unicode for any app (including Java) not exclusively using the Win32-specific APIs. – bobince May 13 '14 at 10:17

3 Answers

3

The most likely explanation for getting ?? instead of いい is that there is a mismatch between the character encoding that is being delivered by your computer's input system, and the default Java character encoding determined by the JVM.
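As a rough illustration of how such a mismatch can turn いい into ??, the sketch below encodes the two characters with a charset that cannot represent them. The class and method names are made up for this example, and US-ASCII merely stands in for whatever default encoding the JVM or console actually uses on the asker's machine, which the question does not say. The characters are written as \u3044 escapes so that the source file's own encoding plays no part.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
class EncodingMismatchDemo {
    // Encodes the text with the given charset, then decodes the bytes
    // again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String ii = "\u3044\u3044"; // the two hiragana typed in the question

        // The encoding the JVM falls back to when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // UTF-8 can represent the characters, so they survive the round trip.
        System.out.println("UTF-8 intact: "
                + roundTrip(ii, StandardCharsets.UTF_8).equals(ii));

        // US-ASCII cannot, so each character is replaced by '?'.
        System.out.println("US-ASCII result: "
                + roundTrip(ii, StandardCharsets.US_ASCII));
    }
}
```

The last line prints ?? because String.getBytes substitutes the charset's replacement byte for every character it cannot map. The same kind of substitution can happen on the output side when System.out or the console uses an encoding without Japanese characters.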

Assuming that the input is UTF-8 encoded, then a more reliable way to configure the scanner is new Scanner(System.in, "UTF-8").

Also note that it is not necessary to create multiple scanner objects. You can ... and should ... create one and reuse it. It probably will not matter if the input is genuinely interactive, but if there is any possibility that input could be piped to the program, you could find that the first Scanner gobbles up input that should go to the second Scanner.
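A minimal sketch of both suggestions together, assuming the input really is UTF-8 encoded: one Scanner, created with an explicit charset name, reads both words. The class name and the readTwoTokens helper are invented for illustration.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical demo class; the helper name is an assumption of this sketch.
class SingleScannerDemo {
    // Reads the first two whitespace-separated tokens from the stream,
    // decoding the bytes as UTF-8 instead of the JVM default encoding.
    static String[] readTwoTokens(InputStream in) {
        // One Scanner for the whole stream. It is deliberately not closed,
        // because closing it would also close System.in.
        Scanner scanner = new Scanner(in, "UTF-8");
        String first = scanner.next();
        String second = scanner.next();
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        System.out.println("Enter an English word, then a Japanese word:");
        String[] words = readTwoTokens(System.in);
        System.out.println(words[0]);
        System.out.println(words[1]);
    }
}
```

This only addresses decoding of the input. Whether the second word is displayed correctly still depends on the encoding used by System.out and on what the console or IDE output window can render.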

Stephen C
  • Thank you for your input. To be honest, I don't know why I used two Scanners; I don't normally do that. The problem still exists: I still get two question marks. I'm using the NetBeans 8.0 IDE now. When I started my project months ago, I was using Eclipse Juno, and there I used UTF-8. However, that seemed to produce different characters, like an alpha and a lambda (or something like that). – user130110 May 13 '14 at 03:40
2

If you are using Eclipse, you can change the default character encoding under Run > Run Configurations > Common.

Also, it would be better to use new Scanner(System.in, StandardCharsets.UTF_8.name()) instead of hard-coding a string value.
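Here is one way this could look, as a sketch rather than a definitive fix. It takes the charset name from the StandardCharsets constant via name(), which returns the canonical name "UTF-8", and it also wraps the output in a UTF-8 writer, which goes beyond what is suggested above. The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class; not taken from the question or the answers.
class Utf8EchoDemo {
    // Reads one token as UTF-8 and writes it back out as UTF-8.
    static void echoFirstToken(InputStream in, OutputStream out) {
        // The charset name comes from the constant, not a string literal.
        Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name());

        // Encode the output explicitly as well, so that neither direction
        // relies on the JVM default encoding. The 'true' flag enables
        // auto-flush on println.
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8), true);

        writer.println(scanner.next());
        writer.flush();
    }

    public static void main(String[] args) {
        echoFirstToken(System.in, System.out);
    }
}
```

Forcing UTF-8 on both sides only helps if the console or IDE output window also reads and writes UTF-8; if it uses another encoding, the explicit charset here would have to match that encoding instead.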

Here is a link to another topic about changing the default encoding in NetBeans: How to change file encoding in NetBeans?

Community
user3624390
-1

Support for Japanese in fonts is spotty, and it differs between AWT and Swing components. Those funny blobs probably mean you are using a font/component combination that doesn't have Japanese glyphs.

Another possibility is that you've been manipulating the characters of the string by passing them through byte arrays or integers, where it's easy to accidentally lose the high-order bits. Several APIs have been deprecated because of this hazard.
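To make the high-order-bits hazard concrete, here is a small sketch with invented names. It does not reproduce anything from the question's code, which contains no such conversion. It only shows what happens when a 16-bit char is squeezed through an 8-bit byte.

```java
// Hypothetical demo class illustrating the narrowing hazard.
class HighBitLossDemo {
    // Narrows a char to a byte and widens it back, which keeps only the
    // low 8 bits of the original 16-bit value.
    static char throughByte(char c) {
        byte low = (byte) c;
        return (char) (low & 0xFF);
    }

    public static void main(String[] args) {
        char hiragana = '\u3044'; // the character from the question, 0x3044
        char damaged = throughByte(hiragana);

        // 0x3044 loses its high byte 0x30 and becomes 0x44, the letter 'D'.
        System.out.println(Integer.toHexString(hiragana) + " -> "
                + Integer.toHexString(damaged) + " (" + damaged + ")");
    }
}
```

Running it prints 3044 -> 44 (D). Converting through String.getBytes with an explicit charset and the matching String constructor avoids this kind of loss.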

ddyer
  • This is not an answer for this question/code. In the test-case shown there is no AWT/Swing or "manipulation/byte arrays" and the Japanese text in a string literal is rendered correctly. – user2864740 May 13 '14 at 03:46
  • My point is that the entire data chain from System.in to System.out is suspect, and Java's included classes are not perfect. When the final output isn't correct, it can be very difficult to determine where it went wrong. – ddyer May 13 '14 at 16:35
  • Consider a comment for such concerns, especially when purposefully choosing to ignore the problematic case presented. – user2864740 May 13 '14 at 18:23