1

I want to input strings that may contain the letters åäö in Java, but Scanner converts them to some other character. I tried with utf-8 too:

String s1 = new Scanner(System.in).nextLine();
String s2 = new Scanner(System.in, "utf-8").nextLine();
System.out.println(s1 + "|" + (int)s1.charAt(0));
System.out.println(s2 + "|" + (int)s2.charAt(0));
System.out.println((int)'å' + "|" + (int)'?');

This yields:

å
å
?|8224
?|65533
229|63

All characters become 65533 with utf-8. Without utf-8, ä becomes 8222, ö becomes 8221, Å becomes 65533, Ä becomes 381, Ö becomes 8482.

Is there some alternative input method that allows for åäö?

I'm running Java 8u25, and I'm running the program from the Windows console.

H.v.M.

3 Answers

5

The problem is not with Java, but with the Windows console, which uses its own encoding. You can display the active code page with the chcp command. Most likely it will be code page 850. In Java, you can then pass that encoding to Scanner like this:

new Scanner(System.in, "Cp850")
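
To see why the question's numbers come out the way they do, here is a minimal sketch that decodes the same raw byte under three charsets. It assumes the console sends 0x86 for å (true for code page 850) and that the JVM's default charset is windows-1252, which is consistent with the values in the question but not confirmed by it. The class and method names are made up for illustration, and the "Cp850" charset is assumed to be available in the JDK in use.

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    // Decode one raw byte with the named charset (hypothetical helper).
    static String decode(int rawByte, String charsetName) {
        return new String(new byte[] { (byte) rawByte }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int raw = 0x86; // assumption: the byte a code page 850 console sends for 'å'

        // Matching charset: the byte decodes to U+00E5 ('å').
        System.out.println((int) decode(raw, "Cp850").charAt(0));        // 229

        // Assumed default charset: the same byte decodes to U+2020.
        System.out.println((int) decode(raw, "windows-1252").charAt(0)); // 8224

        // UTF-8: a lone 0x86 is malformed, so it becomes U+FFFD.
        System.out.println((int) decode(raw, "UTF-8").charAt(0));        // 65533
    }
}
```

The 8224 and 65533 printed here match the values reported in the question, which suggests the bytes arriving on System.in are fine and only the charset used to decode them is wrong.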
Michael Borgwardt
1

You need to set the encoding for your output stream (see this thread):

String s1 = new Scanner(System.in).nextLine();
String s2 = new Scanner(System.in, "utf-8").nextLine();

PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(s1 + "|" + (int)s1.charAt(0));
out.println(s2 + "|" + (int)s2.charAt(0));
out.println((int)'å' + "|" + (int)'?');
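
Whether "UTF-8" is the right charset for the PrintStream depends on what the console expects. As a rough sketch, the following captures what a PrintStream writes for å under two charsets; the helper name is hypothetical, and "Cp850" is only an assumption about the console's code page (check it with chcp).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleOutputDemo {
    // Return the bytes a PrintStream with the given charset writes for the text.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e5"; // 'å', written as an escape to avoid source-encoding issues

        System.out.println(encode(text, "UTF-8").length); // 2 (bytes 0xC3 0xA5)
        System.out.println(encode(text, "Cp850").length); // 1 (byte 0x86)
    }
}
```

A console on code page 850 would render the two UTF-8 bytes as two unrelated characters, so the charset given to the PrintStream should be the one the console is actually using.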
Ted Hopp
  • Does charAt work properly with multi-byte characters? All of his tests are two-byte UTF-8 characters. – zebediah49 Oct 22 '14 at 16:04
  • @zebediah49 - Once the characters are in memory, Java uses UTF-16; the only characters that `charAt()` doesn't handle well are surrogate pairs. – Ted Hopp Oct 22 '14 at 16:05
0

The Windows cmd.exe does not support UTF-8 encoding. You either have to use WriteConsoleW and ReadConsoleW, or find the console's code page with the chcp command and pass it to Scanner, like new Scanner(System.in, "Cp850").
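
The Scanner approach can be tried without a console by feeding it the bytes a code page 850 console would send. This is only a sketch: the byte values assume code page 850, the class and helper names are invented for the example, and a ByteArrayInputStream stands in for System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

public class ConsoleInputDemo {
    // Read one line from the stream, decoding it with the named charset.
    static String readLine(InputStream in, String charsetName) {
        Scanner scanner = new Scanner(in, charsetName);
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        // Assumption: the bytes for "åäö" followed by Enter on a code page 850 console.
        byte[] typed = { (byte) 0x86, (byte) 0x84, (byte) 0x94, '\n' };

        String line = readLine(new ByteArrayInputStream(typed), "Cp850");
        for (char c : line.toCharArray()) {
            System.out.println((int) c); // 229, then 228, then 246
        }
    }
}
```

In a real program, System.in would replace the ByteArrayInputStream, and the charset name would be whatever chcp reports.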

Tetramputechture