
If I print a Unicode string such as ελληνικά to the console with the print method of the System.out stream, it is displayed as expected (my output console uses the Ubuntu Mono font, which supports these characters).

But if I try to read UTF-8 encoded Unicode characters from the console through the System.in stream, they are not read properly. I have tried wrapping System.in in various reader classes, but none of them works. Does anyone know how I could do that?

Here is a sample of the code:

BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
BufferedWriter console = new BufferedWriter(new OutputStreamWriter(System.out, "UTF-8"));

console.write("p1: Γίνεται πάντως\n");
console.flush();
System.out.println("p2: Γίνεται πάντως");

byte dataBytes[] = keyboard.readLine().getBytes(Charset.forName("UTF-8"));
System.out.println("p3: " + new String(dataBytes));
console.write("p4: " + new String(dataBytes, "UTF-8") + "\n");
console.flush();
Scanner scan = new Scanner(System.in, "UTF-8");

System.out.println("p5: " + (char) System.in.read());
System.out.println("p6: " + scan.nextLine());
System.out.println("p7: " + keyboard.readLine());

And the output on my console:

p1: Γίνεται πάντως
p2: Γίνεται πάντως
Δέν
p3: ���
p4: ���
Δέν
p5: Ä
p6: ��
Δέν
p7: ���

My IDE is NetBeans.

cavla

2 Answers


System.in is an InputStream, which is a stream of bytes. You need a Reader to read characters. The reader is going to do the decoding for you.

In this case, you can wrap System.in with an InputStreamReader, passing "UTF-8" as the second constructor parameter.

Scanner console = new Scanner(new InputStreamReader(System.in, "UTF-8"));
while (console.hasNextLine())
    System.out.println(console.nextLine());

Update:

It's likely that the bytes arriving on your stdin are not UTF-8 encoded. To verify, you can compare the bytes you read from System.in with the expected UTF-8 bytes.

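byte[] expected = "Δέν".getBytes("UTF-8"); // [-50, -108, -50, -83, -50, -67]

byte[] fromStdin = new byte[1024];
int c = System.in.read(fromStdin);
// Compare only the positions both arrays have, so the trailing
// line terminator ("\n" or "\r\n") cannot run past the end of expected.
for (int i = 0; i < Math.min(c, expected.length); i++) {
    if (expected[i] != fromStdin[i]) {
        System.out.println(i + ", " + fromStdin[i]);
    }
}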

Run it, type "Δέν" (without the double quotes), then hit Enter. If it outputs anything, the bytes arriving on your System.in are not UTF-8 encoded.
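If the check does report mismatches, one way to narrow things down is to decode the same raw bytes with a few candidate charsets and see which one reproduces the text you typed. The following is only a sketch: the class name `CharsetGuess`, the helper `matchingCharsets`, and the list of Greek-capable candidates are illustrative assumptions, not something taken from the question's environment.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class CharsetGuess {

    // Hypothetical helper: returns the candidate charset names under which
    // the first `length` bytes of `raw` decode to exactly `expected`.
    static List<String> matchingCharsets(byte[] raw, int length,
                                         String expected, String... candidates) {
        List<String> matches = new ArrayList<>();
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // not every JVM ships every charset
            }
            String decoded = new String(raw, 0, length, Charset.forName(name));
            if (decoded.equals(expected)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[1024];
        int n = System.in.read(raw);
        if (n < 0) {
            n = 0; // end of input before anything was typed
        }
        // Drop the trailing line terminator ("\n" or "\r\n").
        while (n > 0 && (raw[n - 1] == '\n' || raw[n - 1] == '\r')) {
            n--;
        }
        // "\u0394\u03ad\u03bd" is the word "Δέν" written with Unicode escapes,
        // so the result does not depend on the source file encoding.
        // The candidate list is an assumption; adjust it to your locale.
        System.out.println(matchingCharsets(raw, n, "\u0394\u03ad\u03bd",
                "UTF-8", "windows-1253", "ISO-8859-7", "IBM737"));
    }
}
```

Type Δέν and press Enter. If the printed list is empty, none of the candidates matches and the byte values themselves need a closer look.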

Shouldn't System.in have the same encoding as defaultCharset or some system property?

Not necessarily. It's a byte stream, not a character stream. It cannot be a character stream, because you should be able to feed it binary data: an image, audio, video, whatever you want. It must support those, which is why it's just an InputStream. What arrives on it depends on what the environment gives your program, and I know very little about your environment. You need to find out how to change your environment, or figure out what encoding it's actually giving your program.

For example, suppose we have a UTF-16 text file, utf16.txt, and we feed its content to a program that expects STDIN to be UTF-8 encoded text:

java -cp ... our.utf8.Program < utf16.txt

It's going to read gibberish.
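The same effect can be reproduced without any file or console by encoding a string with one charset and decoding the bytes with another. This is a minimal sketch; the class name `WrongCharsetDemo` and the helper `roundTrip` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Hypothetical helper: encodes `text` with the charset the data really
    // has, then decodes the bytes with the charset the program assumes.
    static String roundTrip(String text, Charset actual, Charset assumed) {
        byte[] bytes = text.getBytes(actual);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // "Δέν" written with Unicode escapes, independent of source encoding.
        String text = "\u0394\u03ad\u03bd";

        // Producer and consumer agree on UTF-8: the text survives.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));  // true

        // UTF-16 bytes decoded as UTF-8: the text does not survive.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_16, StandardCharsets.UTF_8))); // false
    }
}
```

Passing "UTF-8" to InputStreamReader only states what the program assumes. It does not change the bytes the environment actually sends.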

xiaofeng.li
  • Yes, you are right. But how can I set the encoding of System.in? – cavla Nov 23 '16 at 02:50
  • But why does java.nio.charset.Charset.defaultCharset() return "UTF-8"? Isn't that the encoding that System.in uses? – cavla Nov 23 '16 at 04:04
  • Updated answer again. – xiaofeng.li Nov 23 '16 at 04:29
  • So, as I see it, I have to set the Windows environment encoding? How can I do that? – cavla Nov 23 '16 at 20:34
  • You may want to try [this](http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8) – xiaofeng.li Nov 23 '16 at 22:11
  • Thank you for your suggestion, although I still can't read Unicode with System.in. I followed what the answer instructed and now I have code page 65001, which is UTF-8, but I still can't input UTF characters. I believe it's a System.in problem, as I can input Unicode in batch – cavla Nov 25 '16 at 04:14

Try using java.io.Console.readLine() or java.io.Console.readLine(String, Object...). The Console instance is returned by the System.console() method. For example:

package package01;

import java.io.Console;

public class Example {

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.err.println("No console");
            System.exit(1);
        }
        String s = console.readLine("Enter string: ");
        System.out.println(s);
    }

}
user1257
  • Using `Console` is probably the right way to go. However you should be aware that within IDEs [`System.console()` will return `null`](https://stackoverflow.com/q/4203646/4288506). It would probably be best to create a separate method for reading with a `Reader` parameter. Then you could call it with `System.console().reader()`, or try to create a reader based on `System.in` and even use a mock reader for unit tests. – Marcono1234 May 25 '19 at 17:05
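The suggestion in the comment above could look roughly like the sketch below. The names `LineInput`, `readFirstLine` and `defaultReader` are hypothetical, and decoding System.in as UTF-8 in the fallback is an assumption that only holds if the environment really delivers UTF-8 bytes.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class LineInput {

    // Reads the first line from any Reader; returns null at end of input.
    // Callers that need several lines should wrap the source in one
    // BufferedReader themselves, because the buffer may read ahead.
    static String readFirstLine(Reader source) throws IOException {
        BufferedReader reader = (source instanceof BufferedReader)
                ? (BufferedReader) source
                : new BufferedReader(source);
        return reader.readLine();
    }

    // Prefers the real console; inside an IDE System.console() is usually
    // null, so fall back to System.in decoded as UTF-8 (an assumption).
    static Reader defaultReader() {
        Console console = System.console();
        if (console != null) {
            return console.reader();
        }
        return new InputStreamReader(System.in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the console, as a unit test would do.
        // "\u0394\u03ad\u03bd" is "Δέν" written with Unicode escapes.
        Reader mock = new StringReader("\u0394\u03ad\u03bd\nsecond line");
        String line = readFirstLine(mock);
        System.out.println(line.length()); // 3
    }
}
```

In a real program the call would be `readFirstLine(defaultReader())`. The mock reader in `main` only shows that the reading logic can be exercised without a console.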