I've noticed that in Node.js, when reading stdin from the Windows console (conhost.exe), a UTF-8 character is read just fine regardless of the active code page.
I've been testing with an emoji (😊), but you can try it with whatever you want.
(Both programs were run in cmd.exe.) Example code:
process.stdin.on("readable", () => {
    var input = process.stdin.read();
    if (input !== null) {
        console.log(input); // outputs the correct UTF-8 bytes: <Buffer f0 9f 98 8a 0d 0a>
        process.exit();
    }
});
Now the same test in Java:
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String s = r.readLine();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.println((int) c);
        }
    }
}
With the default code page (437), it outputs 63 twice (i.e. two '?' characters), while with code page 65001 (UTF-8) it outputs 0 twice (two NUL characters), which is even stranger.
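For comparison, here is what the Java program *should* print if it received the same UTF-8 bytes Node saw. A minimal sketch that decodes the buffer from the Node example directly (bypassing the console entirely), assuming the input really is the four-byte UTF-8 encoding of the emoji:

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // The raw UTF-8 bytes Node received for the emoji (minus the CR LF).
        byte[] utf8 = { (byte) 0xf0, (byte) 0x9f, (byte) 0x98, (byte) 0x8a };
        String s = new String(utf8, StandardCharsets.UTF_8);
        // Java strings are UTF-16, so the emoji decodes to a surrogate pair.
        for (int i = 0; i < s.length(); i++) {
            System.out.println((int) s.charAt(i)); // prints 55357, then 56842
        }
    }
}
```

So the expected output is the surrogate pair 55357/56842, not 63/63 or 0/0 — which suggests the bytes are being mangled before they ever reach the decoder.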
I thought the Windows console (conhost) didn't support Unicode, but Node can at least read the bytes intact (even if it can't display them as text). How does it do that, and is there a way to get the same behaviour in Java?