
I need to read input from the user, and I want to support non-ASCII letters such as Å, Ä and Ö.

BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "UTF-8"), true);
out.println(keyboard.readLine());
out.println("Read with charset: " + Charset.defaultCharset().name());

When I run this code and enter only ASCII letters, it works as expected (I type something, press Enter, and it prints what I typed). But if I try with å, I get this:

å

�
Read with charset: UTF-8

If the text ends with a non-ASCII letter, I have to press Enter twice, and the letter is then displayed incorrectly. I have tried this in the Netbeans console and in the Windows command prompt, and neither gives the expected result.


I could not find a solution with UTF-8, so I went with ISO-8859-1 instead. It worked in my Netbeans console (which should definitely be UTF-8) and in CMD after I ran chcp 28591 and changed the font (which was necessary in my case).

  • http://stackoverflow.com/questions/4597749/read-write-txt-file-with-special-characters and http://stackoverflow.com/questions/9281629/read-special-characters-in-java-with-bufferedreader – crAlexander Mar 07 '15 at 16:17
  • It works for me. Your console must be set up not to display UTF-8 properly. – RealSkeptic Mar 07 '15 at 16:22
  • @RealSkeptic, I can print out non-latin characters, no problem (System.out.print("å")). This works fine in both the Netbeans console and in CMD. But when I try to read the characters, the problem occurs (as well as having to hit enter twice when a text ends with å, ä or ö). – Dan Lindqvist Mar 07 '15 at 19:48
  • Try just reading the *bytes* from System.in and printing them. This could tell you what character set the console is set to. – RealSkeptic Mar 07 '15 at 19:53
  • @RealSkeptic: Z = 90, Å = 197. But I think that the underlying reader(?) of BufferedReader goes with the default charset (which would be UTF-8 for me, as seen above). – Dan Lindqvist Mar 07 '15 at 20:52
  • So it's only sending one byte for the Å? This comes directly from the console, so the console is set to ISO-8859-1 rather than UTF-8. When the reader tries to interpret this as UTF-8, it messes it up. Try `new InputStreamReader(System.in, StandardCharsets.ISO_8859_1)` instead of what you have now. I'm pretty sure the character will be read appropriately (though I'm not sure how it will print out, but you can check it in a debugger). – RealSkeptic Mar 07 '15 at 22:04
  • @RealSkeptic that seems to be working as expected in the Netbeans console (I don't have to hit enter twice when using a BufferedReader, and it displays and reads correctly), and Å = 197. In CMD it displays Å as ?, and Å = 143. But I don't have to hit enter twice (in CMD with the new encoding). – Dan Lindqvist Mar 08 '15 at 12:56
  • You seem to have different issues in CMD and in Netbeans. For Netbeans, please read [this answer](http://stackoverflow.com/a/7219322/4125191), both the font and the encoding part. For CMD, try entering `chcp 65001` and then running the (original) program. – RealSkeptic Mar 08 '15 at 13:14
  • @RealSkeptic I can print out Å etc in the netbeans console, and the encoding is UTF-8 (tested with `Charset.defaultCharset().name()`, and I looked). I ran `chcp 28591`, changed the font and now the program runs as expected. – Dan Lindqvist Mar 08 '15 at 14:45

2 Answers


The code sample does not handle encodings consistently. It reads data from the console using the system default charset and then writes it out using UTF-8. Your system default may not be UTF-8 and, to complicate things, your console encoding may or may not match your system default.

To do this correctly in the console, you would need to read in using your console encoding and write out using that same console encoding. If you are just testing this and need to write out to a file, for example, write it as UTF-8 and make sure you open it as UTF-8 in your text editor.
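As a rough illustration of "same encoding in, same encoding out", here is a minimal sketch. The class name `ConsoleEcho` and the helper `echoLine` are made up for this example, and the console input is simulated with a byte array holding the single byte 197 that the asker reported for Å, on the assumption that the console really is sending ISO-8859-1.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleEcho {
    // Hypothetical helper: reads one line from `in` and writes it to `out`,
    // using the SAME charset for decoding and encoding.
    static String echoLine(InputStream in, OutputStream out, Charset consoleCharset)
            throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, consoleCharset));
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, consoleCharset), true);
        String line = reader.readLine();
        writer.println(line); // auto-flush is on, so the bytes reach `out` immediately
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Simulated console input: the byte 0xC5 (197) followed by a newline.
        // Assumption: the console encodes Å as ISO-8859-1.
        byte[] typed = {(byte) 0xC5, '\n'};
        ByteArrayOutputStream echoed = new ByteArrayOutputStream();

        String line = echoLine(new ByteArrayInputStream(typed), echoed,
                StandardCharsets.ISO_8859_1);

        System.out.println(line.codePointAt(0));            // 197, i.e. U+00C5 (Å)
        System.out.println(echoed.toByteArray()[0] & 0xFF); // 197, the same byte goes back out
    }
}
```

With a real console you would pass `System.in` and `System.out` instead of the byte-array streams, together with whatever charset the console actually uses.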

Necreaux
  • The netbeans console is UTF-8, unless there's a bug with Netbeans. I found a work-around though (check my updated question). – Dan Lindqvist Mar 11 '15 at 20:31

Have you tried:

BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in,"UTF-8"));

If this doesn't work, try reading the raw byte stream and then converting it with new String(bytes, "UTF-8").
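To see what decoding raw bytes looks like, here is a small sketch. The class name `DecodeDemo` and the helper `decode` are hypothetical, and the byte values are hard-coded rather than read from a console. It assumes the console sends the single byte 197 for Å, as reported in the comments on the question, and compares that with the two-byte sequence UTF-8 would use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decodes raw bytes with an explicit charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Formats the first code point of a string as U+XXXX.
    static String firstCodePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        byte[] singleByte = {(byte) 0xC5};            // 197, what the console reportedly sent
        byte[] twoBytes = {(byte) 0xC3, (byte) 0x85}; // Å encoded as UTF-8

        // The lone byte is a valid ISO-8859-1 character...
        System.out.println(firstCodePoint(decode(singleByte, StandardCharsets.ISO_8859_1))); // U+00C5
        // ...but an incomplete UTF-8 sequence, so it becomes the replacement character.
        System.out.println(firstCodePoint(decode(singleByte, StandardCharsets.UTF_8)));      // U+FFFD
        // The two-byte sequence is what a UTF-8 decoder expects for Å.
        System.out.println(firstCodePoint(decode(twoBytes, StandardCharsets.UTF_8)));        // U+00C5
    }
}
```

If the bytes coming from the console are not valid UTF-8 in the first place, converting them with `new String(bytes, "UTF-8")` gives the same replacement character as the reader did, which may be why this suggestion did not help here.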

  • If you don't specify a charset then it will go with the default one (which is UTF-8 in my case, as you can see in my question). I tried your other suggestion with reading the raw bytes as well but with no success. I found a workaround though (used ISO-8859-1 instead). – Dan Lindqvist Mar 11 '15 at 20:33
  • Can't believe it uses ISO-8859-1 not UTF-8. I thought UTF-8 was pretty standard nowadays but maybe not. – TV Trailers Mar 11 '15 at 21:14