I have a text file (.txt) containing words with foreign characters. For example, the first word is école.

I read each line using String lineData = inputBuffFile.readLine(); so lineData becomes "école".

1) I can print the word to a Command Prompt window (as part of a user-input question) using String replOption = console.readLine(lineData), and the é displays properly.

2) I can replace foreign characters using replaceAll. That is, tmpWord = lineData.replaceAll("éco","!") results in tmpWord becoming "!le".

Based on these two tests, the foreign character is read and stored properly.

However, if I print the word to a Command Prompt window using System.out.println(lineData), the é becomes another character (a capital U with the same accent mark).

I have looked through the questions on stackoverflow.com to try to understand this and have seen suggestions to print using Unicode values (which would mean I would have to convert the characters to their Unicode equivalents).

Is there another way to print these out, or a switch I need to include? If this has already been asked, I would appreciate a pointer.

Thank you in advance, Mike

Mike
  • Are you using Windows/Mac OS X/Linux? – Martin Konecny Jun 03 '14 at 02:53
  • That would seem to indicate that the default character encoding for your OS does not include that code point. – Brett Okken Jun 03 '14 at 02:53
  • This is a nice explanation of the issue: http://www.java-tips.org/java-se-tips/java.io/output-french-characters-to-the-console.html – Nir Alfasi Jun 03 '14 at 02:54
  • Martin, I am using Windows 8 (but if I port the code over to a Mac, will I have to change the code?) – Mike Jun 03 '14 at 02:55
  • Brett, I don't really understand what you said, but I'll do some searching – Mike Jun 03 '14 at 02:56
  • I believe Mac OS and Windows 8 have different default encodings, which is why it's better to use Unicode as a standard across both systems. – Sky Jun 03 '14 at 02:57
  • @Mike after thinking and researching for a while, I'm guessing the problem lies in the Command Prompt encoding, since you are able to read the text in correctly and display it properly in the other case. Try setting your Command Prompt charset to UTF-8 by running the command "chcp 65001" when you open the Command Prompt. I believe your issue is similar to [this](http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how), and to make chcp permanent, look at [this](http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default) – Sky Jun 03 '14 at 03:01

1 Answer

Here is a link to the answer to this problem. Basically, use a PrintStream to achieve this.
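
A minimal sketch of the PrintStream idea, in case it helps. It assumes the console's code page can be named as a Java charset and passed in as the first argument. The class name ConsoleEncodingDemo, the helper methods, and the UTF-8 default are made up for illustration and are not from the linked answer. The misread helper only illustrates a likely cause: bytes written in one charset (for example windows-1252) and interpreted by the console in another (for example code page 850), which would be consistent with é showing up as an accented capital U.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingDemo {

    // Returns the bytes a PrintStream with the given charset would send to the console.
    public static byte[] printedBytes(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    // Simulates a console decoding bytes with a different charset than the one used to write them.
    public static String misread(String text, String writtenAs, String readAs)
            throws UnsupportedEncodingException {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String lineData = "\u00e9cole"; // "école", escaped so the source file encoding does not matter

        // Assumption: the first argument names the console's charset; UTF-8 is only a placeholder default.
        String consoleCharset = args.length > 0 ? args[0] : "UTF-8";

        // Wrap System.out so characters are encoded with the console's charset
        // instead of the JVM default.
        PrintStream out = new PrintStream(System.out, true, consoleCharset);
        out.println(lineData);

        // Optional illustration of the mismatch; skipped if the JVM lacks these charsets.
        if (Charset.isSupported("windows-1252") && Charset.isSupported("IBM850")) {
            out.println(misread("\u00e9", "windows-1252", "IBM850"));
        }
    }
}
```

To try it, check the active code page with chcp in the Command Prompt and pass the matching Java charset name, e.g. java ConsoleEncodingDemo IBM850 if chcp reports 850. On a Mac the terminal is usually UTF-8, so the same code should work with a different argument rather than a code change.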

Anton