8

I am struggling to get Eclipse to read in Chinese characters correctly, and I am not sure where I may be going wrong.

Specifically, somewhere between reading in a string of Chinese (simplified or traditional) from the console and outputting it, the text gets garbled. Even in a long string of mixed text (English and Chinese characters), only the Chinese characters are affected.

I have cut it down to the following test example and explicitly annotated it with what I believe is happening at each stage - note that I am a student and would very much like to confirm my understanding (or otherwise) :)

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

//Class name is a placeholder; only main() matters for the test
public class ConsoleEncodingTest {
    public static void main(String[] args) {
        try
        {
            boolean isRunning = true;

            //Raw flow of input data (bytes) from the console
            InputStream inputStream = System.in;
            //Decodes the bytes into characters, using the specified encoding (or the platform default if none is given)
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
            //Adds buffering and the ability to read whole lines as Strings
            BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);


            //Raw flow of output data (bytes) to the console
            OutputStream outputStream = System.out;
            //Encodes characters into bytes, using the specified encoding
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
            //Adds buffering on top of the basic ability to write to a stream
            BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);



            while(isRunning) {
                System.out.println();//force extra newline
                System.out.print("> ");

                //To read in a line of text (as a String):
                String userInput_asString = input_BufferedReader.readLine();
                if (userInput_asString == null) {
                    break; //end of input: readLine() returns null
                }

                //To output a line of text:
                String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline

                String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline

                String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
                output_BufferedWriter.write(outputToUser_fromString_userSupplied);
                output_BufferedWriter.flush();

                System.out.println();//force extra newline

            }
        }
        catch (IOException e) {
            //Report I/O and encoding failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}

Sample output:

> 之謂甚
foo
之謂甚
之謂甚

> oaea
foo
之謂甚
oaea

> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;

> 

What is seen on this Stack Overflow post matches exactly what I see in the Eclipse console and in the Eclipse debugger (when viewing/editing the variable values). Altering the variable values manually via the Eclipse debugger makes the code that depends on those values behave as I would expect, which suggests that the issue lies in how the text is read IN.

I have tried many different combinations of scanners and buffered stream readers/writers to read in and output text, with and without explicit character encodings, though this wasn't done particularly systematically and I could easily have missed something.

I have tried to set the Eclipse environment to use UTF-8 wherever possible, but I guess I could have missed a place or two. Note that the console will correctly output hard-coded Chinese characters.

Any assistance / guidance on this matter is greatly appreciated :)

kwah
  • System.out is a [`PrintStream`](http://docs.oracle.com/javase/6/docs/api/java/io/PrintStream.html), which works byte by byte. You need to wrap it in a [`PrintWriter`](http://docs.oracle.com/javase/6/docs/api/java/io/PrintWriter.html) or an [`OutputStreamWriter`](http://docs.oracle.com/javase/6/docs/api/java/io/OutputStreamWriter.html) to output it as characters, which is why userInput is output incorrectly. – Powerlord Dec 14 '12 at 19:02
  • I fear I may be being rather naive here, and I am about to edit the question - please help me understand how you believe using an output writer to output a value (at this point, stored as a String) will help. – kwah Dec 14 '12 at 20:50
  • Any more thoughts to add to this? Perhaps I should ask over at Eclipse to see if it is an IDE issue..? – kwah Dec 23 '12 at 02:21
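
As a rough illustration of the first comment's suggestion to wrap `System.out`, the sketch below sends text through a `PrintWriter` on top of an `OutputStreamWriter` with an explicit UTF-8 encoding, instead of relying on the platform default. The class and method names (`Utf8OutputSketch`, `encodeLine`) are invented for this example, and the Chinese text is written with Unicode escapes so that the result does not depend on how the source file is saved. It writes to a `ByteArrayOutputStream` first so the encoded bytes can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
final class Utf8OutputSketch {

    // Encodes the text as UTF-8 through a writer and returns the raw bytes produced.
    static byte[] encodeLine(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        out.print(text);
        out.flush(); // push buffered characters through the encoder into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String chinese = "\u4E4B\u8B02\u751A"; // the three characters used in the question
        System.out.println(encodeLine(chinese).length + " bytes for " + chinese.length() + " chars");

        // The same wrapping applied to the real console stream (auto-flush on println).
        PrintWriter console = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        console.println(chinese);
    }
}
```

Whether the console then displays those bytes correctly still depends on the encoding the Eclipse console itself is using.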

3 Answers

2

It looks like the console is not reading the input correctly. Here is a link that I believe describes your problem and some workarounds.

http://paranoid-engineering.blogspot.com/2008/05/getting-unicode-output-in-eclipse.html

Simple answer: try adding the JVM argument -Dfile.encoding=UTF-8 to your eclipse.ini (after the -vmargs line). Before enabling this for the whole of Eclipse, you could set it in the debug configuration for this program and see if it works.

The link has a lot more suggestions.
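
To check whether the setting has actually reached the JVM that runs the program, one option is to print the relevant values at startup. The sketch below uses only standard calls (`System.getProperty` and `Charset.defaultCharset`); the class name `DefaultCharsetCheck` is invented for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of the original answer.
final class DefaultCharsetCheck {

    // Reports the file.encoding system property and the charset the JVM uses by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

When the program is launched with `-Dfile.encoding=UTF-8`, both values would normally be expected to report UTF-8; anything else suggests the argument was not applied to this launch.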

Zenil
  • @kwah did you try this suggestion ? – Zenil Jan 28 '13 at 05:43
  • I can confirm that initial tests of adding `-Dfile.encoding=UTF-8` to `eclipse.ini` do appear to work! :) I shall mark it as the correct answer in a day or two when I have had a chance to test this more thoroughly than just initial tests. – kwah Jan 29 '13 at 15:05
  • Do you have any idea why Eclipse needs an environment variable for it to recognise non-Unicode input? – kwah Jan 29 '13 at 15:07
  • It's related to the default encoding of the platform used by the console. You use the file.encoding property to override the default encoding. This Eclipse bug from the link above seems related: https://bugs.eclipse.org/bugs/show_bug.cgi?id=13865 – Zenil Jan 30 '13 at 13:36
  • Another link related to file.encoding . http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding – Zenil Jan 30 '13 at 13:40
  • @kwah Another workaround: if you want all your JVMs to be configured with this setting, you can define the JAVA_TOOL_OPTIONS environment variable. That way there is no need to modify the eclipse.ini file. http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding/623036#623036 – Zenil Jan 30 '13 at 13:43
  • -Dfile.encoding=UTF-8 works for me in Eclipse but does not work in the build artifact on the server. – Yogesh Bombe Feb 26 '19 at 10:23
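
The mismatch described in the comments above can be sketched without a console at all: text is turned into bytes with one charset and turned back into characters with another. In the sketch below, ISO-8859-1 merely stands in for "some platform default that is not UTF-8" (the actual default on the asker's machine is not stated), and the class and method names are invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is an assumed stand-in for a non-UTF-8 platform default.
final class EncodingMismatchSketch {

    // Encodes text as the sending side would, then decodes the bytes as the receiving side would.
    static String transfer(String text, Charset senderCharset, Charset receiverCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        String chinese = "\u4E4B\u8B02\u751A"; // the three characters used in the question
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Both sides agree on UTF-8: the text survives.
        System.out.println(chinese.equals(transfer(chinese, utf8, utf8)));
        // Sender uses a charset that cannot represent Chinese: each character becomes '?'.
        System.out.println(transfer(chinese, latin1, utf8));
        // ASCII text is encoded identically by both charsets, so it is unaffected.
        System.out.println(transfer("foo", latin1, utf8));
    }
}
```

This matches the symptom in the question, where English input is echoed correctly and only the Chinese characters are damaged.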
1

Try this: in Eclipse, right-click your main class and choose Run As > Run Configurations. Then go to the Common tab and change the encoding to UTF-8. That should work!

0

This seems to be an encoding problem. There might be two problems here: 1. You haven't enabled the compiler's ability to read anything but ASCII characters; in your case it needs to be able to read UTF-8 characters. 2. You may have deleted certain language packs, although this is unlikely since you are presumably able to write Chinese characters.

You should search around and learn how you can get your IDE to compile the non-ASCII characters correctly. In Python this is done in the code itself; I'm unsure how it is done in Java.
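
For reference, Java has no in-file encoding declaration comparable to Python's; the source encoding is passed to the compiler instead (for example `javac -encoding UTF-8`, or the text file encoding configured in the IDE). An encoding-independent alternative is to write the characters as Unicode escapes, which use only ASCII in the source file. A minimal sketch, with an invented class name:

```java
// Hypothetical demo class showing Unicode escapes in a string literal.
final class UnicodeEscapeSketch {

    // The question's three Chinese characters, spelled with ASCII-only escapes.
    static final String CHINESE = "\u4E4B\u8B02\u751A";

    // Lists the code points of the text in U+XXXX notation, separated by spaces.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by one or two chars per code point
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codePoints(CHINESE));
    }
}
```

Since the question reports that hard-coded Chinese strings already print correctly, this is probably not the cause there, but it can help rule out source-file encoding when testing.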

Arash Saidi