
I saved my Java source file with its encoding set to UTF-8 in Eclipse, and it works fine in Eclipse. When I build it with Maven and run it on my system, Unicode characters are not printed correctly.

This is my code:

    byte[] bytes = new byte[dataLength];
    buffer.readBytes(bytes);
    String s = new String(bytes, Charset.forName("UTF-8"));
    System.out.println(s);
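One way to tell whether the problem is in decoding or in displaying (the approach matt suggests in the comments) is to print each character as an ASCII Unicode escape next to the string itself. The escaped form is pure ASCII, so no console encoding can mangle it. This is a minimal sketch; the class name `DecodeCheck`, the helper `escape`, and the sample bytes are hypothetical, standing in for the bytes read from the server. For the sample bytes, the second printed line is `\u00F6\u00E4\u00FC`.

```java
import java.nio.charset.StandardCharsets;

public class DecodeCheck {

    // Hypothetical helper: renders every UTF-16 code unit as a Java-style escape
    // (backslash, 'u', four hex digits). The result is pure ASCII, so it prints
    // identically on any console regardless of that console's encoding.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical payload: the three characters o/a/u-umlaut encoded as UTF-8.
        byte[] bytes = {(byte) 0xC3, (byte) 0xB6, (byte) 0xC3, (byte) 0xA4, (byte) 0xC3, (byte) 0xBC};
        String s = new String(bytes, StandardCharsets.UTF_8);

        // This line depends on the console encoding and font.
        System.out.println(s);
        // This line does not: if the escapes are right, decoding worked.
        System.out.println(escape(s));
    }
}
```

If the escaped line shows the expected code points but the first line is garbled, the bytes were decoded correctly and only the output side is at fault.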


I attached screenshots of the Eclipse console and the Windows console. I expect the same output as in Eclipse on other systems (Windows Command Prompt, a PowerShell window, a Linux machine, etc.).

Prasath
  • What is the value of system property `file.encoding` when running in the console? How do you read the data, how do you print? Show some code. – Mark Rotteveel Sep 22 '17 at 09:55
  • Probably your PowerShell encoding is not UTF-8. Try to set its encoding as UTF-8: run command `[Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8` and then run your java program. – Mykhailo Hodovaniuk Sep 22 '17 at 09:58
  • The maven-compiler-plugin also has to know which encoding to compile with; this is a pom setting. What you see in the console cannot be trusted as a real error, since the console may well use a different platform encoding. – Joop Eggen Sep 22 '17 at 09:59
  • @MarkRotteveel I'm getting data from a server and printing it to the console. I have updated the question with my sample code. – Prasath Sep 22 '17 at 10:00
  • @MonteCristo I tried your command in a PowerShell window and still get the same issue. – Prasath Sep 22 '17 at 10:03
  • @Prasath all you've done in the Eclipse settings is set the *source encoding* to UTF-8. That will make no difference whatsoever to your program, unless you have non-ASCII characters in your source code, e.g. if you have a £ sign in a variable name. You haven't changed the system default encoding. – Klitos Kyriacou Sep 22 '17 at 10:07
  • What is your Windows PowerShell encoding? Can it display the characters? – matt Sep 22 '17 at 10:37
  • Check out this link https://stackoverflow.com/questions/9180981/how-to-support-utf-8-encoding-in-eclipse – User27854 Sep 22 '17 at 10:40
  • Possible duplicate of [How to support UTF-8 encoding in Eclipse](https://stackoverflow.com/questions/9180981/how-to-support-utf-8-encoding-in-eclipse) – User27854 Sep 22 '17 at 10:41
  • @User27854 I have already gone through that post. My question is different: I am not asking about running in Eclipse. I expect this to run in the command prompt and on a Linux server. I hope you understand my question. – Prasath Sep 22 '17 at 10:55
  • @User27854 BTW, it works fine in my Eclipse. The issue may be in the build or in the runtime environment, such as the Windows command prompt or a Linux machine. – Prasath Sep 22 '17 at 11:00
  • Did you try writing the characters to a file? I really think your issue is just the terminal display. – matt Sep 22 '17 at 13:02
  • Also, you should paste the string; if you have trouble with that, paste a Unicode-escaped version, e.g. "\u0041\u0042". – matt Sep 22 '17 at 13:10
  • @matt Yes, I tried. I wrote it to a log file. The same issue occurs on Windows; it works on Linux. – Prasath Sep 25 '17 at 06:44
  • @matt I used the `java -Dfile.encoding=UTF-8 -jar {jarName}.jar` command to run it, and it works fine on Windows too. – Prasath Sep 25 '17 at 14:07
  • Duplicate - [printing-unicode-characters-to-the-powershell-prompt](https://stackoverflow.com/questions/5796339/printing-unicode-characters-to-the-powershell-prompt) –  Sep 25 '17 at 14:49
  • @JarrodRoberson Will that change the output the program produces? In the OP's example, their solution worked because `System.out` was defaulting to the wrong character set, so changing file.encoding fixed it. – matt Sep 25 '17 at 14:55
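The `-Dfile.encoding=UTF-8` fix from the comments changes the default charset for the whole JVM. If only console output matters, a narrower option is to replace `System.out` with a `PrintStream` that encodes as UTF-8 explicitly. This is a minimal sketch, not what the OP used, and it only helps if the terminal itself decodes UTF-8 (for example after `chcp 65001` in Command Prompt, or the `[Console]::OutputEncoding` setting suggested above for PowerShell) and its font has the glyphs. The class name `Utf8Out` and helper `utf8Stream` are hypothetical. The string literal uses Unicode escapes so the example does not depend on the source encoding Klitos Kyriacou mentions.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Out {

    // Hypothetical helper: wraps any OutputStream in an autoflushing PrintStream
    // that always encodes text as UTF-8, regardless of file.encoding.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: every JVM is required to support UTF-8.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Diagnostics: what the JVM chose as its default encoding.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // From here on, System.out writes UTF-8 bytes to the raw stdout descriptor.
        System.setOut(utf8Stream(new FileOutputStream(FileDescriptor.out)));
        System.out.println("\u00F6\u00E4\u00FC");
    }
}
```

Unlike `-Dfile.encoding`, this leaves the default charset used by file readers and writers untouched, which can be a plus or a surprise depending on the program.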

2 Answers


You could use the `Console` class for that. The following code could give you some inspiration:

    import java.io.Console;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;

    public class Foo {

        public static void main(String[] args) {
            String s = "öäü";
            write(s);
        }

        private static void write(String s) {
            // Reports the JVM default charset (what a plain OutputStreamWriter would use).
            String encoding = new OutputStreamWriter(System.out).getEncoding();
            Console console = System.console();
            if (console != null) {
                // If a console is attached to the JVM, use its writer, which encodes
                // with the console's charset rather than file.encoding.
                System.out.println("Using encoding " + encoding + " (Console)");
                PrintWriter writer = console.writer();
                writer.write(s);
                // Flush but do not close: the console writer is shared for the whole JVM.
                writer.flush();
            } else {
                // Fall back to "normal" System.out, e.g. when output is redirected.
                System.out.println("Using encoding " + encoding + " (System out)");
                System.out.print(s);
            }
        }
    }

Tested on Windows 10 (PowerShell) and Ubuntu 16.04 (bash) with default settings. It also works from within IntelliJ (Windows and Linux).
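Note that `new OutputStreamWriter(System.out).getEncoding()` in this answer reports the JVM default charset, which can differ from the charset the `Console` writer actually uses (for example on Windows, where the console code page is often not the ANSI code page). On Java 17 or later, which is an assumption since this answer predates it, `Console.charset()` reports the console's charset directly. A minimal sketch to compare them; the class `ConsoleCharsetCheck` and method `describe` are hypothetical:

```java
import java.io.Console;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class ConsoleCharsetCheck {

    // Hypothetical helper: summarizes the charsets involved in console output.
    // Requires Java 17+ because of Console.charset().
    static String describe() {
        // Not closed on purpose: closing it would also close System.out.
        String stdoutWriter = new OutputStreamWriter(System.out).getEncoding();
        Console console = System.console();
        String consoleCharset = (console == null)
                ? "no console attached"
                : console.charset().name();
        return "default charset: " + Charset.defaultCharset().name()
                + ", System.out writer: " + stdoutWriter
                + ", Console: " + consoleCharset;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the `Console` charset differs from the default charset, that difference is a likely reason why writing through `console.writer()` displays correctly while `System.out` does not.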

Ortwin Angermeier

From what I can tell, you either have the wrong character, which I don't think is the case, or you are trying to display it on a terminal that doesn't handle the character. I have written a short test to separate the issues.

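    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class UniTest {

        public static void main(String[] args) {
            // The same four characters, once as literals and once as Unicode escapes.
            String testA = "ֆޘᜅᾮ";
            String testB = "\u0586\u0798\u1705\u1FAE";

            // true means the literals survived compilation with the right source encoding.
            System.out.println(testA.equals(testB));
            System.out.println(testA);
            System.out.println(testB);

            // Write both strings to a file as UTF-8, bypassing the console entirely.
            // try-with-resources closes the writer, so no explicit close() is needed.
            try (BufferedWriter check = Files.newBufferedWriter(
                    Paths.get("uni-test.txt"),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                check.write(testA);
                check.write("\n");
                check.write(testB);
            } catch (IOException ioc) {
                // Don't swallow the error silently.
                ioc.printStackTrace();
            }
        }
    }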

You could replace the values with the characters you want.

The first line should print `true` if the string is the actual string you want. After that, it is a matter of displaying the characters. For example, if I open the text file with `less`, half of them are broken. If I open it with Firefox, I see all four characters, but some render oddly. You'll need a font that has glyphs for the corresponding Unicode values.
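To confirm that the file really contains the right characters, independent of any viewer or font, it can be read back with the same charset and compared with the expected string. A minimal sketch, assuming the `uni-test.txt` written by the test program in this answer is in the working directory; the class `UniReadBack` and method `allLinesMatch` are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class UniReadBack {

    // Hypothetical helper: reads the file as UTF-8 and checks that every line
    // equals the expected string. Returns false for an empty file.
    static boolean allLinesMatch(Path file, String expected) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            return false;
        }
        for (String line : lines) {
            if (!line.equals(expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String expected = "\u0586\u0798\u1705\u1FAE";
        // true means the bytes on disk are correct, so any garbling is a display or font issue.
        System.out.println(allLinesMatch(Paths.get("uni-test.txt"), expected));
    }
}
```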

One thing you can do is open the file in a word processor and select a font that displays the characters you want correctly.

As suggested by the OP, including `-Dfile.encoding=UTF8` causes the characters to display correctly when using `System.out.println`. This is similar to this question, which changes the encoding of `System.out`.

matt