2

When I run the following program:

public static void main(String args[]) throws Exception
{
    byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4};
    String s = new String(str, "UTF-8");
}

on Linux and inspect the value of s in jdb, I correctly get:

 s = "ì–´"

on Windows, I incorrectly get:

s = "?"

My byte sequence is the valid UTF-8 encoding of a Korean character. Why would it produce two very different results?

kujawk

4 Answers

3

It correctly prints "어" on my computer (Ubuntu Linux), as described in Code Table Korean Hangul. The Windows command prompt is known to have encoding issues, so don't worry about what it displays.

Your code is fine.

Tomasz Nurkiewicz
  • My mistake. The Korean characters were properly displaying in my Emacs text buffer so I naturally assumed that they would display properly in the Emacs shell buffer. Which as folks pointed out, they do not. – kujawk Oct 02 '12 at 21:34
1

It gives "어" for me. This means your console is probably not configured to display UTF-8, so it is a printing/display problem rather than a problem with the string's representation.
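To separate the string's contents from how a console renders them, here is a minimal sketch. The class name `ConsoleEncodingCheck` and the helper `printedBytes` are hypothetical, introduced only for illustration. It captures the bytes a `PrintStream` would emit under a given charset. A stream that cannot represent the character substitutes `?`, which is one plausible (not confirmed) explanation for the `"?"` seen on Windows.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConsoleEncodingCheck {
    // Hypothetical helper: writes text through a PrintStream that uses the
    // named charset and returns the raw bytes that would reach the console.
    public static byte[] printedBytes(String text, String charsetName) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        String s = new String(str, StandardCharsets.UTF_8);

        // The decoded string is the single char U+C5B4, whatever the console shows.
        System.out.println(s.equals("\uC5B4"));

        // A UTF-8 stream reproduces the original three bytes.
        System.out.println(Arrays.equals(printedBytes(s, "UTF-8"), str));

        // A US-ASCII stream cannot encode the character and emits '?' (byte 63).
        System.out.println(Arrays.toString(printedBytes(s, "US-ASCII")));
    }
}
```

If the first two lines print `true`, the string itself is intact and any wrong glyph comes from the output side.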

Bozho
1

You get the correct string; it is the Windows console that does not display the string correctly.

Here is a link to an article that discusses a way to make Java console produce correct Unicode output using JNI.

Sergey Kalinichenko
0

JDB is displaying the data incorrectly. The code works the same on both Windows and Linux. Try running this more definitive test:

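public class Utf8Test {
    public static void main(String[] args) throws Exception {
        // UTF-8 encoding of a single Korean character
        byte[] str = {(byte) 0xEC, (byte) 0x96, (byte) 0xB4};
        String s = new String(str, "UTF-8");
        // Print each char as hex so the result does not depend on the console encoding
        for (int i = 0; i < s.length(); i++) {
            System.out.println(Integer.toHexString(s.charAt(i)));
        }
    }
}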

This prints the hex value of every char in the string, so the output does not depend on how the console renders non-ASCII text. It prints "c5b4" on both Windows and Linux.
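The value "c5b4" can also be derived by hand from the three input bytes, which follow the UTF-8 three-byte layout `1110xxxx 10yyyyyy 10zzzzzz`. The sketch below assumes a well-formed three-byte sequence and does no validation. The names `Utf8ManualDecode` and `decode3` are hypothetical, used only for this illustration.

```java
public class Utf8ManualDecode {
    // Hypothetical helper: combines the payload bits of one three-byte
    // UTF-8 sequence into a code point. Assumes well-formed input.
    public static int decode3(byte b0, byte b1, byte b2) {
        return ((b0 & 0x0F) << 12)   // low 4 bits of the lead byte: 0xEC -> 0xC
             | ((b1 & 0x3F) << 6)    // low 6 bits of the 2nd byte:  0x96 -> 0x16
             | (b2 & 0x3F);          // low 6 bits of the 3rd byte:  0xB4 -> 0x34
    }

    public static void main(String[] args) {
        int codePoint = decode3((byte) 0xEC, (byte) 0x96, (byte) 0xB4);
        // 0xC000 + 0x0580 + 0x0034 = 0xC5B4
        System.out.println(Integer.toHexString(codePoint)); // prints c5b4
    }
}
```

This matches the output of the test above, confirming that the decoded string holds U+C5B4 regardless of platform.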

Dan Bliss