I have written some simple code to convert UTF-8 text to Japanese (Shift-JIS) characters.

    private static String convertUTF8ToShiftJ(String uft8Strg) {
        String shftJStrg = null;
        try {

            byte[] b = uft8Strg.getBytes(UTF_8);
            shftJStrg = new String(b, Charset.forName("SHIFT-JIS"));
            logger.info("Converted to the string :" + shftJStrg);
        } catch (Exception e) {
            e.printStackTrace();
            return uft8Strg;
        }
        return shftJStrg;
    }

But the output is garbled:

convertUTF8ToShiftJ START !!
uft8Strg=*** abc000.sh ����started�
*** abc000.sh ��中�executing...�
*** abc000.sh ����ended��*

Does anybody have any idea where I made a mistake, or whether I need some additional logic? Any help would be appreciated!

Thomas Fritsch
CabNt
  • This may help: https://stackoverflow.com/a/42136502/8349475 – E141 Aug 15 '18 at 06:59
  • Check this one also. It is the reverse of what you want, i.e. Shift JIS to UTF-8: https://stackoverflow.com/questions/39992097/convert-shift-jis-format-to-utf-8-format – Vikasdeep Singh Aug 15 '18 at 07:02
  • You should never need to do this. When you have a string with wrong characters in it, the problem happened when this string was first created or read. The fix must be done at that place. – Henry Aug 15 '18 at 08:06

2 Answers

Your String is already a String, so your method is "wrong". UTF-8 is an encoding of bytes: a UTF-8 encoded byte[] can be decoded into a String in Java.

It should read:

    private static byte[] convertUTF8ToShiftJ(byte[] utf8) {

If you want to convert UTF8 byte[] to JIS byte[]:

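    private static byte[] convertUTF8ToShiftJ(byte[] utf8) {
        // Decode the UTF-8 bytes into a String (which has no encoding of its own)...
        String s = new String(utf8, StandardCharsets.UTF_8);
        // ...then encode that String as Shift-JIS bytes.
        return s.getBytes(Charset.forName("SHIFT-JIS"));
    }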

A String can be converted to a byte[] later, with mystring.getBytes(encoding).
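
As a rough illustration of the decode-then-encode idea above, here is a small self-contained sketch. The class name `Utf8ToShiftJisDemo`, the helper name, and the sample text are made up for this example, and it assumes the JDK in use ships the `Shift_JIS` charset (standard JDK distributions do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
final class Utf8ToShiftJisDemo {
    // "Shift_JIS" is the canonical Java name of the charset.
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Decode UTF-8 bytes into a String, then encode that String as Shift-JIS bytes.
    static byte[] convertUtf8ToShiftJis(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.getBytes(SHIFT_JIS);
    }

    public static void main(String[] args) {
        // Sample text (an assumption): three kanji, written as Unicode escapes
        // so the source file's own encoding does not matter.
        String original = "\u5b9f\u884c\u4e2d";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] sjis = convertUtf8ToShiftJis(utf8);

        // The same text has a different byte representation in each encoding.
        System.out.println("UTF-8 bytes:     " + utf8.length);
        System.out.println("Shift-JIS bytes: " + sjis.length);
        // Decoding with the matching charset gives back the original String.
        System.out.println(new String(sjis, SHIFT_JIS).equals(original));
    }
}
```

The resulting byte[] is only meaningful to a consumer that reads it as Shift-JIS, for example a file or stream that is expected to be in that encoding.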

Please see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for more detail.

Rob Audenaerde

It seems you have a conceptual misunderstanding about String encodings. See for example Byte Encodings and Strings.

Converting a String from one encoding to another doesn't make sense, because a String is independent of any encoding.

However, a String can be represented by byte arrays in various encodings (for example UTF-8 or Shift-JIS). Therefore, it does make sense to convert a UTF-8 encoded byte array to a Shift-JIS encoded byte array.

    private static byte[] convertUTF8ToShiftJ(byte[] utf8Bytes) throws IllegalCharsetNameException {
        // Decode the UTF-8 bytes into an encoding-independent String.
        String s = new String(utf8Bytes, StandardCharsets.UTF_8);
        // Encode the String as Shift-JIS bytes.
        byte[] shftJBytes = s.getBytes(Charset.forName("SHIFT-JIS"));
        return shftJBytes;
    }
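
To make the misunderstanding concrete, the following sketch reproduces what the method in the question effectively does: it takes UTF-8 bytes and decodes them as if they were Shift-JIS. The class name `MojibakeDemo`, the method name, and the sample text are invented for illustration, and the JDK is assumed to provide the `Shift_JIS` charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
final class MojibakeDemo {
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Same steps as the question's method: encode as UTF-8, then decode
    // those bytes with the wrong charset (Shift-JIS).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, SHIFT_JIS);
    }

    public static void main(String[] args) {
        String ascii = "abc000.sh";                 // letters, digits and '.' only
        String japanese = "\u5b9f\u884c\u4e2d";     // sample kanji (an assumption)

        // These ASCII characters use the same bytes in both encodings, so they survive.
        System.out.println(misdecode(ascii).equals(ascii));
        // Multi-byte characters do not: the UTF-8 bytes are misread as other characters.
        System.out.println(misdecode(japanese).equals(japanese));
    }
}
```

This matches the pattern in the question's output, where the ASCII parts of each line are intact and only the non-ASCII parts are garbled.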
Thomas Fritsch
  • Great answer :) – Rob Audenaerde Aug 15 '18 at 08:06
  • Actually, to be precise, `String`s also have an internal encoding (in memory). Nowadays that can be `LATIN1` or `UTF-16`: https://www.baeldung.com/java-9-compact-string. But when handling I/O, it is best to always explicitly encode to bytes using a standard charset. – Rob Audenaerde Aug 15 '18 at 08:38
  • Thank you so much for pointing out my conceptual misunderstanding. But I am still having the same issue. I am now trying some additional changes. – CabNt Aug 16 '18 at 05:22
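
The advice in the comments (name the charset explicitly wherever text is read or written) might look roughly like the sketch below. The class and helper names are hypothetical, the temporary file stands in for whatever source the text actually comes from, and the JDK is assumed to provide the `Shift_JIS` charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class illustrating explicit charsets at the I/O boundary.
final class ExplicitCharsetIo {
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Encode explicitly when writing.
    static void writeText(Path file, String text, Charset charset) throws IOException {
        Files.write(file, text.getBytes(charset));
    }

    // Decode explicitly when reading, using the charset the bytes were written in.
    static String readText(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sjis-demo", ".txt");
        try {
            String text = "abc000.sh \u5b9f\u884c\u4e2d"; // sample text (an assumption)
            writeText(file, text, SHIFT_JIS);
            // Reading with the matching charset recovers the text.
            System.out.println(readText(file, SHIFT_JIS).equals(text));
            // Reading with a different charset does not.
            System.out.println(readText(file, StandardCharsets.UTF_8).equals(text));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If a String already looks wrong when it reaches a conversion method, the place to check is the read step, since that is where the bytes were turned into characters.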