Convert UTF-8 to Shift-JIS

Question

I wrote this simple method to convert a UTF-8 string to Shift-JIS (Japanese) encoding.

    private static String convertUTF8ToShiftJ(String uft8Strg) {
        String shftJStrg = null;
        try {

            byte[] b = uft8Strg.getBytes(UTF_8);
            shftJStrg = new String(b, Charset.forName("SHIFT-JIS"));
            logger.info("Converted to the string :" + shftJStrg);
        } catch (Exception e) {
            e.printStackTrace();
            return uft8Strg;
        }
        return shftJStrg;
    }

But the output is garbled:

convertUTF8ToShiftJ START !!
uft8Strg=*** abc000.sh é��å§�ã��ï¼�startedï¼�
*** abc000.sh å®�è¡�ä¸ï¼�executing...ï¼�
*** abc000.sh çµ�äº�ã��ï¼�endedã��ï¼�*

Does anybody have an idea where I made a mistake, or whether I need some additional logic? Any help would be appreciated!

Check this one also. It is the reverse of what you want, i.e. Shift JIS to UTF-8: https://stackoverflow.com/questions/39992097/convert-shift-jis-format-to-utf-8-format – Vikasdeep Singh, Aug 15 '18 at 07:02
You should never need to do this. When a string has wrong characters in it, the problem happened when the string was first created or read. The fix must be made at that place. – Henry, Aug 15 '18 at 08:06

Rob Audenaerde · Answer 1 · 2018-08-15T08:06:01.980

Your String is already a String, so your method is "wrong". UTF-8 is an encoding: it describes how text is stored in a byte[], and such a byte[] can be decoded into a String in Java.

It should read:

private static byte[] convertUTF8ToShiftJ(byte[] utf8) {

If you want to convert a UTF-8 byte[] to a Shift-JIS byte[]:

private static byte[] convertUTF8ToShiftJ(byte[] utf8) {
    // Decode the UTF-8 bytes into a String, then encode that String as Shift-JIS.
    String s = new String(utf8, StandardCharsets.UTF_8);
    return s.getBytes(Charset.forName("SHIFT-JIS"));
}

A String can be converted to a byte[] later with myString.getBytes(encoding).
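
To illustrate, here is a minimal sketch showing that the same String yields different byte arrays depending on the charset passed to getBytes. The class name EncodingLengths and the helper encodedLength are made up for this example, and it assumes the JDK in use provides the Shift_JIS charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
public class EncodingLengths {

    // Assumption: the running JDK provides Shift_JIS; forName throws otherwise.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Encodes the text in the given charset and returns the number of bytes.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Three kanji meaning "Japanese language"; Unicode escapes keep this
        // source file independent of the encoding it is saved in.
        String text = "\u65e5\u672c\u8a9e";
        System.out.println("chars:     " + text.length());
        System.out.println("UTF-8:     " + encodedLength(text, StandardCharsets.UTF_8) + " bytes");
        System.out.println("Shift_JIS: " + encodedLength(text, SHIFT_JIS) + " bytes");
    }
}
```

The String has three characters in every case; only its encoded form differs (three bytes per character in UTF-8, two per character in Shift_JIS for these kanji).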

Please see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for more detail.

Thanks for your answer. I still have the problem, so I am trying to fix it. – CabNt, Aug 16 '18 at 05:20

Thomas Fritsch · Answer 2 · 2018-08-15T08:10:33.710


It seems you have a conceptual misunderstanding about String encodings. See for example Byte Encodings and Strings.

Converting a String from one encoding to another encoding doesn't make sense, because String is a thing independent of encoding.

However, a String can be represented by byte arrays in various encodings (for example UTF-8 or Shift-JIS). Therefore, it does make sense to convert a UTF-8 encoded byte array to a Shift-JIS encoded byte array.

private static byte[] convertUTF8ToShiftJ(byte[] utf8Bytes) throws IllegalCharsetNameException  {
    String s = new String(utf8Bytes, StandardCharsets.UTF_8);
    byte[] shftJBytes = s.getBytes(Charset.forName("SHIFT-JIS"));
    return shftJBytes;
}
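
As a usage sketch, the following contrasts that conversion with the approach from the question, where UTF-8 bytes are decoded as if they were Shift-JIS. The class name Utf8ToShiftJisDemo, the helper decodeWithWrongCharset, and the sample text are assumptions for illustration. It also assumes the Shift_JIS charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class built around the method above.
public class Utf8ToShiftJisDemo {

    // Assumption: the running JDK provides the Shift_JIS charset.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Decode the UTF-8 bytes into a String, then re-encode it as Shift_JIS.
    static byte[] convertUTF8ToShiftJ(byte[] utf8Bytes) {
        String s = new String(utf8Bytes, StandardCharsets.UTF_8);
        return s.getBytes(SHIFT_JIS);
    }

    // The question's approach: interpret UTF-8 bytes as Shift_JIS (wrong).
    static String decodeWithWrongCharset(byte[] utf8Bytes) {
        return new String(utf8Bytes, SHIFT_JIS);
    }

    public static void main(String[] args) {
        String original = "\u958b\u59cb"; // two kanji meaning "start"
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Correct path: the text survives a decode of the converted bytes.
        byte[] sjisBytes = convertUTF8ToShiftJ(utf8Bytes);
        String roundTrip = new String(sjisBytes, SHIFT_JIS);
        System.out.println("round trip intact:   " + roundTrip.equals(original));

        // Wrong path: decoding with the wrong charset changes the text.
        String garbled = decodeWithWrongCharset(utf8Bytes);
        System.out.println("wrong decode intact: " + garbled.equals(original));
    }
}
```

Decoding with the charset the bytes were actually written in is what keeps the text intact. Re-labelling the bytes with a different charset, as in the question, produces different characters.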

edited Aug 15 '18 at 08:10

answered Aug 15 '18 at 07:58

Thomas Fritsch


Great answer :) – Rob Audenaerde Aug 15 '18 at 08:06
Actually, to be precise, `String`s also have an internal encoding (in memory). Nowadays that can be `LATIN1` or `UTF-16`. https://www.baeldung.com/java-9-compact-string. But when handling I/O, it is best to always explicitly encode to bytes using a standard charset. – Rob Audenaerde Aug 15 '18 at 08:38
thank you so much for pointing out my conceptual misunderstanding. But still having the same issue. Now, trying to do some additional changes. – CabNt Aug 16 '18 at 05:22
