
I'm getting an exception, buried way inside a 3rd party library, with a message like this:

java.io.UnsupportedEncodingException: BIG-5

I think this is happening because Java doesn't recognize this name for java.nio.charset.Charset. Charset.forName("big5") is fine, but Charset.forName("big-5") fails with UnsupportedCharsetException, the java.nio counterpart of the exception above. (Charset names are case-insensitive, so capitalization doesn't matter.)

This is different from "utf-8", which has more forgiving aliases: both Charset.forName("utf8") and Charset.forName("utf-8") work fine.
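To check which spellings the running JDK accepts, a small probe like the following can help. It is a sketch, not part of the original question; the class name CharsetNameCheck is made up, and the exact alias sets printed depend on the JDK version and vendor, so no output is shown here.

```java
import java.nio.charset.Charset;

public class CharsetNameCheck {
    public static void main(String[] args) {
        // Charset.isSupported returns false (instead of throwing) for a name
        // that is syntactically legal but unknown, so it is safe for probing.
        for (String name : new String[] {"utf-8", "utf8", "big5", "big-5"}) {
            System.out.println(name + " -> " + Charset.isSupported(name));
        }
        // Aliases the running JDK registers for each charset.
        System.out.println("UTF-8 aliases: " + Charset.forName("UTF-8").aliases());
        System.out.println("Big5 aliases: " + Charset.forName("Big5").aliases());
    }
}
```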

Question: is there a way to add the alias so that "big-5" maps to "big5"?

Asked by Rob N; edited by dnault.
  • Is the third-party library JavaMail by any chance? – dnault Nov 29 '16 at 22:07
  • Make a constant somewhere with `private static final Charset BIG5_CHARSET = Charset.forName("big5")`? You don't have a problem anymore. Or are you saying this is inside code you don't control? – Tunaki Nov 29 '16 at 22:07
  • Where do the charset names come from? Can you intercept and canonicalize them? – dnault Nov 29 '16 at 22:09
  • @dnault Yes, it's JavaMail. I may be able to intercept the data, but if I could define the alias somewhere globally that would be easier. – Rob N Nov 29 '16 at 22:10
  • @Tunaki Yes, I don't control the code. – Rob N Nov 29 '16 at 22:11
  • How about like this? http://stackoverflow.com/q/5960482/3788176 Also related: http://stackoverflow.com/q/6308587/3788176. – Andy Turner Nov 29 '16 at 22:20
  • I didn't realize how simple it is to register a custom CharsetProvider. That's probably the way to go. – dnault Nov 30 '16 at 02:21
  • See the [JavaMail FAQ entry](http://www.oracle.com/technetwork/java/javamail/faq/index.html#unsupen) that addresses this question, including sample code. (Hoisted from a comment by [Bill Shannon](http://stackoverflow.com/users/1040885/bill-shannon) on a now-deleted answer.) – Stuart Marks Nov 30 '16 at 06:30
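dnault's last comment suggests registering a custom CharsetProvider. Here is a minimal sketch, assuming Java 8 or later and that the provider class is on the application classpath. The class name Big5AliasProvider is hypothetical. It answers lookups for "big-5" (in any case) with the JDK's own Big5 charset and leaves every other name to the other providers.

```java
import java.nio.charset.Charset;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;
import java.util.Locale;

// Hypothetical provider that makes "big-5" resolve to the JDK's built-in Big5 charset.
public class Big5AliasProvider extends CharsetProvider {

    @Override
    public Charset charsetForName(String charsetName) {
        // Charset names are case-insensitive, so compare in locale-neutral lower case.
        if (charsetName != null && "big-5".equals(charsetName.toLowerCase(Locale.ROOT))) {
            // "Big5" is normally resolved by the JDK's built-in providers,
            // so this lookup does not come back into this class.
            return Charset.forName("Big5");
        }
        return null; // Not an alias we handle: let other providers answer.
    }

    @Override
    public Iterator<Charset> charsets() {
        // Nothing new to advertise; Big5 itself is already listed by the JDK.
        return Collections.emptyIterator();
    }
}
```

To register it, put the provider's fully qualified class name in a `META-INF/services/java.nio.charset.spi.CharsetProvider` file on the classpath, or use a `provides java.nio.charset.spi.CharsetProvider with ...` clause in a named module. Once it is picked up, Charset.forName("big-5") should return the Big5 charset, and code you don't control (such as JavaMail) should see it the same way.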

1 Answer


You can try the `mail.mime.contenttypehandler` system property. The JavaMail documentation describes it like this:

> In some cases JavaMail is unable to process messages with an invalid Content-Type header. The header may have incorrect syntax or other problems. This property specifies the name of a class that will be used to clean up the Content-Type header value before JavaMail uses it. The class must have a method with this signature: `public static String cleanContentType(MimePart mp, String contentType)`. Whenever JavaMail accesses the Content-Type header of a message, it will pass the value to this method and use the returned value instead.

An example of this is:

```java
import java.util.Arrays;
import javax.mail.Message; // only needed if the commented-out block below is re-enabled
import javax.mail.Session;
import javax.mail.internet.ContentType;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimePart;

public class FixEncodingName {

    public static void main(String[] args) throws Exception {
        // Build a message whose Content-Type header carries the bad charset name.
        MimeMessage msg = new MimeMessage((Session) null);
        msg.setText("test", "big-5");
        msg.saveChanges();
        // getContentType() goes through the content type handler...
        System.out.println(msg.getContentType());
        // ...but the raw header is left untouched.
        System.out.println(Arrays.toString(msg.getHeader("Content-Type")));
    }

    // Called by JavaMail for each Content-Type value it reads when
    // -Dmail.mime.contenttypehandler=FixEncodingName is set.
    public static String cleanContentType(MimePart p, String mimeType) {
        if (mimeType != null) {
            String newContentType = mimeType;
            try {
                ContentType ct = new ContentType(mimeType);
                String cs = ct.getParameter("charset");
                if ("big-5".equalsIgnoreCase(cs)) {
                    ct.setParameter("charset", "big5");
                    newContentType = ct.toString();
                }
            } catch (Exception ignore) {
                // The header could not be parsed: fall back to a plain text replace.
                // (?i) makes it case-insensitive, so "BIG-5" is fixed as well.
                newContentType = newContentType.replaceAll("(?i)big-5", "big5");
            }

            /*try { //Fix the header in the message.
                p.setContent(p.getContent(), newContentType);
                if (p instanceof Message) {
                    ((Message) p).saveChanges();
                }
            } catch (Exception ignore) {
            }*/
            return newContentType;
        }
        return mimeType;
    }
}
```

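When run with `-Dmail.mime.contenttypehandler=FixEncodingName`, it outputs:

```
text/plain; charset=big5
[text/plain; charset=big-5]
```

The first line comes from getContentType(), which passes through the handler; the second is the raw header, which is left unchanged.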
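If you'd rather not depend on ContentType parsing at all, the charset rewrite inside cleanContentType can also be done with a regular expression. This is a sketch using only java.util.regex. The names CharsetParamFixer and fixCharset are hypothetical, and it handles only the single big-5 alias (quoted or unquoted, any case).

```java
import java.util.regex.Pattern;

public class CharsetParamFixer {

    // "; charset=" (group 1), an optional opening quote (group 2), then big-5,
    // the matching closing quote, and a lookahead so "big-5x" is not touched.
    private static final Pattern BIG5_PARAM = Pattern.compile(
            "(;\\s*charset\\s*=\\s*)(\"?)big-5\\2(?=\\s*(?:;|$))",
            Pattern.CASE_INSENSITIVE);

    // Returns the Content-Type value with a big-5 charset parameter renamed to big5.
    public static String fixCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        return BIG5_PARAM.matcher(contentType).replaceAll("$1$2big5$2");
    }
}
```

With this helper, cleanContentType could simply `return CharsetParamFixer.fixCharset(mimeType);`. The ContentType-based version in the answer is more robust for unusual headers; the regex variant trades that for having no JavaMail dependency in the rewrite itself.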
Answered by jmehrens.