4

I have a simple form where I can type some characters. These characters are sent to a servlet, which calls getBytes() and prints the bytes. The correct UTF-8 bytes for "ã" are -61 and -93, but I get -52 and -93. :(

I tried everything to understand and fix this, but nothing worked. Everything on my machine should be UTF-8 so I suspect it has to do with the US International keyboard I have been using for 20 years.

Does any smart soul have a clue where -52 and -93 are coming from?

FIXED on Jetty: See my answer below.

BROKEN on Tomcat: How to get tomcat to understand MacRoman (x-mac-roman) charset from my Mac keyboard?

chrisapotek
  • Calling `getBytes()` on the string isn't a good way of determining what was *actually* sent. Use Wireshark or something similar. – Jon Skeet Apr 28 '12 at 21:30

2 Answers

9

That is the Mac OS Roman character encoding. (0xCC == -52.)
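
A minimal sketch of how those two bytes can arise. It assumes a chain that the thread suggests but does not confirm: the browser sends "ã" as UTF-8, the container decodes the bytes as ISO-8859-1, and getBytes() then encodes the resulting string with a Mac OS Roman platform default. The class and method names are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MacRomanMojibake {

    // Simulates the assumed chain: UTF-8 on the wire, decoded as ISO-8859-1
    // by the container, then re-encoded as Mac OS Roman by getBytes().
    static byte[] misdecode(String typed) {
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);
        String asSeenByServlet = new String(onTheWire, StandardCharsets.ISO_8859_1);
        // "x-MacRoman" is part of the extended charsets in standard JDK builds.
        return asSeenByServlet.getBytes(Charset.forName("x-MacRoman"));
    }

    public static void main(String[] args) {
        // "\u00e3" is "ã", escaped so the source file encoding cannot interfere.
        String typed = "\u00e3";
        System.out.println(Arrays.toString(typed.getBytes(StandardCharsets.UTF_8))); // [-61, -93]
        System.out.println(Arrays.toString(misdecode(typed)));                       // [-52, -93]
    }
}
```

The first byte changes because 0xC3 read as ISO-8859-1 is "Ã", which Mac OS Roman stores as 0xCC. The second byte survives because "£" is 0xA3 in both ISO-8859-1 and Mac OS Roman.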

Some things to check:

  • string.getBytes("UTF-8") and new String(bytes, "UTF-8").
  • The form page should have been sent in UTF-8: response.setContentType("text/html; charset=UTF-8");. In a JSP: <%@page pageEncoding="UTF-8"%>
  • <form action="..." accept-charset="UTF-8">
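
The first check above can be sketched as follows. This is only an illustration with hypothetical class and method names: it shows that naming the charset explicitly makes the result independent of the platform default. StandardCharsets requires Java 7 or later; on Java 6 the overloads that take the name "UTF-8" do the same job but declare UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharset {

    // Encode with an explicit charset instead of the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8("\u00e3"); // "ã"
        System.out.println(Arrays.toString(bytes));           // [-61, -93]
        System.out.println(fromUtf8(bytes).equals("\u00e3")); // true
    }
}
```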

As all that did not help:

Set the request filtering in your web application (web.xml).


Encoding in pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>...</version>
    <configuration>
        <source>1.6</source>
        <target>1.6</target>
        <encoding>${project.build.sourceEncoding}</encoding>
    </configuration>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <version>...</version>
    <configuration>
        <encoding>${project.build.sourceEncoding}</encoding>
    </configuration>
</plugin>
...
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
Joop Eggen
  • Thanks Joop, but it did not work. My guess is that no matter what I do, the web container gets Mac OS Roman and doesn't know what to do with it. My browser encoding is set to UTF-8. :( – chrisapotek Apr 28 '12 at 21:49
  • As "page information" you'll probably see the actual encoding. The accept-charset can be seen in the "page source." The latter might have been forgotten. Or you might have a very old browser. Maybe a `` might be a last resort. – Joop Eggen Apr 28 '12 at 21:53
  • The interesting thing is that for one char you get two bytes, so UTF-8 encoding is being done (though wrongly). But somewhere you get Mac Roman, and an ISO-8859-1 to UTF-8 conversion is done. Did you try it with a different browser? Did you trace `request.getCharacterEncoding()`? – Joop Eggen Apr 28 '12 at 21:57
  • I did everything you suggested. I am doing a System.out.println of the string I am getting in the servlet. It does not print right. Tested on Jetty and Tomcat. :( – chrisapotek Apr 28 '12 at 21:58
  • System.out.println(req.getCharacterEncoding()); => null – chrisapotek Apr 28 '12 at 22:05
  • As this seems to be something to do with response/request filtering, I extended my answer with servlet filtering. In answer to your comment: request.setCharacterEncoding could be done, but try filtering first. – Joop Eggen Apr 28 '12 at 22:05
  • BTW search in stackoverflow for a better answer than mine. It can't be the first time. – Joop Eggen Apr 28 '12 at 22:10
  • http://stackoverflow.com/questions/10369014/how-to-get-tomcat-to-understand-macroman-x-mac-roman-charset-from-my-mac-keybo – chrisapotek Apr 29 '12 at 01:37
3

OK, after a good 8 hours (seriously!), it looks like the only way to get this working correctly is the following.

One of the problems was that the Maven build was compiling the class files with the wrong encoding:

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
mvn clean install
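
To check whether the option actually reached the JVM, something like the following can be run in the same shell. It is only a sketch with a hypothetical class name: it prints the file.encoding system property and the charset the JVM resolved as its default, which is what a no-argument getBytes() uses.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports the file.encoding property and the JVM's resolved default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```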

AND:

   <%@page pageEncoding="UTF-8" %>

NOW:

As far as I know, there is no way to pass the latter option in your pom.xml.

Here is a pending answer for that: enabling UTF-8 encoding for clojure source files

chrisapotek