
Tomcat does not correctly encode String literals that contain Unicode characters. The problem occurs on a Linux server but not on my development machine (Windows). It affects ONLY String literals (not Strings read from a DB or from a file).

  • I have set URIEncoding="utf-8" on the Connector tag (server.xml).
  • I have called setCharacterEncoding().
  • I checked the stack trace (no filters that might set the encoding).
  • I have set the LANG environment variable.
  • I checked the HTTP headers and they are correct (Content-Type=text/plain;charset=utf-8).
  • I checked the encoding in the browser and it is correct (UTF-8).

None of the above works. Any ideas on what I might be missing?

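import java.io.IOException;
import java.io.Writer;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Test extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // Declare the response encoding before obtaining the writer.
        resp.setCharacterEncoding("utf-8");
        resp.setContentType("text/plain");

        Writer w = resp.getWriter();
        w.write("Ελληνικά Latin"); // String literal with non-ASCII characters
        w.close();
    }
}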

The above shows up in the browser as: Îλληνικά Latin

idrosid
  • Make sure that the Java source file is saved as UTF-8. I use Notepad++ to check this: open the file and look at the "Encoding" menu. If the encoding is not UTF-8, cut the entire content of the source file, change the encoding, paste the content back from the clipboard and save the file. – Mar 22 '12 at 12:55
  • If you view the server response in a hex editor, what is the actual byte sequence returned? What is the encoding of your source file? – Michael Mar 22 '12 at 12:56
  • *"The problem occurs at a Linux server but not on my development machine (Windows). "* How are you deploying to Linux? Are you transferring files one by one? If so, how? By FTP? If so, are you transferring in binary mode? – BalusC Mar 22 '12 at 13:00
  • I use ant to check out from an SVN repository and compile. I ran "file -bi" on the checked-out version and got "text/x-java charset=utf-8". I can also read the file without problems. So it is most probably not a problem with the source-file encoding. – idrosid Mar 22 '12 at 13:13
  • I solved it. The problem was ant (my build tool). I added an attribute to the javac tag: ''. Thank you all for your responses. You pointed me in the right direction. – idrosid Mar 22 '12 at 13:29
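
As a follow-up to the hex-editor suggestion in the comments above, here is a minimal sketch that prints the UTF-8 bytes a correctly encoded response should contain for the Greek word in the expected output. The class name HexDump and the helper hex are hypothetical, and the literal is written with Unicode escapes so the example does not depend on how the file itself is saved or compiled.

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // Formats bytes as space-separated uppercase hex pairs, e.g. "CE 95".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for the Greek word used in the question's expected output.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
        System.out.println(hex(greek.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A correct response body would begin with CE 95 for the first Greek letter. If the captured bytes begin with C3 8E instead, the letter was most likely decoded with a single-byte charset somewhere before being written out as UTF-8.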

2 Answers

You can force the encoding that javac uses to read source files by passing -encoding 'utf-8' or -encoding 'iso-8859-1' when compiling. Just make sure that it matches the encoding your .java files are actually saved in.

http://docs.oracle.com/javase/6/docs/technotes/tools/windows/javac.html

-encoding encoding Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
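
To illustrate why the -encoding value has to match the file, here is a small simulation (not javac itself) of reading UTF-8 source bytes with the right and the wrong charset. The class name SourceEncodingMismatch and the helper readAs are hypothetical, and ISO-8859-1 is only an assumed example of a mismatched platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingMismatch {
    // Decodes raw file bytes using the charset the reader assumes.
    static String readAs(byte[] sourceBytes, Charset assumed) {
        return new String(sourceBytes, assumed);
    }

    public static void main(String[] args) {
        // Unicode escapes for the Greek word; independent of this file's encoding.
        String literal = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
        // Stand-in for the literal as stored in a UTF-8 source file.
        byte[] onDisk = literal.getBytes(StandardCharsets.UTF_8);

        String right = readAs(onDisk, StandardCharsets.UTF_8);
        String wrong = readAs(onDisk, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(literal)); // true
        System.out.println(literal.length());      // 8
        System.out.println(wrong.length());        // 16: each 2-byte letter became 2 chars
    }
}
```

With a mismatched charset the compiled class already contains the wrong characters, so no response header or setCharacterEncoding() call at runtime can repair them. Writing the literal with \u escapes, as done here, is one way to make it independent of the compiler's source encoding.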

benmmurphy

Try setting the file.encoding system property, e.g. -Dfile.encoding=utf-8, on the Linux JVM command line.
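
To check which default a given JVM actually ends up with, a quick sketch like the following can be run in the same environment as the build or the server. The class name DefaultCharsetCheck and the method describe are hypothetical; the property lookup and Charset.defaultCharset() are standard JDK calls.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Reports the file.encoding property and the charset the JVM uses by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once from a plain shell and once from the build tool's environment may show different values, which would be consistent with a compile that works by hand but not from the build.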

Bruno Grieder
  • You're very close. I needed to add this when running javac. What confused me is that I DID run javac without this option and it worked OK. However, when ant called javac, it was probably using a different default encoding. – idrosid Mar 22 '12 at 13:32
  • This solved a problem for me where my .jsp was including a UTF-8 encoded HTML file-fragment. Adding this parameter got the file loaded correctly. – JBCP Nov 06 '12 at 03:43
  • More options are inside http://stackoverflow.com/questions/11342884/change-tomcats-charset-defaultcharset-in-windows – Paul Verest Aug 07 '14 at 09:15