Double-check the following settings so that every layer knows it's a UTF-8 party, starting with the HTML page itself.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Page Title</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="format-detection" content="telephone=no" />
</head>
<body>
your html content goes here....
</body>
</html>
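Make sure the database tables use a UTF-8 charset. I don't trust the database defaults, which is why the CREATE statements spell the charset out explicitly. Note that MySQL's utf8 is a 3-byte encoding (an alias of utf8mb3) that cannot store characters outside the Basic Multilingual Plane, such as emoji; use utf8mb4 if you need those.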
CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_swedish_ci;
CREATE TABLE tMyTable (
id int(11) NOT NULL auto_increment,
code VARCHAR(20) NOT NULL,
name VARCHAR(20) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_swedish_ci;
Tell the JDBC connection to use UTF-8 as well. Because this is an XML file, the & between the URL parameters must be written as &amp;.
<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
maxActive="10" maxIdle="2" maxWait="10000"
username="myuid" password="mypwd"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/mydb?useUnicode=true&amp;characterEncoding=utf8"
validationQuery="SELECT 1"
/>
Some Tomcat versions do not decode GET query strings and POST form bodies with the same charset, so add the useBodyEncodingForURI attribute to the Connector to make the GET parameter parser obey the setCharacterEncoding value.
<Connector port="8080"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
debug="0" connectionTimeout="20000"
disableUploadTimeout="true" useBodyEncodingForURI="true"
/>
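To see why the decoding charset matters, here is a minimal sketch using only the JDK. It is not Tomcat's parser; the class name QueryDecodingDemo and its decode helper are made up for illustration. It percent-encodes "ÅÄÖ" the way a browser would on a UTF-8 page and then decodes the result once as UTF-8 and once as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Tomcat or the servlet API.
final class QueryDecodingDemo {

    // Decodes a percent-encoded query value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A-ring, A-umlaut, O-umlaut as Unicode escapes, so the encoding
        // of this source file does not matter.
        String original = "\u00C5\u00C4\u00D6";

        // What a browser sends for a form on a UTF-8 page.
        String encoded = URLEncoder.encode(original, "UTF-8");
        System.out.println(encoded); // %C3%85%C3%84%C3%96

        // Same charset on both sides: the value survives.
        System.out.println(decode(encoded, "UTF-8").equals(original)); // true

        // Wrong charset on the server side: six garbage characters instead of three.
        System.out.println(decode(encoded, "ISO-8859-1").equals(original)); // false
    }
}
```

The second decode is roughly what happens when the GET parser falls back to ISO-8859-1 while the page was submitted as UTF-8.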
Set the request encoding with setCharacterEncoding before any filter or other code reads parameters from the request; once the parameters have been parsed the call has no effect, so make it as early as possible.
if (req.getCharacterEncoding() == null)
req.setCharacterEncoding("UTF-8");
Be careful with whitespace characters in a .jsp page. I use the technique below to chain several directives: each closing %> is immediately followed by the next opening <%, so no stray newlines are written to the response before the DOCTYPE.
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %><%@
page contentType="text/html; charset=UTF-8" pageEncoding="ISO-8859-1"
import="java.util.*,
java.io.*"
%><%
request.setCharacterEncoding("UTF-8");
String myvalue = "hello all and ÅÄÖ";
String param = request.getParameter("fieldName");
myvalue += " " + param;
%><!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Page Title</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="format-detection" content="telephone=no" />
</head>
<body>
your html content goes here.... <%= myvalue %>
</body>
</html>
The contentType attribute of the JSP page directive sets the charset of the HTTP response, while pageEncoding tells the container how the .jsp file is encoded on disk. The two don't need to match, and I usually use ISO-8859-1 for pageEncoding when the page contains only safe US-ASCII characters. Don't save the file as UTF-8 with BOM, because the hidden leading BOM bytes may cause problems in some J2EE servers.
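If you are not sure whether an editor saved a file with a BOM, you can check the first three bytes. The following is a minimal sketch under that assumption; BomCheck and its two methods are hypothetical names, not part of any server or library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for spotting a UTF-8 byte order mark (EF BB BF).
final class BomCheck {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the content starts with the three UTF-8 BOM bytes.
    static boolean startsWithUtf8Bom(byte[] content) {
        if (content.length < UTF8_BOM.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, UTF8_BOM.length), UTF8_BOM);
    }

    // Returns the content without a leading BOM; unchanged if there is none.
    static byte[] stripUtf8Bom(byte[] content) {
        return startsWithUtf8Bom(content)
                ? Arrays.copyOfRange(content, UTF8_BOM.length, content.length)
                : content;
    }

    public static void main(String[] args) {
        // U+FEFF encodes to EF BB BF in UTF-8, which simulates a file saved with a BOM.
        byte[] withBom = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println(startsWithUtf8Bom(withBom));    // true
        System.out.println(startsWithUtf8Bom(withoutBom)); // false
        System.out.println(new String(stripUtf8Bom(withBom), StandardCharsets.UTF_8)); // hello
    }
}
```

For a real file you would pass in the result of Files.readAllBytes for the .jsp path instead of the in-memory strings used here.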
The last thing is how you write strings to the response stream: if you convert strings to bytes yourself, make sure the conversion uses UTF-8 and tell the client so.
response.setContentType("text/html; charset=UTF-8");
response.getOutputStream().write( myData.getBytes("UTF-8") );
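The byte conversion is where things silently go wrong, so here is a small JDK-only sketch of what the charset choice does to "ÅÄÖ". ResponseBytesDemo and toUtf8 are illustrative names; the sketch uses StandardCharsets.UTF_8 (Java 7+) instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how the charset changes the bytes sent to the client.
final class ResponseBytesDemo {

    // Encodes text the way the response body should be encoded.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A-ring, A-umlaut, O-umlaut as Unicode escapes.
        String text = "\u00C5\u00C4\u00D6";

        byte[] utf8 = toUtf8(text);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length);   // 6, two bytes per character
        System.out.println(latin1.length); // 3, one byte per character

        // Client decodes with the charset the server announced: text survives.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true

        // Client assumes a different charset than the bytes were written in: garbage.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

This is why the Content-Type header and the getBytes call have to name the same charset.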
This was a long post, but it covers most of the corner cases.