
I am using a local validator.nu instance to validate a site, however it keeps telling me the encoding does not match:

Internal encoding declaration “iso-8859-1” disagrees with the actual encoding of the document (“utf-8”).

I've tried everything I can think of to force the encoding to iso-8859-1, because we use a legacy DB that requires this encoding:

  1. The process that starts the server forces LANG='iso-8859-1'
  2. Tomcat is started with -Dfile.encoding=iso-8859-1; this is confirmed by checking Charset.defaultCharset(), which reports ISO-8859-1.
  3. Maven project resources are copied with iso-8859-1: <project.build.sourceEncoding>iso-8859-1</project.build.sourceEncoding>
  4. JSP page directive specifies encoding: <%@page contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1" %>
  5. Content-Type has been set in page head: <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
  6. Tomcat URIEncoding set: <Connector port="80" protocol="HTTP/1.1" connectionTimeout="20000" URIEncoding="iso-8859-1" redirectPort="8443" />

What else could I have missed that's causing the page to come back as utf-8?

Interestingly it is rendering characters like © correctly, and if © is placed in a text input it is saved to the DB correctly using the 8859-1 codepage.
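One way to check what the server really sends is to look at the raw bytes of a downloaded page rather than at how a browser renders it. The following is a minimal sketch, not part of the original setup: the class name `EncodingProbe` and the method `isValidUtf8` are hypothetical. It relies on the fact that © is the single byte 0xA9 in ISO-8859-1 but the two bytes 0xC2 0xA9 in UTF-8, so a strict UTF-8 decoder rejects the ISO-8859-1 form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class EncodingProbe {

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u00A9" is the copyright sign, written as an escape so the
        // source file's own encoding does not matter.
        byte[] latin1 = "\u00A9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "\u00A9".getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes: " + latin1.length
                + ", valid UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes: " + utf8.length
                + ", valid UTF-8: " + isValidUtf8(utf8));
    }
}
```

Note that a page containing only ASCII characters is valid in both encodings, so the check is only informative when the page contains a character such as ©.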

UPDATE: I downloaded a page from the server with cURL and uploaded it to the W3C checker, which validated it successfully. Its only complaint was that the encoding should be named windows-1252 rather than iso-8859-1. I thought those two character sets were slightly different, but this w3 mailing-list entry says otherwise, so I need to look into that.
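On the iso-8859-1 versus windows-1252 point: as far as I understand, the two agree everywhere except the byte range 0x80 to 0x9F, where ISO-8859-1 has control characters and windows-1252 has printable ones, and HTML5 parsers treat the iso-8859-1 label as windows-1252. A small sketch that shows the difference with the JDK's own decoders (the class name `Latin1VsCp1252` and its `decode` helper are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Latin1VsCp1252 {

    /** Decodes a single byte value (0-255) with the given charset. */
    static String decode(int byteValue, Charset charset) {
        return new String(new byte[] {(byte) byteValue}, charset);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");

        // 0xA9 is the copyright sign in both charsets; 0x80 is a control
        // character in ISO-8859-1 but the euro sign in windows-1252.
        for (int b : new int[] {0xA9, 0x80}) {
            System.out.printf("0x%02X  ISO-8859-1 -> U+%04X  windows-1252 -> U+%04X%n",
                    b,
                    (int) decode(b, latin1).charAt(0),
                    (int) decode(b, cp1252).charAt(0));
        }
    }
}
```

For a page whose only non-ASCII characters are ones like ©, the two encodings produce identical bytes, which would explain why the data round-trips to the DB correctly either way.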

This is looking more and more like a bug in validator.nu, which I will also look into.

Brett Ryan
  • Have you checked the HTTP `Content-Type` header that the server returns? – artbristol Mar 25 '13 at 08:49
  • @artbristol, firebug reports the response header as: `Content-Type text/html;charset=ISO-8859-1` – Brett Ryan Mar 25 '13 at 08:55
  • As suggested in the validator mailing list, please specify the URL of the page, letting people access both the actual HTTP headers and the actual data sent. – Jukka K. Korpela Mar 26 '13 at 20:46
  • @JukkaK.Korpela, unfortunately I do not have access to do this, the application is an internal application and I do not have access to a public tomcat instance. – Brett Ryan Mar 26 '13 at 22:14
  • @JukkaK.Korpela, I've managed to borrow an endpoint to host a test. As a result though I can't have this up for long: [rest.johnsands.com.au](https://rest.johnsands.com.au/). – Brett Ryan Mar 26 '13 at 23:03
  • I've now used the [W3C validation service](http://validator.w3.org/check) to check the URL, which comes back fine apart from the warning: "Legacy encoding windows-1252 used. Documents should use UTF-8.", which unfortunately I cannot use. – Brett Ryan Mar 26 '13 at 23:08

2 Answers


I've found the problem!

The document is fine, the server is fine, and the validator actually validates fine. It's the Firefox plugin that changes the page encoding before sending the page to the validator, which gives me a false error.

I came to this conclusion with help from the help@lists.whatwg.org mailing list and by switching from the Fx html5validator addon to the Fx web developer addon, which validates my documents correctly. Validation against the local validator instance now passes.

I've raised this issue with the original Firefox plugin.

Brett Ryan
  • A rather annoying side effect of using the validator plugin is that it just passes the URL instead of the page contents. If request parameters were used to generate the page contents and the URL has been rewritten, it's not possible to get the same page back with the validator, so it's not validating what you're looking at. – Brett Ryan Mar 27 '13 at 04:23

Try adding a filter (instance of javax.servlet.Filter declared with <filter> and <filter-mapping> tags in web.xml) that will set the desired character encoding on ServletRequest and ServletResponse instances coming into the doFilter() method as parameters.

See javadoc here and here.
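A minimal sketch of such a filter, assuming the Servlet 3.x API (`javax.servlet`) is on the classpath. The class name `ForceLatin1Filter` is hypothetical, and the filter would still need to be registered with `<filter>` and `<filter-mapping>` in web.xml as described above:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical filter for illustration only.
public class ForceLatin1Filter implements Filter {

    private static final String ENCODING = "ISO-8859-1";

    @Override
    public void init(FilterConfig filterConfig) {
        // No configuration needed for this sketch.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Must run before any request parameter is read and before the
        // response writer is obtained, otherwise the calls have no effect.
        request.setCharacterEncoding(ENCODING);
        response.setCharacterEncoding(ENCODING);
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to release.
    }
}
```

The filter should be mapped ahead of any other filter that reads request parameters, since the request encoding cannot be changed after that point.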

maksim_khokhlov
  • Yes, I have tried adding the spring implementation of [CharacterEncodingFilter](http://static.springsource.org/spring/docs/3.2.1.RELEASE/javadoc-api/org/springframework/web/filter/CharacterEncodingFilter.html) with no success. – Brett Ryan Mar 25 '13 at 12:25
  • @BrettRyan Asking just to make sure: did you enforce filter's character encoding (init-param forceEncoding = true)? Also, if you're using Spring MVC, this SO case can be helpful: http://stackoverflow.com/questions/3616359/who-sets-response-content-type-in-spring-mvc-responsebody – maksim_khokhlov Mar 25 '13 at 12:50
  • I'm not using Spring; I just felt I'd try a proven filter. Yes, I did try enforcing the encoding. I'm actually starting to think that the validator is telling me the wrong thing: I can't see any evidence that the returned page is encoded in utf-8. – Brett Ryan Mar 25 '13 at 13:01