We are using a JavaScript WYSIWYG text editor called CKEditor. The editor has a source view that shows the HTML markup for what the user has entered in the text editor. Sometimes the editor will insert non-breaking spaces (&nbsp;) into this source view, which is fine.
Everything seemed to work correctly on the dev machines, so we deployed to our production servers. At that point we started seeing a weird character (Â) being inserted into the text. After some reading I found that this had been reported in several tickets on the CKEditor bug tracker. I was able to resolve the issue by setting the charset attribute on the script tag for ckeditor.js to UTF-8.
My question is this: Why did the script tag need the charset attribute set in the first place, and why only on certain systems?
The last comment on this SO question mentions that the byte sequence for a non-breaking space in UTF-8 (0xC2 0xA0) reads as the Â character followed by a non-breaking space when interpreted as latin1 (which is ISO-8859-1, right?). This could definitely be a clue, because another Â character is inserted, one after another, every time the user switches to source view. It is as if the CKEditor framework is trying to inject a non-breaking space, but that gets turned into Â followed by a non-breaking space, then ÂÂ followed by a non-breaking space, and so on. The content-type on all systems (as seen in the Chrome debugger) is text/html;charset=ISO-8859-1, and I am not sure why. The -Dfile.encoding option in all Tomcat configs is set to utf-8. The meta tag is also <meta charset="utf-8">.
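
To make the suspected mechanism concrete, here is a minimal sketch of how a UTF-8 non-breaking space turns into Â plus a non-breaking space when its bytes are read as ISO-8859-1. The helper names (decodeLatin1, misreadUtf8AsLatin1, switchView) are made up for illustration, and switchView is only a guess at what might happen on each switch to source view, not CKEditor's actual code: it assumes the editor replaces every non-breaking space with a literal that was itself misread when the script was loaded.

```javascript
// ISO-8859-1 decoding: each byte maps to the code point with the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Serialize text as UTF-8, then read the bytes back as ISO-8859-1.
function misreadUtf8AsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return decodeLatin1(utf8Bytes);
}

const NBSP = "\u00A0";

// What a non-breaking space literal stored as UTF-8 (bytes 0xC2 0xA0) looks
// like to a parser that assumes ISO-8859-1: "Â" followed by NBSP.
const corruptedLiteral = misreadUtf8AsLatin1(NBSP);

// Hypothetical model of one switch to source view (an assumption, not the
// real CKEditor implementation): every NBSP in the content is replaced
// with the corrupted literal.
function switchView(content) {
  return content.split(NBSP).join(corruptedLiteral);
}

let content = "a" + NBSP + "b";
const history = [];
for (let i = 0; i < 3; i++) {
  content = switchView(content);
  history.push(content);
}

// Print the code points so the invisible NBSP is easy to spot.
for (const step of history) {
  const codes = Array.from(step, (ch) =>
    "U+" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  console.log(codes.join(" "));
}
```

Under this model one extra Â (U+00C2) appears before the non-breaking space on every pass, which matches the symptom described above. If the script file is decoded as UTF-8 instead, the literal stays a single U+00A0 and the replacement changes nothing.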