
I'm serving an embeddable <script> that users can copy/paste into their websites and have content displayed.

The script loads a stylesheet and renders some HTML that is injected into the host page.

I'm facing problems displaying special characters (ü, ö, ä, you name it) when the host page uses an encoding different from my script's (which is encoded in UTF-8), such as ISO-8859-1. The special characters come out garbled.

Content is injected like:

var content = template.render(model);
$('#some-el').html(content);

The same problem goes for content that is generated via CSS pseudos like:

.some-class::after{
  content: 'Ümläüts äré fün';
}

My solution right now is converting all umlauts into entities (&uuml; for HTML, \00FC for CSS) when precompiling my templates (Mustache, compiled via Hogan.js) and CSS in the build step. This works, but feels very cumbersome and easy to break.
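For reference, the build-step conversion described above could be sketched like this (a minimal sketch with illustrative helper names; this is not the actual Hogan.js build code):

```javascript
// Replace every non-ASCII character with a numeric HTML entity,
// so the markup survives any host-page encoding.
function toHtmlEntities(str) {
  return str.replace(/[^\x00-\x7F]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

// Same idea for CSS string values, using \XXXX hex escapes.
// The trailing space terminates each escape sequence so the
// following character is not swallowed.
function toCssEscapes(str) {
  return str.replace(/[^\x00-\x7F]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return '\\' + ('0000' + hex).slice(-4) + ' ';
  });
}
```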

What are the factors in play that determine the encoding of content generated by JavaScript? Is there a way to have the host site "respect" my script output's encoding? Might this be due to some server misconfiguration?

m90

3 Answers


The conversion of all special characters to entities is the way it's meant to be done.

Did you actually save your files as UTF-8?

To change the encoding of your document, however, it is not enough to just change the encoding declaration at the top of the page or on the server. You need to re-save your document in that encoding.

Anima-t3d

I am not quite sure why you feel escaping is cumbersome ...

For HTML you can escape every character with a code below 32 or above 127, for example in JavaScript:

function escapeHtml(str) {
  var out = '';
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 32 || code > 127) {
      out += '&#' + code + ';'; // numeric character reference
    } else {
      out += str.charAt(i);
    }
  }
  return out;
}

This will escape all non-ASCII (and control) characters.

And pretty much the same works for CSS. Such characters can appear in CSS only inside string literals or comments, so you can simply escape all non-ASCII characters in a CSS file without parsing its structure.
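That whole-file CSS escaping could be sketched like this (an illustrative helper under the assumption above, that non-ASCII characters only occur where escapes are valid):

```javascript
// Escape all non-ASCII characters in an entire CSS file with
// \XXXX hex escapes; the trailing space ends each escape so the
// next character is not consumed by the CSS parser.
function escapeCssFile(css) {
  return css.replace(/[^\x00-\x7F]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return '\\' + ('0000' + hex).slice(-4) + ' ';
  });
}

// e.g. ".x::after { content: 'fün'; }" becomes
// ".x::after { content: 'f\00FC n'; }"
```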

All this is quite reliable I think.

c-smile
  • This seems to work great, thanks. I was using a stupid dictionary before so that was the part that felt easy to break! – m90 Jan 08 '14 at 12:23
  • Built a module using this approach for anyone who's interested, works great for me: https://github.com/m90/entity-convert – m90 Jun 07 '14 at 12:25

Have you tried setting the content encoding in CSS?

Declare the encoding at the beginning of the CSS file:

@charset "UTF-8";

Although this is considered obsolete in HTML5, it should not be a problem in HTML5 browsers. Note: if the browser is configured to override the character set, there are controversial workarounds that I would rather not discuss here.

For an external JavaScript file, declare the encoding on the script tag, like:

<script src="myscripts.js" charset="UTF-8"></script>

Since your files are loaded on the client side you cannot force an encoding, but you can suggest one with a meta tag, like:

<meta charset='utf-8'>

This should resolve most of these issues.

MarmiK
  • I already tried all of this, there are still situations that this will fail in. – m90 Jan 08 '14 at 07:49
  • Then use basic JavaScript: `document.getElementById('some-el').innerText = "your content";` this should work.. – MarmiK Jan 08 '14 at 08:47