
What is happening to the following HTML snippet when it is opened in Chrome?

<p class="MsoNormal" style="margin:0in 0in 0pt"><font face="Calibri" size="3">Attached is a summary of the annual financial report for Company A’s retirement Plan

plans.

When I open this page in Chrome and view the source, I see:

<p class="MsoNormal" style="margin:0in 0in 0pt"><font face="Calibri" size="3">Attached is a summary of the annual financial report for Company A’s retirement Plan

plans.

Notice the replacement of

with

’

I know this is some kind of character encoding issue, but my Google searches revealed little.

JSK NS
  • 3
    Well yes, it’s a character encoding issue. But since you didn’t show us how you specify the encoding we cannot really help you. – Konrad Rudolph Jun 28 '13 at 12:57
  • possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – deceze Jun 28 '13 at 12:58
  • 2
    [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/), [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze Jun 28 '13 at 12:58
  • I don't know how to make your "’" character on my keyboard, it's not a simple quote "'". What's this goddamn character? – Getz Jun 28 '13 at 13:01
  • possible duplicate of [How do I set Character Encoding to UTF-8 for default.html?](http://stackoverflow.com/questions/905173/how-do-i-set-character-encoding-to-utf-8-for-default-html) – Chris Jun 28 '13 at 13:14
  • 1
    @Getz: It is probably a [Modifier letter apostrophe](http://en.wikipedia.org/wiki/Modifier_letter_apostrophe). The easiest way to get it is the `Ctrl+C` + `Ctrl+V`. (: – Stocki Jun 28 '13 at 13:33
  • @deceze Nice article; however, your database connection example is misleading because it implies that sending text via the wrong encoding will *always* work – in reality, that’s not true since not every sequence of Latin-1 code points is a valid UTF-8 code point sequence. Now maybe MySQL doesn’t check the validity of the code point sequence at the moment and just passes it through, but this is by no means guaranteed to remain unchanged (and in fact [it doesn’t even work now](http://stackoverflow.com/q/15728456/1968), apparently). – Konrad Rudolph Jun 28 '13 at 13:36
  • @Konrad Are you talking about the Front-to-Back in-detail example of a screwed up string in a database? In that case, maybe you have it backwards? Since MySQL interprets a UTF-8 string incorrectly as Latin-1, it won't complain about anything. Any arbitrary byte sequence is a valid Latin-1 byte sequence. – deceze Jun 28 '13 at 13:44
  • @deceze That’s true. But what happens when the database is filled (correctly) from some other source? I’m not sure what’s going to happen but I suspect that the connection will (a) either choke on any (valid) UTF-8 sequence that cannot be represented in Latin-1 (because it uses code points > 255), or (b) send invalid data. – Konrad Rudolph Jun 28 '13 at 13:51
  • @Konrad Sure, if you're trying to "reduce" UTF-8 text to Latin-1, something bad will happen. I believe MySQL chooses to replace all non-representable characters with "?". If you only fill the database with "Latin-1 data" (correctly or incorrectly), that won't happen though. – deceze Jun 28 '13 at 13:55

1 Answer


Thanks in part to both @deceze and this SO question it looks like I just need this meta tag at the top of the HTML file:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
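To see why declaring the encoding can fix the display, here is a minimal Python sketch of what is probably going on. It rests on two assumptions that the question does not confirm: that the file is saved as UTF-8, and that Chrome, finding no declared charset, falls back to Windows-1252.

```python
# The curly apostrophe from the snippet: U+2019, RIGHT SINGLE QUOTATION MARK.
original = "Company A\u2019s"

# Assumption: the file on disk is UTF-8, so the apostrophe is stored as
# the three bytes E2 80 99.
raw = original.encode("utf-8")

# Assumption: with no charset declared, the browser guesses Windows-1252
# and maps each of those three bytes to its own character.
misread = raw.decode("cp1252")

# With the charset declared as utf-8, the same bytes decode correctly.
correct = raw.decode("utf-8")

print(repr(misread))
print(repr(correct))
```

Under those assumptions the bytes in the file never change; only the browser's interpretation of them does, which is why a single declaration in the document is enough.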
JSK NS
  • 4
    Or the much shorter and easier to remember `<meta charset="utf-8">` (if you are writing HTML5 documents). See [this](http://stackoverflow.com/q/4696499/623518) other SO question. – Chris Jun 28 '13 at 13:14
  • 1
    You should actually rather be setting the Content-Type HTTP header, which takes precedence. – deceze Jun 28 '13 at 13:26
  • Found this one valuable. Thanks! Gave it a +1. – joat Apr 13 '23 at 00:45
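
As a rough illustration of the comment above about setting the Content-Type HTTP header, the following sketch uses Python's standard `http.server` module to serve a page with the charset declared in the header, then fetches it once to confirm. The page body, the handler name `Utf8Handler`, and the helper `fetch_once` are hypothetical, and a real site would normally set this header in its web server or framework configuration instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page body for illustration; note the curly apostrophe.
PAGE = "<p>Company A\u2019s retirement plans.</p>"


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # Declare the encoding in the HTTP header itself.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Serve one request on a free local port; return (header, text)."""
    server = HTTPServer(("127.0.0.1", 0), Utf8Handler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/")
    response = conn.getresponse()
    header = response.getheader("Content-Type")
    # Decode the body using the charset the server announced.
    charset = response.headers.get_content_charset()
    text = response.read().decode(charset)
    conn.close()
    worker.join()
    server.server_close()
    return header, text


header, text = fetch_once()
print(header)
```

Here the client decodes the body using only the charset announced in the header, without looking for a meta tag in the markup.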