9

I have a PHP file which has the following text:

<div class="small_italic">This is what you´ll use</div>

On one server, it appears as:

This is what you´ll use

And on another, as:

This is what you�ll use

Why would there be a difference and what can I do to make it appear properly (as an apostrophe)?


Note to all (for future reference)

I implemented Gordon's / Gumbo's suggestion, but at the server level rather than the application level. Note that (a) I had to restart the Apache server and, more importantly, (b) I had to replace the existing "bad data" with corrected data in the right encoding.

/etc/php.ini

default_charset = "iso-8859-1"

Peter Mortensen
siliconpi
  • 8
    Seems like an encoding problem to me. Anyway, why don't you use `'` instead of `´` like everyone else ? :D – Shikiryu Nov 04 '10 at 08:56
  • 2
    check the charset of the returned document (headers) there may be the explanation, in any case you can just use `’` – Hannes Nov 04 '10 at 08:57
  • 6
    And I'll add : _This smells like Word copy/paste_ – Shikiryu Nov 04 '10 at 09:01
  • 1
    It's not an ASCII apostrophe x'27'. It's probably a Windows "right single quote" x'92', which is supported only in MS code pages. – James Anderson Nov 04 '10 at 09:05
  • 1
    @Chouchenos: Yes, `´` (U+00B4, ACUTE ACCENT) is obviously the wrong character. I guess he rather meant `’` (U+2019, RIGHT SINGLE QUOTATION MARK) that would be the proper typographical apostrophe. – Gumbo Nov 04 '10 at 09:22
  • 3
    In addition to the specific advice for this problem, I'd always recommend you read Joel Spolsky's [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html) if you haven't already :) – Matt Gibson Nov 04 '10 at 10:46
  • Another reason for this replacement of *"U+2019 E2 80 99 RIGHT SINGLE QUOTATION MARK"* with *"U+FFFD EF BF BD REPLACEMENT CHARACTER"*, for PHP applications using databases (e.g. MySQL), is [a missing "charset=utf8" in the "`new PDO`" line](https://stackoverflow.com/questions/4475548/pdo-mysql-and-broken-utf-8-encoding/21373793#21373793). – Peter Mortensen Jul 08 '19 at 16:08

7 Answers

16

You have to make sure the content is served with the proper character set.

Either send the content with an HTTP header that declares the charset:

<?php header("Content-Type: text/html; charset=[your charset]"); ?>

or - if the HTTP charset headers don't exist - insert a <META> element into the <head>:

<meta http-equiv="Content-Type" content="text/html; charset=[your charset]" />

As the attribute name suggests, http-equiv is the equivalent of an HTTP response header, and user agents should use it when the corresponding HTTP header is not set.

As Hannes already suggested in the comments to the question, you can look at the headers returned by your web server to see which encoding it declares. There is likely a discrepancy between the two servers, so change the [your charset] part above to the charset sent by the "working" server.

For a more elaborate explanation about the why, see Gumbo's answer.
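To compare what the two servers declare, it can help to pull the charset out of each Content-Type header value. The following is only a sketch in Python, not part of the original answer: `charset_from_content_type` is a hypothetical helper, and the two header strings are illustrative values of the kind discussed in this thread, not captured output.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# Illustrative header values (assumptions, not real captures from either server).
server_a = "text/html; charset=UTF-8"
server_b = "text/html"  # no charset sent, so the browser falls back to META or a guess

print(charset_from_content_type(server_a))  # utf-8
print(charset_from_content_type(server_b))  # None
```

If one server reports a charset and the other reports `None`, the second one is relying on the META fallback, which would explain why the same page renders differently.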

Peter Mortensen
Gordon
  • How do you know his document is in UTF-8? – RoToRa Nov 04 '10 at 09:01
  • 2
    More important: The data does not seem to be encoded in UTF-8. – Gumbo Nov 04 '10 at 09:33
  • @Gordon: I’d rather like to see the suggestions in the [proper order](http://www.w3.org/TR/html4/charset.html#h-5.2.2): HTTP first, then HTML (and only if there was no encoding specified in HTTP). – Gumbo Nov 04 '10 at 09:53
  • @Gumbo okay, changed the order, though I am pretty sure having the META does no harm at all. It's called http-equiv for a reason. User agents should use it when there is no equivalent http header. – Gordon Nov 04 '10 at 09:59
  • @Gordon: Yes, if the character encoding is specified in HTTP the META should not have any effect at all. And that’s the reason for why META is insufficient in some cases. – Gumbo Nov 04 '10 at 10:04
  • @Gumbo some cases is not all cases :) It's a fallback. Also, having meta information for a document in the document - even if they are not used - keeps the document complete. – Gordon Nov 04 '10 at 10:12
  • @Gordon, @Gumbo - there is no difference in my app on the two servers, but Apache / PHP might be different - what should I check (in phpinfo?) and get to match so that both work similarly... – siliconpi Nov 08 '10 at 11:51
  • @matt please compare the response headers that your Apache web servers return when the pages are requested. You can do this with Firebug for Firefox or Fiddler for IE. – Gordon Nov 08 '10 at 12:02
  • @Gordon - My code outputs but phpinfo() on the second server says: "HTTP Response Headers Content-Type text/html; charset=UTF-8 " – siliconpi Nov 08 '10 at 12:19
  • @matt and what does it say on the other server? And which one is the one that does show the correct response? – Gordon Nov 08 '10 at 12:23
  • @Gordon - the phpinfo() doesnt display anything, but the firebug investigation on a page shows that the "Request Headers" is "Accept-Language en-us,en;q=0.5 Accept-Encoding gzip,deflate Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7" and the "Response Headers" is "Vary Accept-Encoding,User-Agent Content-Encoding gzip" – siliconpi Nov 08 '10 at 12:30
  • @matt and the � appear on the server that responds with UTF-8, right? It's probably like we suggested already then. One server sends encoding information, while the other doesnt. The one that sends the UTF-8 encoding will overwrite the ISO-8859-1 set via META, because META is just a fallback. Try to overwrite the header with PHP as shown above. – Gordon Nov 08 '10 at 12:39
  • @Gordon, switching to iso-8859-1 in php.ini and restarting apache led to some characters appearing correctly, but others appearing as Quiénes somos, Explóra – siliconpi Nov 08 '10 at 12:54
  • @matt Please do as suggested above and in the answer. Send the header. Also, did you check if the characters that are wrong then are actually part of ISO-8859-1? If not, they have to be encoded to their respective HTML entities. – Gordon Nov 08 '10 at 12:56
8

The display of the REPLACEMENT CHARACTER (U+FFFD) most likely means that you’re specifying your output to be Unicode but your data isn’t.

In this case, if the ACUTE ACCENT ´ is for example encoded using ISO 8859-1, it’s encoded as the single byte 0xB4, as that’s the code point of that character in ISO 8859-1. But a lone 0xB4 byte is not valid in UTF-8: it is a continuation byte with no lead byte before it. A browser decoding the page as UTF-8 therefore shows the replacement character U+FFFD instead.

So to fix this, make sure that you’re specifying the character encoding properly according to your actual one (or vice versa).
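The byte-level mismatch can be reproduced outside the browser. This is a minimal sketch in Python (chosen only for illustration; the bytes are the same in PHP). It shows both directions: ISO 8859-1 bytes read as UTF-8, which gives the replacement character, and UTF-8 bytes read as ISO 8859-1, which gives the "Quiénes" style of garbling reported in the comments above.

```python
acute = "\u00b4"  # ACUTE ACCENT, the character used in the question

latin1_bytes = acute.encode("iso-8859-1")  # single byte 0xB4
utf8_bytes = acute.encode("utf-8")         # two bytes 0xC2 0xB4

# ISO 8859-1 data declared as UTF-8: the stray 0xB4 becomes U+FFFD.
wrong = latin1_bytes.decode("utf-8", errors="replace")
print(latin1_bytes, utf8_bytes, repr(wrong))  # b'\xb4' b'\xc2\xb4' '\ufffd'

# UTF-8 data declared as ISO 8859-1: each byte is shown as its own character.
mojibake = "Qui\u00e9nes".encode("utf-8").decode("iso-8859-1")
print(mojibake)  # QuiÃ©nes
```

Neither result means the data is lost. In both cases the declared encoding simply does not match the bytes that were sent.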

Gumbo
1

The simple solution is to write special characters as HTML character references, which consist only of ASCII characters.

The numeric character reference for the typographic apostrophe (U+2019) is &#8217;. Try putting this value in your HTML, and it should work properly for you.
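As a quick check that the reference really names U+2019 and is itself pure ASCII, here is a small sketch in Python (used only for illustration) that relies on the standard library alone:

```python
import html

reference = "&#8217;"

# The reference contains only ASCII characters, so it survives any
# ASCII-compatible encoding (ISO 8859-1, Windows-1252, UTF-8) unchanged.
print(reference.isascii())  # True

# A browser (or html.unescape) resolves it to U+2019.
print(html.unescape(reference) == "\u2019")  # True

# Going the other way: replace every non-ASCII character with its reference.
escaped = "you\u2019ll".encode("ascii", "xmlcharrefreplace")
print(escaped)  # b'you&#8217;ll'
```

This sidesteps the charset question for individual characters, but it does not fix a page whose declared encoding is wrong.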

Peter Mortensen
Zain Shaikh
  • ASCII has only one apostrophe character and that’s at 0x27. The character reference `&#8217;` refers to the character U+2019 *RIGHT SINGLE QUOTATION MARK* in Unicode. – Gumbo Nov 04 '10 at 09:44
  • @Gumbo: RIGHT SINGLE QUOTATION MARK **is** the correct character for an apostrophe: http://www.languagegeek.com/typography/apostrophes.html – RoToRa Nov 04 '10 at 09:51
  • I assume he meant that `&#8217;` was a pure ASCII string rather than that the character it represents was ASCII. – Chris Nov 04 '10 at 09:52
  • @RoToRa: I was rather trying to point out that US-ASCII only has one apostrophe character and that character references refer to characters in Unicode. And besides that, U+2019 is not the proper typographical apostrophe in every language. But yes, it is for English. – Gumbo Nov 04 '10 at 09:58
1

To sum up:

  1. Make sure the file saved on the web server has the right encoding.
  2. Make sure the web server also delivers it with the right encoding.
  3. Make sure the HTML meta tag is set to the right encoding.
  4. Make sure to use "standard" special characters, i.e. use ' instead of ´ if you want to write something like "Luke Skywalker's code".

For encoding, UTF-8 might be good for you.
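For step 1, one way to check a file's bytes and convert them is sketched below in Python (for illustration only). `is_valid_utf8` and `latin1_to_utf8` are hypothetical helper names, and the sample bytes assume the text from the question was saved as ISO 8859-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are known to be ISO 8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Assumption: the line from the question, saved as ISO 8859-1.
legacy = "This is what you\u00b4ll use".encode("iso-8859-1")

print(is_valid_utf8(legacy))                  # False
print(is_valid_utf8(latin1_to_utf8(legacy)))  # True
```

Decoding as ISO 8859-1 never raises an error, so the conversion should only be applied to data that is known to be in that encoding. Running it on text that is already UTF-8 would garble it.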


Czar
0

Declare the page's character set explicitly so that the browser does not have to guess it.

For example,

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Peter Mortensen
Michel
0

This is probably caused by the data you're inserting into the page with PHP being in a different character encoding from the page itself (the most common case is one being Latin 1 and the other UTF-8).

Check the encoding being used for the page, and for your database. Chances are there will be a mismatch.

Peter Mortensen
GordonM
-1
  1. Create an .htaccess file in the root directory:

    AddDefaultCharset utf-8
    AddCharset utf-8 *
    <IfModule mod_charset.c>
        CharsetSourceEnc utf-8
        CharsetDefault utf-8
    </IfModule>
    
  2. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Peter Mortensen
Alex Pliutau