7

I have set the page encoding to UTF-8 in the HTML:

<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />

and in the HTTP header, I have:

Content-Type: text/html; charset=UTF-8

Why isn't the é shown correctly?


Update:
The data containing the é is crawled from the Internet; the crawler is written in Microsoft .NET. I used the MySQL .NET Connector to connect to MySQL.

The page to display the é is written in PHP.

A.Alessio
syking
  • is your file really encoded in utf-8? try forcing your browser encoding to latin1/iso-8859-1 or change your headers to that. – Mat Jun 25 '11 at 12:31
  • Where does `é` come from? Database or embedded inside a string inside the PHP file? – Salman A Jun 25 '11 at 12:39
  • hi Salman A, the 'é' comes from database, I use phpmyadmin to check database, no problem, I can see 'é' correctly. After fetch to html page, got the problem. – syking Jun 25 '11 at 14:22

5 Answers5

10

You need to add much more information, but a garbled character like this is usually a sign of an ISO-8859-1 character in data that is being treated as UTF-8.

It comes either from

  • The source file claiming to be UTF-8, but actually being saved as ISO-8859-1/Windows-1252 (check your file encoding in your editor or IDE)

  • A database connection that uses ISO-8859-1 even though the database tables are UTF-8

Pekka
  • hi Pekka, thanks for your reply. I have set the default charset of mysql connection to utf8. And I saved my html to a local file. open it with notepad++, it tells me the document is 'encode in UTF-8 without BOM'. Does it mean my html page is encoded in UTF-8? – syking Jun 25 '11 at 14:25
  • @syking in that case, your database connection is probably not UTF-8. See [How can I store the '€' symbol in MySQL using PHP?](http://stackoverflow.com/questions/5969583/5969626#5969626) – Pekka Jun 25 '11 at 14:27
  • Oh, sorry, I forgot to mention that my database, tables and fields are all in utf-8 with utf8_general_ci collation. That's why I cannot figure it out by myself. – syking Jun 25 '11 at 14:38
  • @syking read the linked answer. The *connection* needs to be switched to UTF-8 too. – Pekka Jun 25 '11 at 14:39
2

The most likely explanation is that the page is not encoded using UTF-8, so when the browser tries to decode the text, it is doing so using the wrong encoding.

You need to make sure that the actual document encoding matches the claimed encoding.
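
One way to check whether bytes really are UTF-8 is to attempt a strict decode. The following is a rough Python sketch; the helper name `is_valid_utf8` is hypothetical, and in practice you would pass it the raw bytes of the saved page or of the HTTP response body.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        return True
    except UnicodeDecodeError:
        return False


# "é" saved as UTF-8 passes; the same character saved as ISO-8859-1 does not.
print(is_valid_utf8("é".encode("utf-8")))       # True
print(is_valid_utf8("é".encode("iso-8859-1")))  # False
```

Note that a clean decode only shows the bytes are well-formed UTF-8. Text that was wrongly converted earlier (for example in the database connection) can still be valid UTF-8 and display incorrectly.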

Quentin
1

I had the same problem, even with charset=UTF-8 set in the HTML. Just open the HTML page with Notepad and use File > Save As with the encoding set to UTF-8. Re-upload the page and it should be OK.

Stef
1

Make sure your file does not have a BOM (byte order mark) at its beginning. I had this problem recently: even though the file was saved as UTF-8 (checked several times), the BOM confused Firefox and it displayed umlauts wrongly (I had the HTML <meta> tags and the HTTP headers set to the correct encoding).
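
To check for a BOM without relying on an editor, you can look at the first three bytes of the file. This is a small Python sketch under the assumption that you have the file contents as raw bytes; the function name `strip_utf8_bom` is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "é".encode("utf-8")
print(with_bom[:3])              # b'\xef\xbb\xbf'
print(strip_utf8_bom(with_bom))  # b'\xc3\xa9'
```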

knittl
0

If you actually use htmlentities(), you can encode a specific string like this:

htmlentities($string, $flags, $charset);

This will encode the string. With $flags you can decide, to some extent, what gets encoded.

This function escapes text for HTML output, which helps against cross-site scripting; it does not protect against SQL injection.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.

Edit: the quote above is from the documentation for htmlspecialchars.

Adeel