
I have a database that stores some unusual characters input by visitors, e.g. é, á, í, and ú.

My HTML5 web page displays ? instead of the character when I use

<meta charset="utf-8">

but when I use

<meta charset="ISO-8859-1">

the characters are displayed correctly.

However, when I specify the latter charset the W3C validator spits out an error message:

Error: Bad value ISO-8859-1 for attribute charset on element meta: iso-8859-1

Is there a way to get the characters displaying correctly and get a W3C validation, or am I expecting too much?

Regards

Tog

The suggested "already answered" question does not apply because: (1) my PHP version is 5.4, not 5.5; (2) I do not understand the answer, which seems to be aimed at people who have a greater depth of knowledge than me.

Alastair McCormack
Tog Porter
  • Uhm... is the full error *"iso-8859-1 is not a preferred encoding name. The preferred label for this encoding is windows-1252."* by any chance...!? – deceze Feb 29 '16 at 13:41
  • How are you writing the content of the response? (Ie. the encoding you use there needs to match the encoding you've said you are using.) UTF-8 can encode any Unicode character, and those common accented characters are definitely in Unicode. – Richard Feb 29 '16 at 13:43
  • Possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – deceze Feb 29 '16 at 13:53
  • @Richard Where do I find the encoding of the content of response? I am sorry, I am getting old (64) and I do not understand the terminology. There is no other charset mentioned in the page. – Tog Porter Feb 29 '16 at 14:25
  • 1) The PHP version is irrelevant to the applicability of the other question. 2) Since the characters display correctly when you tag your page as ISO-8859, that means your data is ISO-8859 encoded. 3) If you need a gentle introduction, start here: [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/), [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze Feb 29 '16 at 14:27
  • @deceze Yes, that is the full error, but although it was lengthy and used terms I do not understand, I could not find anything in the previous answer you mentioned that made any sense to me. – Tog Porter Feb 29 '16 at 14:28
  • The message says *"Instead of using `<meta charset="iso-8859-1">`, you should be using `<meta charset="windows-1252">`"* (insert messed up historical reasons here for why one is regarded as preferred vs. the other). – deceze Feb 29 '16 at 14:35
  • The above would be a quick fix, but really you should get familiar with "encodings" and how to handle them correctly. See the above two articles for an introduction, which would then hopefully help you understand the duplicate question/answer better. – deceze Feb 29 '16 at 14:36

2 Answers


OK I think I have the answer now. Thanks to deceze for pointing me to: http://kunststube.net/frontback/

I first checked the database, and the fields are set to a collation of utf8_general_ci, which I presume is correct.

I now have:

<meta charset="utf-8">

at the top of the page, and the database connection is now:

$dbh = new PDO("mysql:host=$hostname;dbname=$dbname;charset=utf8", $username, $password);

Adding the charset in there appears to have fixed the problem and the characters now display correctly, whilst the page passes W3C validation.
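
A likely explanation for why the connection charset made the difference: without it, the connection probably delivered the text as ISO-8859-1 bytes, which a page declared as utf-8 cannot decode. The sketch below only illustrates that mismatch at the byte level. It assumes the mbstring extension is available, and the variable names are made up for the example; it does not touch the database.

```php
<?php
// Illustration only: "\xE9" is the single byte ISO-8859-1 uses for "é".
$latin1 = "\xE9";

// On its own, that byte is not a valid UTF-8 sequence, so a page
// declared as utf-8 cannot render it and shows a placeholder instead.
$validBefore = mb_check_encoding($latin1, 'UTF-8');

// Converting to UTF-8 gives the two bytes C3 A9. With charset=utf8 on
// the connection, the text should already arrive in this form.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$validAfter = mb_check_encoding($utf8, 'UTF-8');

// Show the raw bytes before and after the conversion.
echo bin2hex($latin1), ' -> ', bin2hex($utf8), "\n";
```

With the charset set on the connection, no conversion like this should be needed in the page code; the point is only that the bytes sent to the browser must match the charset the page declares.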

Many thanks for the help.

Tog

Tog Porter
  • Old age is a terrible thing. I had the same problem on a different site today and had forgotten the answer from 2 years ago. Just as well this came up in a Google search for the solution, and it was only after fixing it that I realised the original question was my own :-) – Tog Porter Mar 17 '18 at 14:28

You are looking for special characters often described as "HTML entities". Specifically you are looking for ISO-8859-1 HTML entities. You can display each of these characters like this without giving up your UTF-8 encoding.

To display "é" use &#233; or &eacute;

To display "á" use &#225; or &aacute;

To display "í" use &#237; or &iacute;

To display "ú" use &#250; or &uacute;

In each of these cases you write &...; in place of the letter, including the semicolon. So to write the word "éclair" you would use &eacute;clair.
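
If the text comes from PHP rather than being typed by hand, the built-in htmlentities() function can produce these entities. This is a minimal sketch, assuming the input string is already valid UTF-8; the variable names are only for illustration.

```php
<?php
// "\xC3\xA9clair" is the word "éclair" written out as UTF-8 bytes.
$word = "\xC3\xA9clair";

// Replace characters that have named HTML entities with those entities.
// The third argument tells PHP which encoding the input string uses.
$encoded = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n"; // &eacute;clair
```

Note that htmlentities() has to be told the real encoding of its input, so this approach still depends on knowing how the data coming from the database is encoded.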

For more symbols you can find a pretty complete reference here.

William Rosenbloom
  • Why would you want to do this when it's perfectly possible to write the characters as is in plaintext; when the only thing you need to pay a little attention to is to handle encodings correctly?! – deceze Feb 29 '16 at 13:52
  • @deceze because there are many advantages to UTF-8 and he's already using UTF-8 so I gave him a way he wouldn't have to change that. Also UTF-8 has a lot of network transmission advantages so I prefer to keep it and do things this way. I think it's pretty rude to downvote my answer just because you, personally, wouldn't do it this way. – William Rosenbloom Feb 29 '16 at 13:56
  • Well, apparently the OP is ***not*** using UTF-8 correctly, or the characters would show up correctly on their site. Isn't it a better idea to help them fix their encoding handling, rather than suggesting this alternative? Also, OP says the data is coming from the database... Are you suggesting they should retype all their data in the database to use HTML entities? – deceze Feb 29 '16 at 13:59
  • @William the problem has now been fixed by the insertion of a charset into the dbase connection string :-) – Tog Porter Feb 29 '16 at 15:05