
I'm not able to display the character ě correctly on my web pages. I'm using the UTF-8 charset for this page; do I have to use ISO-8859-2? I'm getting a string with this character from a database, where it's saved as `&#283;`. My browser shows only the HTML entity code.

It's the only character (so far) that I can't show on my webpage. I took a look at http://www.czech.cz and they use UTF-8.

Any suggestions?

Take care! Andrea

Tomas
Andrea Girardi

2 Answers


Are you seeing the `&#283;` in the browser, or when you view source? If you're seeing it in the browser, then it's probably being double-encoded somewhere: whatever outputs it to the page is probably detecting it as unencoded HTML and is trying to protect you from some kind of HTML injection. You'll want to make it not do that.

But you have an even deeper problem. If your page is served up in UTF-8, and your data is in UTF-8, there isn't any reason to turn it into an HTML entity in the first place. You should be passing through the UTF-8 data. You do not need to switch to a different character encoding.
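A minimal sketch of that pass-through approach, assuming PHP 7 or later, a source file saved as UTF-8, and a database string that is already valid UTF-8. The helper name `escape_for_html` and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: escape only the characters that are special in HTML
// (& < > " ') and leave every other UTF-8 character, such as ě, untouched.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample value standing in for a UTF-8 string read from the database.
$fromDb = 'ě & <b>';

// The page itself must also be declared as UTF-8, for example by sending
// header('Content-Type: text/html; charset=utf-8') before any output.
echo '<p>' . escape_for_html($fromDb) . '</p>', PHP_EOL;
// prints: <p>ě &amp; &lt;b&gt;</p>
```

The markup characters are neutralised, while ě reaches the browser as plain UTF-8 with no entity involved.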

rmeador
  • It's my browser that isn't able to translate the `&#283;` code. There is another problem too: I have an admin page that uploads the string to the DB, and before the update I call `$text = htmlentities($text, ENT_QUOTES);`. For all other languages everything is correct, but not for this char. – Andrea Girardi Apr 26 '10 at 15:38
  • Use `htmlspecialchars` **not** `htmlentities`. `htmlentities` tries to encode all non-ASCII characters, which is needless and will corrupt them if you don't tell it the right character set. It defaults to nasty old ISO-8859-1. – bobince Apr 26 '10 at 15:49
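To illustrate the difference between the two functions, here is a small sketch (PHP 7+, source file saved as UTF-8). It passes ISO-8859-1 explicitly to reproduce the old default mentioned in the comment; the variable names are illustrative only.

```php
<?php
$text = 'ě'; // two bytes in UTF-8: 0xC4 0x9B

// htmlentities() with the wrong charset treats each byte as a separate
// ISO-8859-1 character, so 0xC4 is misread as "Ä" and the character is corrupted.
$corrupted = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');
var_dump(strpos($corrupted, '&Auml;') === 0); // bool(true)

// htmlspecialchars() only touches & < > " ', so the UTF-8 bytes survive.
$intact = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
var_dump($intact === $text); // bool(true)
```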

First of all, yes, you really should be using UTF-8. But that doesn't mean the data you have is already UTF-8 encoded.

Secondly, it sounds like that character is HTML encoded in the database already. This is a problem, because it seems that whatever page is displaying this character also tries to HTML-encode the content as well. Here's an example of what I'm talking about.

Data from user: ě
Data HTML encoded (via htmlentities()) prior to going into DB: `&#283;`
Data stored in DB: `&#283;`
Data retrieved from DB: `&#283;`
Data HTML encoded before being printed to the page: `&amp;#283;`
Data as seen in the browser: `&#283;`

Do you see that? The character becomes double encoded, so that on the 2nd encoding step the ampersand character is converted into an entity itself.
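The second encoding step can be reproduced in a couple of lines. This sketch assumes the database value is the numeric entity for ě (code point 283) and uses `htmlspecialchars`; `htmlentities` treats the ampersand the same way.

```php
<?php
// Assumed database content: the character is already stored as an entity.
$fromDb = '&#283;';

// Encoding again at output time escapes the ampersand of the existing entity.
$printed = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');

echo $printed, PHP_EOL; // prints: &amp;#283;
// A browser decodes &amp; back to "&", so the visitor sees the text "&#283;"
// instead of the character ě.
```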

This is the problem with HTML-encoding data before storing it in the database. That should only be done prior to displaying the content, not prior to storage.
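One way to arrange this, sketched under the assumption of PHP 7+ with a plain array standing in for the database table. The function names `save_text`, `render_text` and `repair_legacy_row` are hypothetical, and the repair step assumes existing rows were entity-encoded exactly once.

```php
<?php
// Stand-in for a database table; a real application would use a UTF-8
// connection and parameterised queries instead.
$table = [];

// Store the text exactly as the user typed it: no HTML encoding here.
function save_text(array &$table, string $raw): void
{
    $table[] = $raw;
}

// HTML-encode only at the moment the text is written into a page.
function render_text(string $stored): string
{
    return htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
}

// One-off clean-up for rows saved by the old code path: turn entities such
// as &#283; back into real UTF-8 characters.
function repair_legacy_row(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
}

save_text($table, 'ě');
echo render_text($table[0]), PHP_EOL;                   // prints: ě
echo render_text(repair_legacy_row('&#283;')), PHP_EOL; // prints: ě
```

Keeping the stored value raw means the same row can later be reused for non-HTML output (plain-text e-mail, JSON, exports) without first undoing an HTML encoding.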

Peter Bailey
  • You are the man! That's exactly the problem. So I have to remove the htmlentities() call before the data goes into the DB, right? – Andrea Girardi Apr 26 '10 at 15:44
  • But it's not clear to me why in the DB I have "Vrchní omítky a vyrovnávac&ia." and it's shown correctly in my browser... – Andrea Girardi Apr 26 '10 at 15:46
  • You should use `htmlspecialchars` when outputting text into the HTML page, and `htmlentities` never. Don't HTML-escape content going into the database. – bobince Apr 26 '10 at 15:50
  • OK, I've found the problem. The char is stored in the DB as an HTML entity. How can I prevent this? – Andrea Girardi Apr 26 '10 at 15:53
  • I've removed the htmlentities call and changed the charset to ISO-8859-1, and it works fine. – Andrea Girardi Apr 26 '10 at 16:03
  • It works only because your content is still HTML-encoded in the database. If you ever repair that data (or remove the step that encodes it) then ISO-8859-1 won't be sufficient. – Peter Bailey Apr 26 '10 at 16:05
  • I've suffered from this issue myself. I'm not sure yet who's to blame for it, but it happens when you need to store a character that is not allowed in the database character set. The only reasonable fix is to change the DB charset to one with a wider character set, such as UTF-8, or to reject the input data when it contains invalid chars. – Álvaro González Apr 26 '10 at 16:35
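As a rough illustration of the "reject the input" option, the sketch below assumes the database column is strict ISO-8859-1, whose characters are exactly the code points U+0000 to U+00FF. The function name `fits_iso_8859_1` is made up; a real schema may use a different charset, and switching the column to UTF-8 remains the more robust fix.

```php
<?php
// Returns true when every character of a UTF-8 string has a code point of
// U+00FF or below, i.e. can be stored in an ISO-8859-1 column without loss.
function fits_iso_8859_1(string $utf8): bool
{
    // The /u modifier makes PCRE read the subject as UTF-8; preg_match()
    // returns false for invalid UTF-8, which is also treated as "does not fit".
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fits_iso_8859_1('é')); // bool(true):  U+00E9 is inside the range
var_dump(fits_iso_8859_1('ě')); // bool(false): U+011B is outside it
```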