First off, you need to understand that a character with a diacritic like ó or î (from your example) is not automatically a "utf-8 character". It is simply a character that has different encodings (if any) in different character sets, even in those character sets that have the basic single-byte ASCII part in common (i.e., the English alphabet, the digits, the most common punctuation, and a few more). You could call it a "problematic character", but not a "utf-8 character".
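To make this concrete, here is a minimal Python sketch (not part of the original question's PHP code) that prints the bytes a character gets in a few character sets. The charset list and the helper name `encodings_of` are illustrative choices only.

```python
# Show how the same character is represented as bytes in several character sets.
CHARSETS = ["iso-8859-1", "cp1252", "utf-8", "ascii"]  # illustrative selection


def encodings_of(char):
    """Map each charset name to the hex bytes of `char`, or None if it has no encoding there."""
    result = {}
    for name in CHARSETS:
        try:
            result[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None  # the character does not exist in this charset
    return result


# "o" is plain ASCII; U+00F3 is ó and U+00EE is î.
for ch in ("o", "\u00f3", "\u00ee"):
    print(f"U+{ord(ch):04X}", encodings_of(ch))
```

The plain "o" has the same single byte everywhere, while ó is one byte in the ISO 8859-1 family, two bytes in UTF-8, and has no encoding at all in ASCII.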
So, when you wrote your footer <div>, you did NOT write it UTF-8 encoded. Your editor saved those characters in a single-byte encoding, like ISO 8859-1 or one of its relatives.
When no encoding is specified, browsers normally try to detect the one used in the page. This is why you were initially able to see in the browser exactly what you had written in your editor.
Then you tried to log in with a "problematic character" in the username. The browser had interpreted your page as having a single-byte encoding, so it encoded your form input the same way and sent it back to the server single-byte encoded. The PHP code had apparently not been written with this possibility in mind, because it did not correctly set the third parameter of htmlspecialchars(), which is "UTF-8" by default (starting from PHP 5.4.0; it was "ISO-8859-1" before). Since a single-byte encoded string with "problematic characters" is almost never a valid UTF-8 string (see my comment to your question, it's the second comment), htmlspecialchars() rejected it (for invalid input it returns an empty string, unless ENT_IGNORE or ENT_SUBSTITUTE is set).
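The round trip described above can be sketched in Python. This is a simplified model, not the browser's or PHP's actual code: `submit_form_value` only imitates percent-encoding of a form field in the page's charset, `is_valid_utf8` stands in for the validity check that htmlspecialchars() performs, and the username is made up.

```python
from urllib.parse import quote, unquote_to_bytes


def submit_form_value(value, page_charset):
    """Percent-encode a form value roughly as a browser would for a page in `page_charset`."""
    return quote(value, safe="", encoding=page_charset)


def is_valid_utf8(raw):
    """Return True if the bytes form a well-formed UTF-8 string."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


username = "ram\u00f3n"  # hypothetical username containing ó
for charset in ("iso-8859-1", "utf-8"):
    wire = submit_form_value(username, charset)  # what travels in the request
    raw = unquote_to_bytes(wire)                 # the bytes the server receives
    print(charset, wire, "valid UTF-8:", is_valid_utf8(raw))
```

With the page treated as ISO 8859-1, the ó travels as the single byte 0xF3. In UTF-8 that byte announces a multi-byte sequence, but the next byte is a plain "n", so the string is not well formed.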
Then you correctly added the header('Content-Type: text/html; charset=utf-8'); call, which disabled the automatic charset detection by the browser. At this point it became evident that your file with the footer <div> was not UTF-8 encoded (see again my comment for the explanation of the question marks that appear instead of the "problematic characters").
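As a rough illustration of where such question marks can come from, the Python sketch below decodes ISO 8859-1 bytes as if they were UTF-8, substituting U+FFFD (the replacement character, which browsers usually draw as a question mark) for each invalid sequence. The footer text and the helper `as_seen_by_browser` are made up, and real browsers may differ in detail.

```python
def as_seen_by_browser(file_bytes, declared_charset):
    """Decode bytes the way a lenient decoder would, replacing invalid sequences with U+FFFD."""
    return file_bytes.decode(declared_charset, errors="replace")


# Hypothetical footer text, saved by the editor as ISO 8859-1.
footer_on_disk = "Informaci\u00f3n".encode("iso-8859-1")

# Declared as UTF-8 by the HTTP header: the lone 0xF3 byte is invalid and gets replaced.
print(ascii(as_seen_by_browser(footer_on_disk, "utf-8")))

# Decoded with the charset the file really uses: the ó survives.
print(ascii(as_seen_by_browser(footer_on_disk, "iso-8859-1")))
```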
So all you are left to do is convince your editor to save files UTF-8 encoded. As others have noted, saving the file in a different encoding does not work in all editors. Starting from a fresh file is sometimes the solution, maybe after having set the default encoding of your editor to UTF-8.
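If the editor refuses to cooperate, the file can also be converted outside of it. The following Python sketch assumes the file was saved as ISO 8859-1 (an assumption; a relative such as Windows-1252 would need a different `source_charset`), and the function names are hypothetical. Keep a backup before overwriting anything.

```python
from pathlib import Path


def reencode_to_utf8(raw, source_charset="iso-8859-1"):
    """Return `raw` re-encoded as UTF-8, leaving already-valid UTF-8 untouched."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (this includes pure ASCII)
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in `source_charset`.
        return raw.decode(source_charset).encode("utf-8")


def convert_file(path, source_charset="iso-8859-1"):
    """Rewrite the file at `path` in place as UTF-8."""
    target = Path(path)
    target.write_bytes(reencode_to_utf8(target.read_bytes(), source_charset))
```

Checking for valid UTF-8 first makes the conversion safe to run twice: a file that is already UTF-8 is not re-encoded a second time.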
To check the encoding, you can use the file command in a shell. Its output should be something like
main.php: PHP script, UTF-8 Unicode text
Alternatively, you could use the od -tx1z command (maybe piped into less), which dumps your file as a sequence of hex bytes with the corresponding text on the side. If the file is single-byte encoded, your "problematic characters" will be single bytes >= 0x80. If it is UTF-8 encoded, they will be sequences of 2 bytes (other characters take 3 or 4 bytes), all >= 0x80, while the "non-problematic characters" will continue to be single bytes < 0x80.
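The same byte-level check can be sketched in Python. `classify` and `high_byte_runs` are hypothetical helpers, and `classify` only distinguishes the three cases discussed here; it does not identify which single-byte charset was used.

```python
def classify(raw):
    """Roughly classify bytes as 'ascii', 'utf-8', or 'single-byte (or other)'."""
    if all(b < 0x80 for b in raw):
        return "ascii"  # only "non-problematic characters"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte (or other)"


def high_byte_runs(raw):
    """Return each run of consecutive bytes >= 0x80 as a hex string."""
    runs, current = [], bytearray()
    for b in raw:
        if b >= 0x80:
            current.append(b)
        elif current:
            runs.append(current.hex())
            current = bytearray()
    if current:
        runs.append(current.hex())
    return runs


text = "a\u00f3b\u20acc"  # a, ó, b, €, c
print(classify(text.encode("utf-8")), high_byte_runs(text.encode("utf-8")))

latin = "a\u00f3b".encode("iso-8859-1")
print(classify(latin), high_byte_runs(latin))
```

Note that adjacent non-ASCII characters end up in a single run, so a run longer than 4 bytes in a UTF-8 file simply means several such characters in a row.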
The article you mention seems to be well written; just follow it.
You don't need the AddDefaultCharset directive in the .htaccess file, though, if all your pages are generated with the Content-Type: text/html; charset=utf-8 HTTP header, because the effect of the Apache directive is exactly the same (and it is good to keep control of the encoding inside PHP).
Adding the <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> tag has the same effect, for the browser, as the above HTTP header (note the http-equiv). The HTTP header is cleaner, but this additional meta tag may help in case a page is saved without the header's information.
Most importantly, don't be afraid of UTF-8, because it is your friend!
(...but, from the answer that got your bounty, I see that you, like many people, continue to think that understanding character encodings is too difficult for you ☹)