39

I have the standard XAMPP installation on Win7 (x64). Having had my share of encoding troubles in a past project, where the MySQL encoding did not match the PHP encoding, which in turn sometimes output HTML in yet other encodings, I decided to consistently encode everything using UTF-8.

I'm just getting started with the HTML markup and am already experiencing trouble.

  • My page is saved using utf-8 (no BOM, I think)
    //update: It turns out this was NOT the case. The file was actually saved as ISO-8859-1. I later found this out thanks to Sherm Pendley's answer. I had to go back and change my project settings (which were set to "ISO-8859-1") to the desired "UTF-8".
  • php is set per .htaccess to serve .php-pages in utf-8 with: AddCharset UTF-8 .php
  • html has a meta tag specifying: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • To test, I also set the header from PHP: header('Content-Type:text/html; charset=UTF-8');

The page is evidently served as UTF-8 (Firefox and Chrome recognize it as such), but any special characters such as é, á or ¡ just show as �. The same happens when viewing the source code.

When dropping the encoding settings mentioned above all characters are rendered correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

How come? I'm very puzzled. I would have expected the exact opposite behavior.
Any advice is welcome, thanks!

edit: Hopefully this helps a bit more. This is the response header (as per firebug)

HTTP/1.1 200 OK
Date: Sat, 26 Mar 2011 20:49:44 GMT
Server: Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1
X-Powered-By: PHP/5.3.1
Content-Length: 91
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
leugim

7 Answers

27

When [dropping] the encoding settings mentioned above all characters [are rendered] correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

Then that's what you're really sending. None of the encoding settings in your bullet list will actually modify your output in any way; all they do is tell the browser what encoding to assume when interpreting what you send. That's why you're getting those �s - you're telling the browser that what you're sending is UTF-8, but it's really ISO-8859-1.
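To make the mismatch visible at the byte level, here is a minimal PHP sketch. It assumes the mbstring extension is loaded, and the variable names are made up for illustration. It shows that é is a single byte in ISO-8859-1 but two bytes in UTF-8, and that the lone ISO-8859-1 byte is not valid UTF-8, which is why a browser told to expect UTF-8 falls back to �.

```php
<?php
// "é" as an editor set to ISO-8859-1 would save it: a single byte 0xE9.
$latin1 = "\xE9";

// The same character as UTF-8: two bytes, 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9

// A charset declaration does not change these bytes; it only labels them.
// The lone 0xE9 byte is not a valid UTF-8 sequence, so a browser that was
// told "charset=utf-8" cannot decode it and shows a replacement character.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

The fix is therefore either to save or convert the data as UTF-8, or to declare the encoding the bytes are really in.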

Sherm Pendley
  • 5
    Exactly! You're telling the server to say "charset=utf-8" when that isn't actually the case. Servers and browsers are trusting creatures. The server will say whatever you tell it to, and the browser believes whatever the server says about the encoding - even if it's not true. – Sherm Pendley Mar 26 '11 at 21:26
  • 2
    As far as *where* the non-utf-8 data is coming from, there's no way to tell from what you've posted here. If a simple 'echo("föö");' displays as f��, that could indicate that your .php files are saved as iso-8859-1. If the data is being fetched from a database, you could use mb_detect_encoding() to verify its encoding. – Sherm Pendley Mar 26 '11 at 21:29
  • 4
    You were right and pointed me to the real culprit. Thanks! It seems the file was not saved as UTF-8, so I had to go back to the environment where it was created. There, hidden under "project settings", was an option set to "ISO-8859-1". Changing this option to "UTF-8" and saving the files again got me the desired results. No need to set `header();` or to modify Apache's config or *.htaccess*. A silly mistake, really. Both annoying and gratifying. Thank you Sherm for your answer and comments! – leugim Mar 26 '11 at 21:33
  • @ShermPendley : Can you please have a look at [this](http://stackoverflow.com/questions/23212974/html-special-characters-converted-to-question-mark-in-chrome-mozilla) where the OP was able to get the characters correctly in IE but neither in Chrome nor in Firefox. – Nidheesh Apr 23 '14 at 05:03
16

In my case, the database returned latin1 while my browser expected utf8.

So for MySQLi I did:

 mysqli_set_charset($dblink, "utf8");    

See http://php.net/manual/en/mysqli.set-charset.php for more info

Dennis
9

Tell PDO your charset when you create the connection, something like:

new PDO("mysql:host=$host;dbname=$DB_name;charset=utf8;", $username, $password);

Notice the: charset=utf8; part.
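As a slightly fuller sketch of the same idea, the DSN can be built in one place so the charset is never forgotten. The helper names `buildMysqlDsn` and `connectUtf8`, the host and the database name are all hypothetical, and the syntax assumes PHP 7 or later. The default of `utf8mb4` is my substitution, not part of the answer above: MySQL's `utf8` only covers characters of up to three bytes, so `utf8mb4` is the usual choice on servers that support it. Also note that PHP versions before 5.3.6 ignore the `charset` part of the DSN.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that carries the connection charset.
function buildMysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return "mysql:host={$host};dbname={$dbName};charset={$charset}";
}

// Hypothetical helper: opens the connection. Nothing connects until it is called.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildMysqlDsn($host, $dbName), $user, $password, [
        // Throw exceptions on errors instead of failing silently.
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo buildMysqlDsn('localhost', 'example_db'), "\n";
// mysql:host=localhost;dbname=example_db;charset=utf8mb4
```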

hope it helps!

ErickBest
7

Check that every one of your .php files that prints text is itself correctly encoded in UTF-8.
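One way to run that check is a small script like the sketch below. The function name `findNonUtf8Files` is hypothetical and the mbstring extension is assumed. Keep in mind that the check only proves the bytes are valid UTF-8: a pure ASCII file always passes, while ISO-8859-1 text containing accented characters will almost always fail.

```php
<?php
// Hypothetical helper: returns the paths that cannot be read or whose
// contents are not valid UTF-8.
function findNonUtf8Files(array $paths): array
{
    $bad = [];
    foreach ($paths as $path) {
        $contents = file_get_contents($path);
        if ($contents === false || !mb_check_encoding($contents, 'UTF-8')) {
            $bad[] = $path;
        }
    }
    return $bad;
}

// Example: check every .php file in the current directory.
foreach (findNonUtf8Files(glob('*.php') ?: []) as $path) {
    echo "Not valid UTF-8: {$path}\n";
}
```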

IProblemFactory
  • I'm not sure I understand correctly, but a simple `echo('é');` also outputs that damned question mark. I updated my question with the header information. – leugim Mar 26 '11 at 20:52
  • 1
    I mean the case where you have one base PHP file (UTF-8) and include some other PHP file in it that is not saved in the right encoding – IProblemFactory Mar 26 '11 at 20:57
  • Ah! OK, at the moment it is the simplest case possible: one PHP file with a simple echo and some basic HTML markup with text containing these characters. I'm not including anything yet, besides the mentioned encoding options, that is. – leugim Mar 26 '11 at 21:01
  • 1
    Sometimes you're fetching a .csv file, and the encoding of the file itself is wrong. Open the .csv or whatever text-based data file, choose the appropriate encoding (Textmate 2 has a preview mode on opening) and use "save as...", to make a copy choosing "UTF-8" as the encoding. Then problems should be gone on php 5.3 and above. – Christian Bonato Jun 09 '17 at 13:53
3

Looks like nobody mentioned

SET NAMES utf8;

I found this solution here and it helped me. How to apply it:

To be all UTF-8, issue the following statement just after you’ve made the connection to the database server: SET NAMES utf8;

Maybe this will help someone.

Vitalius
  • Please see as well: [Whether to use “SET NAMES”](http://stackoverflow.com/questions/1650591/whether-to-use-set-names) – hakre Aug 06 '12 at 11:18
  • Thanks for the information and sorry for late reply. I had always used mysql_set_charset() but there was one project I had to make adjustments on, and for some reason this function didn't work. I tried various other methods (even modified httpd.conf and php.ini on my local machine) but only `SET NAMES utf8` helped me. BTW, use of mysql_set_charset() is discouraged now: http://php.net/manual/en/function.mysql-set-charset.php – Vitalius Aug 09 '12 at 12:26
  • Yes, it depended on the MySQL and PHP versions, so this was never as straightforward as it is today. Luckily things did improve. – hakre Aug 09 '12 at 12:35
  • This helped me a lot. I was struggling for days on it. – Zahari Kitanov Dec 12 '17 at 21:20
3

I'm from Brazil and I create my databases using latin1_spanish_ci. For the HTML and everything else I use:

charset=ISO-8859-1

The data comes out right with é, ã and ç... Sometimes I have to write the text in the HTML using its entity code, such as:

Ol&aacute;

gives me

Olá
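If the text comes from PHP rather than being typed into the markup, the same entity encoding can be produced with `htmlentities()`. The sketch below is only an illustration with made-up variable names; the string is written with explicit UTF-8 bytes so the result does not depend on how the script file itself was saved.

```php
<?php
// "Olá" written with explicit UTF-8 bytes so the example does not depend
// on how this file itself was saved.
$greeting = "Ol\xC3\xA1";

// Encode non-ASCII characters as named entities; the third argument tells
// PHP which encoding the input string is in.
$encoded = htmlentities($greeting, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // Ol&aacute;

// And back again.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $greeting); // bool(true)
```

Entities sidestep the charset question for the characters they cover, but they do not fix a file or database that is in the wrong encoding.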

You can find the codes in this page: http://www.ascii.cl/htmlcodes.htm

Hope this helps. I remember it was REALLY annoying.

SPL_Splinter
  • Thanks for your answer! I would like to find a way not to encode the characters. I thought of using utf-8 because I read that it encompasses all possible characters... I could solve this issue by switching to ISO-8859 but would rather like to have some light shed on this. – leugim Mar 26 '11 at 20:58
  • Start over with the simplest configuration possible, but don't start escaping your characters with HTML entities. – tetris Mar 26 '11 at 21:20
2

The problem is the charset that Apache uses to serve the pages. I work with Linux, so I don't know anything about XAMPP. I had the same problem, and what I did to solve it was to enable the charset in the charset config file (it is commented out by default).

In my case the file is /etc/apache2/conf.d/charset but, since you're using Windows, the location will be different. So take this as an idea of how to solve it.

In the end, my charset config file looks like this:

# Read the documentation before enabling AddDefaultCharset.
# In general, it is only a good idea if you know that all your files
# have this encoding. It will override any encoding given in the files
# in meta http-equiv or xml encoding tags.

AddDefaultCharset UTF-8

I hope it helps.

emco
  • He already did that - it's the second item in the above bullet list. And it wouldn't help anyway; if what you're sending is ISO-8859-1, then specifying "charset=utf-8" in the HTTP headers only serves to confuse the browser, causing it to display those �s. – Sherm Pendley Mar 26 '11 at 21:35