0

I have a problem with UTF-8 strings in PHP on my Debian server.

Update in details

I've done a little more testing and the situation is now more specific. I updated the title and details to fit the situation better. Thanks for the responses, and sorry that the problem wasn't described clearly. The following script works fine on my local Windows machine but not on my Debian server:

<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head></head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;

As expected on my local machine the output is:

UTF-8: ÄÖÜ
UTF-8: ÄÖÜ

On my server the output is:

UTF-8: ÄÖÜ
UTF-8: ÄÖÜ

I wrote the script in Notepad++ as UTF-8 without BOM and transferred it over SSH. As guido noticed, the string itself is properly UTF-8 encoded. There seems to be a problem with PHP DOM or maybe libxml, and the cause must be some setting, since it is machine-dependent.

Original question

I work locally with XAMPP on Windows and everything is fine. But when I deploy my project on the server UTF-8 strings get all messed up. In fact when I upload this test script

echo utf8_encode('UTF-8 test: ÄÖÜ');

I get "ÃÃÃ". Also, when I connect to the server with PuTTY I cannot type umlauts (ÄÖÜ) correctly in the shell. I have no idea if this issue is even PHP-related.

Alex Lawrence

6 Answers

2

Check your Apache's AddDefaultCharset setting.

On standard Debian Apache installations, the setting can be modified in /etc/apache2/conf.d/charset.

Linus Kleen
1

Please verify that your file is byte-for-byte the same as on your local machine. An FTP transfer in text mode could have messed it up. You may want to try binary mode.

BarsMonster
  • The transfer is by SVN (commit and server side export). I think it should be byte-to-byte the same. Or could SVN be messing anything up? – Alex Lawrence Jan 18 '11 at 13:13
  • Less likely, but still always worth checking. – BarsMonster Jan 18 '11 at 15:04
  • As said to guido I am calling a bash script from putty to deploy the project from SVN to the web directory. Could my locale or connection settings (which may be messed up) be affecting the SVN export? – Alex Lawrence Jan 18 '11 at 21:18
1

EDIT: answer for updated question:

<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head>'
.'<meta http-equiv="content-type" content="text/html; charset=utf-8">'
.'</head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;
?>
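Some background on why the meta tag matters: DOMDocument::loadHTML() assumes ISO-8859-1 when the markup does not declare a charset, so undeclared UTF-8 bytes tend to get re-encoded. Below is a minimal sketch of a round-trip check, assuming a reasonably recent PHP and libxml. The helper name `parseWithCharset` is made up for this example, and the byte escapes are used so the result does not depend on how the source file was saved.

```php
<?php
// Hypothetical helper (not from the original answer): wraps a body fragment
// in a document that declares UTF-8, parses it, and returns the body text.
function parseWithCharset(string $bodyHtml): string
{
    $html = '<html><head>'
          . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
          . '</head><body>' . $bodyHtml . '</body></html>';

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOM returns text as UTF-8.
    return $document->getElementsByTagName('body')->item(0)->textContent;
}

// "UTF-8: ÄÖÜ" written as explicit UTF-8 bytes.
$input = "UTF-8: \xC3\x84\xC3\x96\xC3\x9C";
$text  = parseWithCharset($input);

echo ($text === $input) ? "round trip intact\n" : "bytes changed\n";
```

If this reports changed bytes even with the declaration in place, the problem is probably below PHP (for example in the libxml build) rather than in the script.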

I suspect your input string is already UTF-8, in which case utf8_encode() would encode it a second time. Try:

setlocale(LC_CTYPE, 'de_DE.UTF-8');
$s = "UTF-8 test: ÄÖÜ";
if (mb_detect_encoding($s, "UTF-8") == "UTF-8") {
    echo "No need to encode";
} else {
    $s = utf8_encode($s);
    echo "Encoded string $s";
}
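
To see what double encoding looks like at the byte level, here is a small sketch. It assumes the mbstring extension is loaded and uses mb_convert_encoding() from ISO-8859-1 as a stand-in for utf8_encode(), which treats its input as ISO-8859-1 and is deprecated as of PHP 8.2. The variable names are only for illustration.

```php
<?php
// "Ä" as explicit UTF-8 bytes, independent of the file's own encoding.
$utf8 = "\xC3\x84";

// Treating those two bytes as ISO-8859-1 and converting to UTF-8 again
// turns each byte into its own two-byte sequence (double encoding).
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // original bytes
echo bin2hex($double), "\n";  // bytes after the second encoding

// A strict validity check can be used before deciding whether to convert.
var_dump(mb_check_encoding($utf8, 'UTF-8'));
```

Two bytes become four, which a browser then renders as the familiar "Ã" sequences.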
guido
  • It says "no need to encode" but after setting `setlocale(LC_CTYPE, 'de_DE.UTF-8');` the test script output is correct. Is this a server issue? How can I set my locale in general to UTF-8? – Alex Lawrence Jan 18 '11 at 13:19
  • @Alex Lawrence if you edited the script in a Debian shell, I'm pretty sure you entered UTF-8 characters (verify this by logging in with PuTTY and typing *echo $LANG*), as shells on Linux have defaulted to UTF-8 for ages; I don't know about Windows, I guess it's still using CP-1252 – guido Jan 18 '11 at 13:26
  • Echo of $LANG is `de_DE.UTF-8`. I don't know how to use vim, so I used nano for the test script. If I paste your code the characters aren't displayed correctly in nano, but the output in the browser is correct. I do not seem to get what the problem is. I just do an SVN export of my project using the terminal and UTF-8 strings in PHP get all messed up. :( – Alex Lawrence Jan 18 '11 at 21:11
  • @Alex Lawrence what if you edit my script above in your favorite editor in your local windows system and run it? What editor do you use? Check if it has something like "file encoding" among the settings. What if you copy your source file to the server using winscp (or similar)? – guido Jan 19 '11 at 00:06
0

Try changing the default charset on the server in your php.ini file:

default_charset = "UTF-8"

Also, make sure you are sending out the proper content type headers as UTF-8.

In my experience with UTF-8, if you properly configure the PHP mbstring module, use the mbstring functions, and make sure your database connection is using UTF-8, then you won't have any problems.

For MySQL, the database part can be done with the query "SET NAMES 'utf8'".

I usually start an output buffer and let mbstring handle it, then send the buffer when I have finished rendering the content. This is what I use on production websites and it is a very solid approach.

Let me know if you would like the sample code for that.
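
A minimal sketch of what such a buffer could look like, assuming the mbstring extension is loaded. This is one guess at the approach described, not the answerer's actual production code.

```php
<?php
// Assumption: mbstring is available; both encodings are set to UTF-8 here.
mb_internal_encoding('UTF-8');  // encoding the script works in
mb_http_output('UTF-8');        // encoding the response should be sent in

// mb_output_handler converts buffered output from the internal encoding
// to the HTTP output encoding when the buffer is flushed.
ob_start('mb_output_handler');

// Render the page as usual; bytes are written explicitly for the example.
echo "UTF-8: \xC3\x84\xC3\x96\xC3\x9C\n";

// Send the buffered content once rendering is finished.
ob_end_flush();
```

With both encodings set to UTF-8 the handler has nothing to convert, so its main benefit is keeping the declared charset and the actual bytes consistent in one place.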

Another easy trick to see whether the wrong headers are being sent by PHP or the web server is to open the View > Encoding menu in your browser and check whether it says UTF-8. If it doesn't, and everything looks OK after you switch it to UTF-8, then the problem is with your headers or content type. If it is already UTF-8 and the text is still garbled, then something is going wrong in your code or database connection. If you are using MySQL, make sure the tables and columns involved are also UTF-8.

Tom Gruner
0

Are you explicitly sending a content-type header? If you omit it, it's likely that Apache is sending one for you. If the file is served with a Latin-1 encoding (by Apache) and the browser reads it as such, then your UTF-8 characters will be malformed.

Try this:

<?php
echo "Drop some UTF-8 characters here.";

Then this:

<?php
header("Content-Type: text/html; charset=UTF-8");
echo "Drop some UTF-8 characters here.";

The second should work, if the first doesn't. You may also want to save the file as a UTF-8-encoded file, if it's not already.

If your database characters are messed up, try setting the (My)SQL connection encoding.

RickN
0

The cause of the problem was an old version of libxml (2.6.32) on the server. On the development machine it was 2.7.3. I upgraded libxml to an unstable package, resulting in version 2.7.8. The problems are now gone.
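
For anyone comparing machines, the libxml version PHP was built against can be read from PHP itself. A short sketch follows; the helper name `libxmlAtLeast` is made up, and the 2.7.0 threshold is an assumption based only on the two versions mentioned above (2.6.32 failing, 2.7.3 working), not a documented cutoff.

```php
<?php
// Hypothetical helper: true if PHP's libxml is at least the given version.
function libxmlAtLeast(string $minimum): bool
{
    // LIBXML_DOTTED_VERSION is a string such as "2.7.8".
    return version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');
}

echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";

// Assumed threshold, chosen between the failing and working versions.
if (!libxmlAtLeast('2.7.0')) {
    echo "libxml is older than 2.7.0; DOMDocument may mishandle UTF-8 input\n";
}
```

Running this on both the development machine and the server shows quickly whether the two are using different libxml builds.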

Alex Lawrence
  • Sorry for the wrong scope of the question and for answering it myself in the end. Thanks especially to guido for the hints. Unfortunately this problem was related to libxml. – Alex Lawrence Jan 24 '11 at 20:08