5

I want to output the following string in PHP:

ä ö ü ß €

Therefore, I've encoded it to utf8 manually:

ä ö ü ß €

So my script is:

<?php
header('content-type: text/html; charset=utf-8');
echo 'ä ö ü ß €';
?>

The first 4 characters are correct (ä ö ü ß) but unfortunately the € sign isn't correct:

ä ö ü ß

Here you can see it.

Can you tell me what I've done wrong? My editor (Notepad++) has settings for Encoding (Ansi/UTF-8) and Format (Windows/Unix). Do I have to change them?

I hope you can help me. Thanks in advance!

caw
  • 3
    You should use an editor that supports UTF-8. What you did is just use ISO 8859-1 to write the code words of UTF-8. Using UTF-8 you could write `ä ö ü ß €` directly. – Gumbo Sep 07 '09 at 10:27
  • 1
    Ah, sorry, it’s Windows-1252 instead of ISO 8859-1 – Gumbo Sep 07 '09 at 10:29
  • Careful, though, that using UTF-8 might insert U+FEFF at the beginning of the file. And PHP doesn't like that at all. – Joey Sep 07 '09 at 10:35
  • @Johannes -- I've never had problems with this. What problems with UTF-8 encoded files did you have? – warpech Sep 07 '09 at 10:59
  • @warpech / @Johannes Rössel: That is the Byte Order Mark (BOM). Here is more about it: http://www.decodeunicode.org/de/U+FEFF In Notepad++ you can choose "UTF-8 without BOM" as the encoding and you won't have problems with it. – caw Sep 07 '09 at 13:36

7 Answers

8

That last character just isn't in the file (try viewing the source), which is why you don't see it.

I think you might be better off saving the PHP file as UTF-8 (in Notepad++ that option is available under Format -> Encode in UTF-8 without BOM) and typing the actual characters into your PHP file, rather than hacking around with inserting à everywhere. You may find the Windows Character Map useful for inserting Unicode characters.

Dominic Rodger
5

The Euro sign (U+20AC) is encoded in UTF-8 with three bytes, not two. This can be seen here. So your encoding is simply wrong.
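
The byte counts can be checked directly in PHP. This is only a sketch: it assumes PHP 7 or later (for the `\u{...}` escape syntax), and the variable names are made up for illustration.

```php
<?php
// "\u{...}" (PHP 7+) always produces the UTF-8 bytes of the code point,
// no matter which encoding this source file is saved in.
$euro = "\u{20AC}"; // euro sign
$aUml = "\u{00E4}"; // a with diaeresis

// strlen() counts bytes, not characters.
$euroLength = strlen($euro);
$aUmlLength = strlen($aUml);

// bin2hex() reveals the exact byte sequence.
$euroHex = bin2hex($euro);
$aUmlHex = bin2hex($aUml);

echo "U+20AC: {$euroLength} bytes ({$euroHex})\n";
echo "U+00E4: {$aUmlLength} bytes ({$aUmlHex})\n";
```

The euro sign comes out as the three bytes `e2 82 ac`, while ä needs only the two bytes `c3 a4`.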

Joey
  • Thanks, that seems to be the cause. :) – caw Sep 07 '09 at 13:37
  • 1
    It's not uncommon for whatever handles text to drop invalid byte sequences from the input. So when you advertise something as UTF-8 and include invalid UTF-8 then don't expect it to be there. – Joey Sep 07 '09 at 14:30
4

If you want to output it properly to utf8, your script should be:

<?php
header('content-type: text/html; charset=utf-8');
echo "\xc3\xa4"."\xc3\xb6"."\xc3\xbc"."\xc3\x9f"."\xe2\x82\xac";
?>

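That way, even if your PHP script is saved in a non-UTF-8 encoding, it will still work, because the escape sequences are plain ASCII in the source and PHP expands them to the exact UTF-8 bytes.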

velcrow
  • Thanks. What does the echo line do exactly? – caw Aug 18 '11 at 17:30
  • It echoes each of the following characters encoded in UTF-8: ä ö ü ß €. In your original question you said "I've encoded it to utf8 manually". To truly do that, go to http://www.utf8-chartable.de/ and search for ä; you'll see that in UTF-8 it is "\xc3\xa4". – velcrow Aug 25 '11 at 22:38
2

You should always set your editor to the same encoding that the generated HTML instructs the browser to use. If the HTML page is intended to be interpreted as UTF-8, then set your text editor to UTF-8. PHP is completely unaware of the encoding settings of the editor used to create the file; it treats strings as a stream of bytes.

In other words, as long as the right bytes are in the file, everything will work. The easiest way to ensure the right bytes are in the file is to set your editor's encoding to the same one the web page is supposed to be in. Anything else just makes life more difficult than it needs to be.

But the best defence is to leave non-ASCII characters out of the code completely. You can pull them out of a database or localisation file instead. This means the code can be modified in essentially any editor without worrying about damaging the encoding.
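
As a rough illustration of keeping the source file pure ASCII, the text can come from escaped data instead of literal characters. The sketch below assumes PHP 7 or later for `\u{...}`; the JSON literal stands in for a hypothetical localisation file, and all names are invented for the example.

```php
<?php
// Hypothetical localisation data; in practice this might be read from a
// file or a database. JSON \uXXXX escapes keep it ASCII-only too.
$json = '{"umlauts": "\u00e4 \u00f6 \u00fc \u00df \u20ac"}';
$messages = json_decode($json, true);

// The same text written with PHP 7+ Unicode escapes.
$inline = "\u{00E4} \u{00F6} \u{00FC} \u{00DF} \u{20AC}";

// Both routes yield identical UTF-8 byte strings.
$same = ($messages['umlauts'] === $inline);
```

Because neither line contains a non-ASCII byte, the file survives being opened and saved in an editor set to ANSI or any other ASCII-compatible encoding.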

Artelius
0

header('Content-Type: text/html; charset=UTF-8');

This just informs the browsers what kind of content you're going to send it and how it should treat it. It does not set the encoding of the actual content you're sending. It's completely up to you to fulfil your own promise. Your content is not going to magically transform from whatever to UTF-8 just because you set that header. If you tell the browser to treat the content as UTF-8, but you're sending it Latin-1 encoded data, of course it will break.
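
If the data really is in another encoding, it has to be converted explicitly before it is sent. A minimal sketch, assuming the mbstring extension is available and that the input is Windows-1252 (an assumption made for illustration); the helper name `toUtf8` is hypothetical.

```php
<?php
// Hypothetical helper: convert a Windows-1252 byte string to UTF-8.
function toUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// In Windows-1252, 0xE4 is a-umlaut and 0x80 is the euro sign.
$legacy = "\xE4 \x80";
$utf8   = toUtf8($legacy);

// After conversion the bytes match what a UTF-8 header promises.
$utf8Hex = bin2hex($utf8);
$isValid = mb_check_encoding($utf8, 'UTF-8');
```

Only after a conversion like this does the `charset=UTF-8` header describe the bytes that are actually sent.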

I refer you to What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

vimal1083
0

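This worked for me:

    // Convert only when the value is NOT already valid UTF-8;
    // re-encoding valid UTF-8 would double-encode it.
    // utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2.
    if (!mb_check_encoding($value, 'UTF-8')) {
      return utf8_encode($value);
    }
    else {
      return $value;
    }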

Source : https://github.com/jdorn/php-reports/issues/100

Djomla
0

Try this; it works for me. This code will change ã¶ to ö, provided the file itself is saved as UTF-8:

<?php

header('Content-Type: text/html; charset=UTF-8');
echo $category = 'Computer & Zubehör';
exit;

?>

Result: Computer & Zubehör

Solomon Suraj