3

I want to convert HTML entities to UTF-8, but mb_convert_encoding() destroys characters that are already UTF-8 encoded. What's the correct way?

$text = "äöü &auml; &ouml; &uuml; &szlig;";
var_dump(mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES'));
// string(24) "Ã¤Ã¶Ã¼ ä ö ü ß"

3 Answers

6

mb_convert_encoding() isn't the right function for this. Use html_entity_decode() instead: it converts only the actual HTML entities to UTF-8 and leaves the UTF-8 characters already in the string untouched.

$text = "äöü &auml; &ouml; &uuml; &szlig;";
var_dump(html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8'));

which gives

string(18) "äöü ä ö ü ß"
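As a rough check of that behaviour, the sketch below builds the same input from escape sequences, so the result does not depend on how the script file is saved. The "\u{...}" syntax assumes PHP 7 or later, and the variable names are only illustrative.

```php
<?php
// Three characters that are already UTF-8, followed by four HTML entities.
// "\u{00E4}" etc. expand to UTF-8 bytes regardless of the file's own encoding.
$text = "\u{00E4}\u{00F6}\u{00FC} &auml; &ouml; &uuml; &szlig;";

// Decode only the entities; the existing UTF-8 bytes pass through unchanged.
$decoded = html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// What a fully UTF-8 result should look like: "äöü ä ö ü ß".
$expected = "\u{00E4}\u{00F6}\u{00FC} \u{00E4} \u{00F6} \u{00FC} \u{00DF}";

var_dump(strlen($decoded));       // int(18): 7 two-byte characters + 4 spaces
var_dump($decoded === $expected); // bool(true)
```

If the length comes out larger than 18, the input was most likely re-encoded somewhere before it reached html_entity_decode().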

Mark Baker
0

On my localhost I get string(18) "äöü ä ö ü ß".

I think it's related to the encoding your file is saved in. Open the file in Notepad++, go to the Encoding menu, and choose 'Encode in ANSI'. If that doesn't work, try 'Encode in UTF-8 without BOM'.
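One way to see which encoding a string actually arrived in is to look at its bytes. This is only a diagnostic sketch: looksLikeUtf8() is a hypothetical helper, not something from this answer, and it relies on PCRE's /u modifier rejecting malformed UTF-8.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
function looksLikeUtf8(string $s): bool
{
    // With the /u modifier, preg_match() returns false on malformed UTF-8
    // instead of 1, so an empty pattern works as a validity check.
    return preg_match('//u', $s) === 1;
}

$utf8 = "\xC3\xA4"; // "ä" saved as UTF-8 (two bytes)
$ansi = "\xE4";     // "ä" saved as ANSI / Windows-1252 (one byte)

var_dump(looksLikeUtf8($utf8)); // bool(true)
var_dump(looksLikeUtf8($ansi)); // bool(false)
var_dump(bin2hex($utf8));       // string(4) "c3a4"
var_dump(bin2hex($ansi));       // string(2) "e4"
```

Running bin2hex() on the literal from your own script shows which of the two forms the editor wrote to disk.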

Alex Coloma
0

And if that still isn't working, try this:

html_entity_decode($html, ENT_QUOTES, 'cp1252');

This is what was needed on a Windows IIS system for things to start working correctly.
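For reference, a small sketch of what the 'cp1252' argument changes: each entity is decoded to a single Windows-1252 byte rather than a UTF-8 sequence. The input string and variable names here are illustrative only.

```php
<?php
// Two entities separated by a space.
$html = "&auml; &szlig;";

// With 'cp1252' as the target charset, each entity becomes one byte.
$decoded = html_entity_decode($html, ENT_QUOTES, 'cp1252');

var_dump(bin2hex($decoded)); // string(6) "e420df"
var_dump(strlen($decoded));  // int(3)
```

The result is not UTF-8, so this only fits when the rest of the page is also served as Windows-1252.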

MeSo2