
I have a database table of products whose character set is latin1 with the default collation, and this can't be changed.

I've written a script that uses DOMDocument to add a new `li` to an existing `ul` in the product description.

Here is the code, if anyone is interested:

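```php
// Example value; in the real script this comes from the products table
$current_description = '
    <ul>
        <li>Here is some&nbsp;broken encoding – </li>
    </ul>
';

// Parse the fragment; loadHTML() wraps it in implied <html><body> elements
$dom = new DOMDocument;
$dom->loadHTML($current_description);

// Append a new <li> to the first <ul>
$ul = $dom->getElementsByTagName('ul')->item(0);
$li = $dom->createElement('li', 'Content');
$ul->appendChild($li);

// Output the modified markup
echo $dom->saveHTML($dom->documentElement);
```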

I'm getting encoding problems in the output, for example:

`&nbsp;` becomes `Â`
`–` becomes `â€“`
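Both symptoms look like UTF-8 bytes being read as a single-byte Western encoding. As a hedged illustration, assuming the stored text is really UTF-8 (the question does not confirm this), the snippet below reproduces the same garbling with `mb_convert_encoding` by decoding UTF-8 bytes as Windows-1252. The helper name `misreadAsWindows1252` is hypothetical.

```php
<?php
// Hypothetical helper: decode UTF-8 bytes as if they were Windows-1252,
// then re-encode the result as UTF-8 so the mojibake can be printed.
function misreadAsWindows1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

$nbsp = "\u{00A0}";   // non-breaking space, UTF-8 bytes C2 A0
$enDash = "\u{2013}"; // en dash, UTF-8 bytes E2 80 93

echo misreadAsWindows1252($nbsp), "\n";   // "Â" followed by a non-breaking space
echo misreadAsWindows1252($enDash), "\n"; // "â€“"
```

DOMDocument's fallback is ISO-8859-1 rather than Windows-1252, but the lead bytes `Â` and `â` come out the same; browsers following the WHATWG Encoding Standard treat ISO-8859-1 as Windows-1252, which is likely why the remaining bytes display as `€“`.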

I have searched for a solution but can't find one that works.

I've tried `mb_convert_encoding` with various parameters, without any luck.

For example:

```php
$current_description = mb_convert_encoding($current_description, 'utf-8', mb_detect_encoding($current_description));
```
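One possible reason this doesn't help: `mb_detect_encoding` can only guess, and even correctly converted UTF-8 is still handed to `loadHTML()` without anything telling the parser it is UTF-8. A different technique, swapped in here and not taken from the thread, is to turn every non-ASCII character into a numeric character reference first, so the parser only ever sees ASCII. This is a minimal sketch assuming the input is valid UTF-8; `appendItemViaEntities` is a hypothetical name.

```php
<?php
// Hypothetical helper: escape non-ASCII as numeric entities so that
// DOMDocument's default ISO-8859-1 assumption cannot misread any bytes.
function appendItemViaEntities(string $utf8Html, string $itemText): string
{
    // Convert every code point from U+0080 upward to &#NNNN;.
    // convmap format: [start, end, offset, mask].
    $asciiHtml = mb_encode_numericentity($utf8Html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($asciiHtml);

    $ul = $dom->getElementsByTagName('ul')->item(0);
    // createElement() does not escape its value; a text node does.
    $li = $dom->createElement('li');
    $li->appendChild($dom->createTextNode($itemText));
    $ul->appendChild($li);

    // Serialize only the <ul>, not the implied <html><body> wrapper.
    return $dom->saveHTML($ul);
}

$description = "<ul><li>Here is some\u{00A0}broken encoding \u{2013} </li></ul>";
echo appendItemViaEntities($description, 'Content'), "\n";
```

The output may contain entities such as `&ndash;` rather than raw characters, which renders identically in a browser; the trade-off is an extra pass over the string in exchange for never depending on the parser's charset guess.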

Does anyone have any ideas?

Thanks in advance

kinger198
  • Can you update your question and add the fragment where you set the data for the `$current_description` variable? – Rinat Jul 14 '16 at 15:33
  • It comes directly from the database but I've added an example above – kinger198 Jul 15 '16 at 07:48
  • Use `$dom->loadHTML('<?xml encoding="utf-8" ?>' . $current_description);` instead – Gordon Jul 15 '16 at 07:54
  • While the problem statement in the dupe is not exactly the same as yours, the root cause is: you didn't provide an encoding when loading the HTML snippet, so DOM will default to ISO-8859-1. Adding an XML prolog with the specified encoding will change it to whatever you specified. – Gordon Jul 15 '16 at 08:01
  • @gordon Thanks, but that didn't make a difference. I've run the example above but replaced the loadHTML line with yours and the end result is still the same. I also tried `$dom = new DOMDocument('1.0', 'utf-8');` as in the dup but that was worse – kinger198 Jul 18 '16 at 09:07
  • @steveking198 not sure what you did then. Compare this [verbatim copy of your code](https://eval.in/606867) and [the same with the XML prolog added](https://eval.in/606866). The former has the â. The latter has the dash. – Gordon Jul 18 '16 at 09:11
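Gordon's suggestion can be sketched as a small function. This is a minimal illustration, assuming the string read from the database really is UTF-8 (worth verifying, given the latin1 column); `appendItemWithProlog` is a hypothetical name, and saving only the `ul` node is a choice made here so the prepended declaration does not end up in the stored description.

```php
<?php
// Hypothetical helper applying the XML-prolog fix from the comments.
function appendItemWithProlog(string $utf8Html, string $itemText): string
{
    $dom = new DOMDocument();
    // Prefixing an XML declaration stops libxml from falling back to
    // ISO-8859-1, so the UTF-8 input is decoded as UTF-8.
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $utf8Html);

    $ul = $dom->getElementsByTagName('ul')->item(0);
    // Use a text node so special characters in $itemText are escaped.
    $li = $dom->createElement('li');
    $li->appendChild($dom->createTextNode($itemText));
    $ul->appendChild($li);

    // Saving just the <ul> leaves the helper declaration out of the result.
    return $dom->saveHTML($ul);
}

$description = "<ul><li>Here is some\u{00A0}broken encoding \u{2013} </li></ul>";
echo appendItemWithProlog($description, 'Content'), "\n";
```

If the data turns out to be Latin-1 rather than UTF-8, the declared encoding would need to match it instead.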
