How do I prevent DOMDocument from changing the character encoding? See the following, and note how “ is being changed to â.

<?php
    $message = "<p>Hello “something in quotes” goodby</p>";
    echo("pre message: $message\n");
    $doc = new DOMDocument();
    $doc->loadHTML($message);
    $body = $doc->getElementsByTagName('body')->item(0);
    $message=$doc->saveHTML($body);
    echo("Modified message: $message\n");

OUTPUT:

pre message: <p>Hello “something in quotes” goodby</p>
Modified message: <body><p>Hello âsomething in quotesâ goodby</p></body>
user1032531

1 Answer

I've run into similar problems and solved them by using iconv to force the encoding:

$new_string = iconv("UTF-8", "UTF-8//TRANSLIT",$old_string);

Here's the PHP manual page on it.

After further investigation, it looks like this is a bug in DOMDocument:

https://bugs.php.net/bug.php?id=32547
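For comparison, another workaround that is often suggested for this symptom is to tell the parser up front that the input is UTF-8, since loadHTML() otherwise tends to assume ISO-8859-1 when the markup declares no charset. The following is only a minimal sketch, assuming $message is valid UTF-8. The meta tag prefix is my own addition and does not come from the bug report above, and the behavior may vary between PHP and libxml2 versions.

```php
<?php
// Same input as in the question; assumed to be valid UTF-8.
$message = "<p>Hello “something in quotes” goodby</p>";

// Assumption: prepending a charset declaration makes the HTML parser
// decode the input as UTF-8 instead of falling back to ISO-8859-1.
$prefix = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

$doc = new DOMDocument();
$doc->loadHTML($prefix . $message);

// The meta tag is placed in <head>, so serializing only <body>
// leaves it out of the result.
$body = $doc->getElementsByTagName('body')->item(0);
$result = $doc->saveHTML($body);

echo "Modified message: $result\n";
```

If the curly quotes still come out mangled with this, the input is probably not UTF-8 to begin with, and converting it first (for example with iconv, as above) would be the next thing to try.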

jbrahy