PHP DOM parsing replaces quotation marks with question marks

Question

I have a script that parses a string for certain divs and removes them. I use UTF-8 encoding because the strings contain German special characters. It works perfectly, except that quotation marks are always replaced by question marks. For example, „example“ becomes ?example?

Here is my code:

    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->encoding = 'utf-8';
    $doc->loadhtml(utf8_decode($content));

    $xpath = new DOMXPath($doc);

    $ns = $xpath->query('//div[@id="amazon-polly-label-tab"]|//div[@id="amazon-polly-play-tab"]|//div[@id="amazon-polly-by-tab"]');
    // there can be only one... but anyway
    foreach($ns as $node) {
        $node->parentNode->removeChild($node);
    }
    echo $doc->savehtml();

Do you know what I have to change?

Possible duplicate of [PHP DOMDocument loadHTML not encoding UTF-8 correctly](https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly) — Mihai Matei, Jun 26 '18 at 09:36
[`utf8_decode`](http://php.net/manual/de/function.utf8-decode.php) converts utf-8 to ISO-8859-1. There should be no reason to call it nowadays, where everything is utf-8. — Karsten Koop, Jun 26 '18 at 10:07
Okay, thank you. I removed `$doc->encoding = 'utf-8';` and replaced `$doc->loadhtml(utf8_decode($content));` with `$doc->loadhtml(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));`. Now it works! — till36, Jun 26 '18 at 10:47
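
For anyone landing here later, the comments above boil down to this: `utf8_decode` converts to ISO-8859-1, which contains the German umlauts but not the typographic quotes „ and “, so those characters come out as `?`. Below is a minimal sketch of the fix, not a definitive implementation. It assumes `$content` is valid UTF-8 and that the mbstring extension is loaded. The function name `stripPollyTabs` is made up for illustration. Instead of the `'HTML-ENTITIES'` conversion from the last comment (deprecated as of PHP 8.2), it swaps in `mb_encode_numericentity`, which serves the same purpose of handing `loadHTML` pure ASCII.

```php
<?php
// Hypothetical helper (name not from the question): removes the three
// amazon-polly divs and returns the resulting HTML document.
function stripPollyTabs(string $content): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;

    // Assumption: $content is valid UTF-8. Turn every non-ASCII character into
    // a numeric entity so loadHTML() cannot misread the bytes as ISO-8859-1.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $doc->loadHTML(mb_encode_numericentity($content, $map, 'UTF-8'));

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//div[@id="amazon-polly-label-tab"]'
        . '|//div[@id="amazon-polly-play-tab"]'
        . '|//div[@id="amazon-polly-by-tab"]'
    );
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Non-ASCII characters may be written back as entities,
    // which browsers render as the original characters.
    return $doc->saveHTML();
}

// Usage with made-up sample input:
echo stripPollyTabs('<div id="amazon-polly-play-tab">Player</div><p>„example“</p>');
```

The quotes may appear as entities in the raw output, so run the result through `html_entity_decode` if you need the literal characters in the string.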
