I have a script that parses a string for certain divs and removes them. I use UTF-8 encoding because the strings contain German special characters. It works, except that typographic quotation marks are always replaced by question marks. For example, „example“ becomes ?example?.

Here is my code:

    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->encoding = 'utf-8';
    $doc->loadHTML(utf8_decode($content));

    $xpath = new DOMXPath($doc);

    $ns = $xpath->query('//div[@id="amazon-polly-label-tab"]|//div[@id="amazon-polly-play-tab"]|//div[@id="amazon-polly-by-tab"]');
    // there can be only one... but anyway
    foreach ($ns as $node) {
        $node->parentNode->removeChild($node);
    }
    echo $doc->saveHTML();

Do you know what I have to change?

till36
  • Possible duplicate of [PHP DOMDocument loadHTML not encoding UTF-8 correctly](https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly) – Mihai Matei Jun 26 '18 at 09:36
  • [`utf8_decode`](http://php.net/manual/de/function.utf8-decode.php) converts utf-8 to ISO-8859-1. There should be no reason to call it nowadays, where everything is utf-8. – Karsten Koop Jun 26 '18 at 10:07
  • Okay thank you, I removed `$doc->encoding = 'utf-8';` and replaced `$doc->loadhtml(utf8_decode($content));` with `$doc->loadhtml(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));` now it works! – till36 Jun 26 '18 at 10:47
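
For anyone landing here later, the fix from the comments can be sketched roughly as follows. The likely cause is that `utf8_decode` converts to ISO-8859-1, which has no code points for „ and “, so they come out as `?`. The sketch below converts non-ASCII characters to numeric entities before calling `loadHTML`. It uses `mb_encode_numericentity` in place of the `mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')` call from the comment, because that encoding name is deprecated as of PHP 8.2; this is a substitution, not what the asker ran. The function name `removePollyDivs` and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper (name not from the original post): strips the
// Amazon Polly divs from a UTF-8 HTML fragment without mangling
// non-ASCII characters such as „ “ ä ö ü ß.
function removePollyDivs(string $content): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;

    // Replace every code point >= 0x80 with a numeric entity so the
    // parser only ever sees ASCII. Assumes $content is valid UTF-8.
    $ascii = mb_encode_numericentity(
        $content,
        [0x80, 0x10FFFF, 0, 0x1FFFFF],
        'UTF-8'
    );
    $doc->loadHTML($ascii);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//div[@id="amazon-polly-label-tab"]'
        . '|//div[@id="amazon-polly-play-tab"]'
        . '|//div[@id="amazon-polly-by-tab"]'
    );
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    // Special characters may come back as entities; a browser
    // renders those the same as the literal characters.
    return $doc->saveHTML();
}

// Made-up sample input for illustration.
$content = '<div id="amazon-polly-play-tab">player</div><p>„Beispiel“ für Größe</p>';
echo removePollyDivs($content);
```

Note that `saveHTML()` returns a full document (doctype, `html` and `body` wrappers), just like the code in the question does.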