saveHTML() doesn't output special characters properly

Question

I've looked at other answers (php: using DomDocument whenever I try to write UTF-8 it writes the hexadecimal notation of it; DOMDocument->saveHTML() converting &nbsp; to space), and either they don't apply to my situation or I'm not understanding them.

I'm feeding some HTML into $dom like this...

$dom = new DOMDocument;
$dom->loadHTML($table_data_for_db);

I then do some stuff with it and output it like this:

$table_data_for_db = $dom->saveHTML();
echo $table_data_for_db;

The problem is that special characters such as → end up like this: â†’
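One common way around the garbling described above, as a minimal sketch: the libxml HTML parser behind `DOMDocument::loadHTML()` treats input that has no charset declaration as ISO-8859-1, so declaring UTF-8 before the markup lets it decode `→` correctly. This assumes `$table_data_for_db` is UTF-8; the helper name `cleanUtf8Html()` is hypothetical.

```php
<?php
// Minimal sketch, assuming the input fragment is UTF-8.
// cleanUtf8Html() is a hypothetical helper name, not a PHP built-in.
function cleanUtf8Html(string $html): string
{
    $dom = new DOMDocument();
    // Without a charset declaration libxml reads the bytes as ISO-8859-1,
    // which is how "→" turns into "â†’". Prepending this meta tag tells
    // the parser that the input is UTF-8.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );

    // ... do the DOM manipulation here ...

    // Serialize only the children of <body>, so the helper <meta> and the
    // <html>/<head>/<body> wrappers added by the parser are not echoed back.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

echo cleanUtf8Html("<table><tr><td>a \u{2192} b</td></tr></table>"), "\n";
```

Serializing node by node is a design choice to drop the wrapper elements; calling `$dom->saveHTML()` with no argument returns the whole document instead.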

1.) Is there a way around this?
2.) Is there another way in PHP other than using DOMDocument, loadHTML, etc. to strip out sections of HTML? Like, if I want to remove <style id="fraction_class"> and all of its contents, is there another way?
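For question 2, `DOMDocument` can remove the element as well. A minimal sketch, assuming the target is a `<style>` element with `id="fraction_class"` as in the question, and that the id contains no double quotes (it is spliced into the XPath expression); `removeStyleById()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: remove every <style> element with the given id,
// including all of its contents, and return how many were removed.
// Assumes $id contains no double quote, since it goes inside an XPath string literal.
function removeStyleById(DOMDocument $dom, string $id): int
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array first, then detach each node from its parent.
    $nodes = iterator_to_array($xpath->query('//style[@id="' . $id . '"]'));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes);
}

$dom = new DOMDocument();
$dom->loadHTML('<style id="fraction_class">.frac { font-size: 80%; }</style><p>keep me</p>');
echo removeStyleById($dom, 'fraction_class'), "\n";  // 1
echo $dom->saveHTML();
```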

Thank you.

You could use regex... `$table_data_for_db = preg_replace('//', '', $table_data_for_db);` — Siguza, Dec 29 '15 at 21:09
@Siguza: Yeah... that's the conclusion I came to. I'll leave this open for now if anyone has any creative ideas, but it seems to be a limitation ATM. — gtilflm, Dec 29 '15 at 21:41
Generally, when you have output that looks like `â†’`, it means you have an encoding mismatch. `DOMDocument` is the best way to go for manipulating DOM elements; regex is a really bad way to go about it. What is the encoding in PHP, your server encoding, the encoding declared in your markup, and, if you are retrieving the output from a database, the database encoding and database driver encoding? — Steve Buzonas, Dec 29 '15 at 22:08
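The `â†’` pattern mentioned above can be reproduced in a few lines. A minimal sketch (requires the mbstring extension): `→` is the three bytes `E2 86 92` in UTF-8, and reading those bytes as Windows-1252, which browsers use for pages labelled ISO-8859-1, gives `â`, `†` and `’`.

```php
<?php
// Requires the mbstring extension.
$arrow = "\u{2192}";                 // "→" as a UTF-8 string
echo bin2hex($arrow), "\n";          // e28692: three bytes in UTF-8

// Treat those bytes as Windows-1252 and convert them to UTF-8,
// which is what happens when UTF-8 output is displayed as Latin-1/Windows-1252.
$garbled = mb_convert_encoding($arrow, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";                 // â†’
```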
@SteveBuzonas: Sorry, I don't know about any of that, but I got this from my phpinfo file: iconv.input_encoding = ISO-8859-1, iconv.internal_encoding = ISO-8859-1, iconv.output_encoding = ISO-8859-1 (local and master values are the same for all three). Is that what you were asking about? — gtilflm, Dec 29 '15 at 22:34
http://stackoverflow.com/questions/3575109/php-using-domdocument-whenever-i-try-to-write-utf-8-it-writes-the-hexadecimal-n — Iłya Bursov, Dec 29 '15 at 22:37
That may be part of the issue. Assuming you mean `→` as the literal character it represents, and not the string for the HTML entity, you need to use a character set that supports that character, generally one of the UTF variants. The right arrow does not belong to the `ISO-8859-1` character set. — Steve Buzonas, Dec 29 '15 at 22:42
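To see why ISO-8859-1 is not enough here: it is a single-byte character set covering only code points U+0000 to U+00FF, while the right arrow is U+2192. A quick check (requires mbstring; `mb_ord()` needs PHP 7.2 or later):

```php
<?php
$arrow = "\u{2192}";
$codePoint = mb_ord($arrow, 'UTF-8');   // 0x2192 = 8594
// ISO-8859-1 can only represent code points up to 0xFF.
printf("U+%04X fits in ISO-8859-1: %s\n", $codePoint, $codePoint <= 0xFF ? 'yes' : 'no');
// prints: U+2192 fits in ISO-8859-1: no
```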
@Lashane that's literally one of the questions referenced in this one, and the asker is asking how it applies, if it does. — Steve Buzonas, Dec 29 '15 at 22:43
@SteveBuzonas: I'm not sure how to do that. Have any advice? — gtilflm, Dec 29 '15 at 22:48
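One way to act on that advice, as a sketch under two assumptions: the page and the stored HTML are meant to be UTF-8, and nothing has been output yet. The `iconv.*` settings shown by phpinfo are deprecated since PHP 5.6 in favour of `default_charset`, so that is the setting changed here; the same charset should also be declared in the markup and, if the HTML comes from a database, in the database connection.

```php
<?php
// Sketch: make PHP's output charset UTF-8.
// default_charset is also the charset PHP appends to its default Content-Type header.
ini_set('default_charset', 'UTF-8');

// Send an explicit header as well, if output has not started yet.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// The markup should agree, e.g. <meta charset="utf-8"> in the page's <head>.
echo ini_get('default_charset'), "\n";   // UTF-8
```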
@SteveBuzonas you already mentioned that it could be something with encodings; I'm posting this link to emphasize where the OP should look again — Iłya Bursov, Dec 29 '15 at 22:51
