I've looked at other answers ("php: using DomDocument whenever I try to write UTF-8 it writes the hexadecimal notation of it" and "DOMDocument->saveHTML() converting &nbsp; to space"), and either they don't apply to my situation or I'm not understanding them.

I'm feeding some HTML into $dom like this...

$dom = new DOMDocument;
$dom->loadHTML($table_data_for_db);

I then do some stuff with it, then output it like this...

$table_data_for_db = $dom->saveHTML();
echo $table_data_for_db;

The problem is that special characters such as → end up like this: â†’.

1.) Is there a way around this?
2.) Is there another way in PHP other than using DOMDocument, loadHTML, etc. to strip out sections of HTML? Like, if I want to remove <style id="fraction_class"> and all of its contents, is there another way?

Thank you.

gtilflm
  • You could use regex... `$table_data_for_db = preg_replace('//', '', $table_data_for_db);` – Siguza Dec 29 '15 at 21:09
  • @Siguza: Yeah... that's the conclusion I came to. I'll leave this open for now if anyone has any creative ideas, but it seems to be a limitation ATM. – gtilflm Dec 29 '15 at 21:41
  • Generally when you have output that looks like `â†’` it means you have an encoding mismatch. `DOMDocument` is the best way to go for manipulating DOM elements, regex is a really bad way to go about it. What is the encoding in PHP, your server encoding, the encoding declared in your markup, and if you are retrieving the output from a database the database encoding and database driver encoding? – Steve Buzonas Dec 29 '15 at 22:08
  • @SteveBuzonas: Sorry, but I don't know about any of that, but I got this from my phpinfo file. iconv.input_encoding ISO-8859-1 ISO-8859-1 iconv.internal_encoding ISO-8859-1 ISO-8859-1 iconv.output_encoding ISO-8859-1 ISO-8859-1 Is that what you were asking about? – gtilflm Dec 29 '15 at 22:34
  • http://stackoverflow.com/questions/3575109/php-using-domdocument-whenever-i-try-to-write-utf-8-it-writes-the-hexadecimal-n – Iłya Bursov Dec 29 '15 at 22:37
  • That may be part of the issue. Assuming you refer to `→` as the literal character it represents and not the string for the html entity you need to use a character set that supports that character, generally one of the UTF variants. The right arrow does not belong to the `ISO-8859-1` character set. – Steve Buzonas Dec 29 '15 at 22:42
  • @Lashane that's literally one of the ones referenced in the question asking how it can apply if it does. – Steve Buzonas Dec 29 '15 at 22:43
  • @SteveBuzonas: I'm not sure how to do that. Have any advice? – gtilflm Dec 29 '15 at 22:48
  • @SteveBuzonas you already mentioned that it could be something with encodings, I'm posting this link to emphasize where OP should look again – Iłya Bursov Dec 29 '15 at 22:51
  • @gtilflm https://stackoverflow.com/q/1344692 – Siguza Dec 29 '15 at 22:52
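
For anyone landing here, this is a minimal sketch of the direction the comments point in, not a confirmed fix for the setup above. It assumes the HTML in `$table_data_for_db` is UTF-8 (the sample markup is hypothetical), tells `loadHTML()` so with an XML encoding declaration, removes the `<style id="fraction_class">` element with `DOMXPath`, and serializes from the root element.

```php
<?php
// Hypothetical sample input, assumed to be UTF-8. "\u{2192}" is the arrow (U+2192).
$table_data_for_db = '<style id="fraction_class">.frac { color: red; }</style>'
    . "<table><tr><td>a \u{2192} b</td></tr></table>";

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

// Assumption: the input really is UTF-8. The XML declaration is a widely used
// hint so that loadHTML() does not fall back to ISO-8859-1.
$dom->loadHTML('<?xml encoding="UTF-8">' . $table_data_for_db);
libxml_clear_errors();

// Remove <style id="fraction_class"> together with its contents.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//style[@id="fraction_class"]') as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize from the root element, which leaves out the encoding hint
// (a sibling of <html>) but keeps the <html>/<body> wrappers.
$result = $dom->saveHTML($dom->documentElement);
echo $result;
```

The page that echoes the result would also need to be served as UTF-8 (for example with a `Content-Type: text/html; charset=UTF-8` header), otherwise the browser can still show `â†’` even when the markup itself is correct.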

0 Answers