Given a DOMDocument
constructed with a stylesheet that contains an emoji character like so:
$dom = new DOMDocument();
$dom->loadHTML( "<!DOCTYPE html><html><head><meta charset=utf-8><style>span::before{ content: \"⚡️\"; }</style></head><body><span></span></body></html>" );
I've found some strange behavior when serializing the DOM back out to HTML.
If I do $dom->saveHTML( $dom->documentElement )
then I get (as desired):
<html><head><meta charset="utf-8">
<style>span::before{ content: "⚡️"; }</style>
</head><body><span></span></body></html>
However, if I instead do $dom->saveHTML()
to save the entire document I get (erroneously):
<html><head><meta charset="utf-8">
<style>span::before{ content: "&#9889;&#65039;"; }</style>
</head><body><span></span></body></html>
Notice how the emoji “⚡️” is encoded as the HTML entities &#9889;&#65039;
inside the stylesheet. Browsers do not decode HTML entities inside a style element, so the value is treated as a literal string; a CSS escape such as \26A1
would have to be used instead.
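To compare the two calls side by side, a minimal sketch along these lines can be used. The variable names ($html, $fragment, $whole) are mine, and the result may depend on the PHP and libxml versions in use, so it prints what it finds rather than assuming an outcome:

```php
<?php
// Same markup as in the question, with the emoji inside a <style> element.
$html = '<!DOCTYPE html><html><head><meta charset=utf-8>'
      . '<style>span::before{ content: "⚡️"; }</style>'
      . '</head><body><span></span></body></html>';

// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors( true );

$dom = new DOMDocument();
$dom->loadHTML( $html );

// Serialize only the root element, then the whole document.
$fragment = $dom->saveHTML( $dom->documentElement );
$whole    = $dom->saveHTML();

// Report whether each serialization contains a numeric character reference.
var_dump( preg_match( '/&#\d+;/', $fragment ) === 1 );
var_dump( preg_match( '/&#\d+;/', $whole ) === 1 );
```

With the behavior described above, the first check would be expected to report false and the second true.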
I tried setting $dom->substituteEntities = false
but it had no effect.
The same HTML entity conversion is also happening inside of script
tags, which causes similar problems in browsers.
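The script case can be checked the same way. This is only a sketch: the inline script content and the variable names ($scriptHtml, $scriptDom, $scriptOut) are made up for illustration:

```php
<?php
// Hypothetical test document: the emoji sits inside a <script> string literal.
$scriptHtml = '<!DOCTYPE html><html><head><meta charset=utf-8>'
            . '<script>var icon = "⚡️";</script>'
            . '</head><body></body></html>';

// Collect libxml parser warnings instead of printing them.
libxml_use_internal_errors( true );

$scriptDom = new DOMDocument();
$scriptDom->loadHTML( $scriptHtml );

// Serialize the whole document and pull out the <script> element as written.
$scriptOut = $scriptDom->saveHTML();
if ( preg_match( '#<script>.*?</script>#s', $scriptOut, $matches ) ) {
    echo $matches[0], "\n";
}
```

If the emoji shows up as numeric entities in the printed element, the script is affected in the same way as the stylesheet.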
Test via online PHP shell: https://3v4l.org/jMfDd