I need to convert some HTML text that comes from a database. The strings I get look like this:
<p style=\"font-size: 10px;\">\n<strong>Search for:<\/strong> <span style=\"color:#888888;\">2 to 15 People, \u00b120$ Per Person, Informal, Available on Date<\/span>\n<\/p>
And I need to put this in proper HTML. Something like this:
<p style="font-size: 10px;">
<strong>Search for:</strong> <span style="color:#888888;">2 to 15 People, ±20$ Per Person, Informal, Available on Date</span>
</p>
There are several issues here. The first is the slashes: I'm using stripcslashes before stripslashes, so that the C-style escapes like "\n" are converted first, and then stripslashes removes the quote escapes. But this messes up the Unicode escapes like \u00b1 (the ± sign).
I've searched online, and json_decode seems to be the usual trick for this, but I can't use json_decode here because of the type of string I'm working with. This is just an example; the real strings are full HTML pages.
Does anyone have any hints on how I can tackle this?
This is what I'm currently using:
$final = urlencode(stripslashes(stripcslashes(html_entity_decode($html, ENT_COMPAT, 'UTF-8'))));
It gets me an almost perfect HTML page, except for the Unicode escapes like \u00b1.
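To make the failure concrete, here is a minimal sketch of what the stripcslashes/stripslashes pair does to a \uXXXX escape. The `$sample` string is a shortened, made-up fragment of the example above, not real data from the database.

```php
<?php
// Single-quoted, so every backslash below is a literal character in the string.
$sample = '\u00b120$ Per Person<\/span>\n';

// stripcslashes() understands \n, \t, \xHH and octal escapes, but it has no
// rule for \uXXXX: it simply drops the backslash and keeps "u00b1".
$result = stripslashes(stripcslashes($sample));

var_dump($result);
```

The backslash in front of `u00b1` is gone after this step, so nothing later in the chain can tell that it was ever an escape sequence. That suggests the \uXXXX sequences have to be decoded before (or instead of) the generic slash stripping.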
SOLUTION
I ended up using the solution given by Lawrence Cherone:
$new_html = str_replace(array('\"', '\/', '"', '\n'), array('"', '/', '\'', "\n"), $old_html);
function unicode_convert($match) {
    // $match[1] holds the four hex digits that follow "\u"
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}
$new_html = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', "unicode_convert", $new_html);
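For reference, the same two steps can be wrapped into one function. This is only a sketch under a few assumptions: the name `unescape_html_string` is made up for illustration, the mbstring extension is available, and the input contains only the `\"`, `\/`, `\n` and `\uXXXX` escapes shown in the question. Unlike the str_replace call above, this variant leaves double quotes as double quotes instead of turning them into single quotes.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function unescape_html_string(string $escaped): string
{
    // Undo the simple escapes first: \" -> ", \/ -> /, \n -> newline.
    $html = str_replace(array('\\"', '\\/', '\\n'), array('"', '/', "\n"), $escaped);

    // Then turn each \uXXXX sequence into the matching UTF-8 character.
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $match): string {
            // Pack the four hex digits into two bytes and convert them to UTF-8.
            return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
        },
        $html
    );
}

// Shortened sample in the style of the question (single-quoted, literal backslashes).
$old_html = '<p style=\"font-size: 10px;\">\n<strong>Search for:<\/strong> \u00b120$<\/p>';
echo unescape_html_string($old_html);
```

Two limitations to keep in mind: characters outside the Basic Multilingual Plane arrive as two \uXXXX surrogate halves, which this per-match conversion does not join, and an escaped backslash followed by `n` in the input would be mistaken for a newline escape.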