I am using the following function to get the inner HTML of an HTML string:
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        // Copy each child into a temporary document and serialize it
        $tmp_dom = new DOMDocument('1.0', 'UTF-8');
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
My HTML string also contains Unicode characters. Here is an example of the HTML string:
$html = '<div>Thats True. Yes it is well defined آپ مجھے تم کہہ کر پکاریں</div>';
When I use the above function:
$output = DOMinnerHTML($html);
the output is as follows:
$output = '<div>Thats True. Yes it is well defined
&#1705;&#1746;&#1748;&#1587;&#1604;&#1591;&#1575;</div>';
The actual Unicode characters are converted to numeric entities.
I debugged the code and found that, inside the DOMinnerHTML function, if I echo the text content just before this line
$innerHTML .= trim($tmp_dom->saveHTML());
like so
echo $tmp_dom->textContent;
it shows the actual Unicode characters, but once the result of saveHTML() is stored in $innerHTML, the output contains the numeric entities.
Why is it doing that?
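A minimal sketch that should reproduce this without my helper function. It builds the node by hand instead of parsing my real HTML, and the names $text, $doc, $div and $serialized are only for illustration:

```php
<?php
// Sample text containing non-ASCII (Urdu) characters, as in my HTML string.
$text = 'آپ مجھے تم کہہ کر پکاریں';

// Build a tiny document by hand: <div>{text}</div>.
$doc = new DOMDocument('1.0', 'UTF-8');
$div = $doc->createElement('div');
$div->appendChild($doc->createTextNode($text));
$doc->appendChild($div);

// The DOM itself still holds the original characters.
echo $div->textContent, "\n";

// Serializing the whole document is the step where, in my case,
// the characters come out as numeric entities.
$serialized = trim($doc->saveHTML());
echo $serialized, "\n";
```

Running it from the command line puts the two echoed lines side by side, so the DOM content and the serialized output can be compared directly.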
Note: please don't suggest html_entity_decode or similar functions to convert the numeric entities back to real Unicode characters, because my HTML string also contains user-formatted data that I don't want converted.
Note: I have also tried putting
<meta http-equiv="content-type" content="text/html; charset=utf-8">
before my HTML string, but it made no difference.