I am trying to print all the <p> elements of a particular HTML document fetched from a URL. The HTML document uses UTF-8 encoding.
This is my code:
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
header('Content-Type: text/plain; charset=utf-8');
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: POST, GET, OPTIONS');
$url = "https://www.sangbadpratidin.in/kolkata/ispat-express-met-an-accident-near-howrah-junction/#.Y7qC6YFeT80.whatsapp";
$user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$html=curl_exec($ch);
if (!curl_errno($ch)) {
    $resultStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($resultStatus == 200) {
        @$DOM = new DOMDocument;
        @$DOM->loadHTML($html);
        $bodies = $DOM->getElementsByTagName('p');
        foreach($bodies as $body){
            $para = $body->nodeValue;
            echo $para;
        }
    }
}
?>
The HTML document contains Bengali text. When I print the values, this is the output:
সà§à¦¬à§à¦°à¦¤ বিশà§à¦¬à¦¾à¦¸: ফà§à¦° দà§à¦°à§à¦à¦à¦¨à¦¾à¦° à¦à¦¬à¦²à§ দà§à...
Why am I not getting the original text? Please help me
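
One possible explanation, offered as a sketch and not a confirmed diagnosis: DOMDocument::loadHTML() may fall back to a single-byte encoding (typically ISO-8859-1) when it does not see a charset declaration before the text, so multi-byte UTF-8 sequences get decoded incorrectly. A commonly used workaround is to prepend an XML encoding declaration to the markup before parsing. The function name extract_paragraphs and the sample input below are hypothetical and used only for illustration; in the code above, the string returned by curl_exec() would be passed in instead.

```php
<?php
// Hypothetical helper (not part of the original code): parses an HTML string
// that is assumed to be UTF-8 and returns the text of every <p> element.
function extract_paragraphs(string $html): array
{
    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Workaround: the leading XML declaration hints to libxml that the
    // bytes that follow are UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        $paragraphs[] = $p->nodeValue;
    }
    return $paragraphs;
}

// Assumed sample input: a UTF-8 fragment with no charset declaration.
foreach (extract_paragraphs('<html><body><p>বাংলা</p></body></html>') as $para) {
    echo $para, "\n";
}
```

If the text still comes out garbled with this change, the response body may not actually be UTF-8, or it may be compressed, so checking the Content-Type and Content-Encoding response headers would be a reasonable next step.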