I am trying to parse the Guardian RSS feed (Link). The feed contains curved quotes (” ’ “ ‘), dashes (–) and accented characters (e.g. Orbán).
When I parse the feed and display the text on an HTML page, these characters show up as â (for the quotes and dashes), Ã¡ (for á) and so on in the 'description' section. How do I get them to parse properly?
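For reference, this is the pattern you get when UTF-8 bytes are decoded as ISO-8859-1 (Latin-1). The sketch below uses a hypothetical helper, `misreadAsLatin1`, written only to reproduce that mis-decoding in plain PHP. It assumes, but does not prove, that this is what happens to the description text somewhere between the feed and the page.

```php
<?php
// Hypothetical helper (illustration only): treats every byte of $bytes as one
// ISO-8859-1 (Latin-1) character and re-encodes that character as UTF-8.
// Applying this to text that is already UTF-8 produces mojibake.
function misreadAsLatin1(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes mean the same thing in both encodings.
            $out .= chr($b);
        } else {
            // Latin-1 code point U+0080..U+00FF as a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "á" is C3 A1 in UTF-8; read as Latin-1 it becomes "Ã" followed by "¡".
echo misreadAsLatin1("\u{00E1}"), PHP_EOL; // Ã¡

// "’" is E2 80 99 in UTF-8; read as Latin-1 it becomes "â" followed by two
// invisible C1 control characters (U+0080 and U+0099).
echo bin2hex(misreadAsLatin1("\u{2019}")), PHP_EOL; // c3a2c280c299
```

The curved quotes and dashes all start with the byte E2 in UTF-8, which is why they all collapse to the same visible "â", while the other two bytes render as nothing in most browsers.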
Code
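```php
$xml = simplexml_load_file($link);

// Show at most 30 items, stopping early if the feed has fewer.
for ($i = 0; $i < 30 && isset($xml->channel->item[$i]); $i++) {
    $item = $xml->channel->item[$i];
    $title = (string) $item->title;
    $description = (string) $item->description;

    // Parse the description's HTML fragment so its <p> elements can be read.
    // The @ suppresses libxml warnings about the fragment's markup.
    $doc = new DOMDocument();
    @$doc->loadHTML($description);
    $paragraphs = $doc->getElementsByTagName('p');

    // Join the text of (up to) the first three paragraphs with blank lines.
    $para = "";
    for ($count = 0; $count < 3 && $count < $paragraphs->length; $count++) {
        if ($count > 0) {
            $para .= "<br><br>";
        }
        $para .= $paragraphs->item($count)->nodeValue;
    }

    echo "<tr>";
    echo "<td>" . $title . "</td>";
    echo "<td>" . $para . "</td>";
    echo "</tr>";
}
```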
I have the following line in my page's 'head' section:

```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```
The title displays correctly. I thought this might be because the titles use straight quotes (') while the descriptions use curved ones (‘), but as you can see, á also displays correctly in the title.