I am using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/)
to fetch data such as the page title, meta description, and meta tags from other domains and then insert it into a database.
However, I have an encoding problem: I do not get the correct characters from websites that are not in English.
Below is the code:
<?php
require 'init.php';
$curl = new curl();
$html = new simple_html_dom();
$page = $_GET['page'];
$curl_output = $curl->getPage($page);
$html->load($curl_output['content']);
$meta_title = $html->find('title', 0)->innertext;
print $meta_title . "<hr />";
// print $html->plaintext . "<hr />";
?>
Output for facebook.com
page
Welcome to Facebook — Log in, sign up or learn more
Output for amazon.cn
page
亚马逊-网上è´ç‰©å•†åŸŽï¼šè¦ç½‘è´, å°±æ¥Z.cn!
Output for mail.ru
page
Mail.Ru: почта, поиÑк в интернете, новоÑти, игры, развлечениÑ
So the characters are not being encoded properly.
Can anyone help me solve this issue so that I can store the correct data in my database?
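One direction I am considering, as a minimal sketch only: the garbled titles above look like UTF-8 bytes being displayed as a single-byte encoding, so it may be enough to declare UTF-8 on my own output page, and to convert to UTF-8 only when a fetched page uses another charset. The helper name `to_utf8` and its `$sourceCharset` parameter are hypothetical (not part of Simple HTML DOM), and the sketch assumes the mbstring extension is available and that the source charset is already known, for example from the response's Content-Type header.

```php
<?php
// Tell the browser that this page is UTF-8 (must run before any output).
header('Content-Type: text/html; charset=utf-8');

/**
 * Hypothetical helper: return $content as UTF-8.
 * $sourceCharset is an assumption supplied by the caller; it is not detected here.
 */
function to_utf8(string $content, string $sourceCharset = 'UTF-8'): string
{
    // Already valid UTF-8: return unchanged to avoid double encoding.
    if (strcasecmp($sourceCharset, 'UTF-8') === 0
        && mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    // Otherwise convert from the stated source charset to UTF-8.
    return mb_convert_encoding($content, 'UTF-8', $sourceCharset);
}
```

In my script this would be applied to `$curl_output['content']` before passing it to `$html->load()`. The database connection and columns would presumably also need to use UTF-8, which this sketch does not cover.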