
    if ($_GET["link"] != "") {
        // $link was never assigned in the original snippet; take it from the query string
        $link = $_GET["link"];
        $curl = curl_init('http://exaple.com' . $link);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

        $page = curl_exec($curl);
        echo $page;
    }

Hi, the website is in another language and its characters are getting garbled. I am getting "??" and strange text instead of characters such as "á", "i", "á", etc. (Unicode). Is there any way to make it work?

1 Answer


First, you have to identify the character encoding used by the source website.

Choose a page and download it. In a terminal, type:

$ curl -D headers.txt -o page.html http://www.example.com/index.html

The response headers are saved in headers.txt, while the HTML source of the page is stored in page.html.

Inspect the two files with a text editor and search for Content-Type. You should find an indication of the character encoding in at least one of them: either in the Content-Type response header or in a meta tag in the HTML.

If you're not successful, you can use the file utility to try to "guess" the character encoding by inspecting the file contents (-I is the macOS spelling of the flag; on Linux the equivalent is -i):

$ file -I page.html

The output looks like this:

page.html: text/plain; charset=iso-8859-1
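The same check can also be done from within PHP instead of by hand. The following is only a minimal sketch: detect_charset() is a hypothetical helper name, not something from PHP or cURL, and it assumes the charset is declared either in the Content-Type header or in a meta tag near the top of the HTML.

```php
<?php
// Hypothetical helper (an illustration, not a built-in): pull a charset name
// out of a Content-Type header value, falling back to a <meta> tag in the HTML.
function detect_charset(?string $contentType, string $html): ?string
{
    // Header form, e.g. "text/html; charset=ISO-8859-1"
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }

    // Markup form, e.g. <meta charset="utf-8"> or
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    return null; // nothing declared: the caller has to guess or pick a default
}

// Possible usage after curl_exec() (not executed here, it needs a live request):
//   $contentType = curl_getinfo($curl, CURLINFO_CONTENT_TYPE);
//   $charset = detect_charset($contentType ?: null, $page);
```

If the function returns null, you are back to guessing, as with the file command above.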

Second, you have to decide (or find out) what the destination character set is:

  • Are you storing the web page in a text file? What is the expected character encoding of the file?

  • Are you parsing the web page within PHP in order to extract some data of interest?

  • Are you serving the web page back (totally or partially) on your own website? What is the character encoding of that website?

Let's assume (for example) you want to end up with Unicode characters encoded as UTF-8.


Finally, improve your PHP script so that it performs the proper charset conversion right after the page is retrieved with $page = curl_exec($curl);

You may use mb_convert_encoding. Note that it takes the target encoding first and the source encoding second:

$page = mb_convert_encoding( $page, 'UTF-8', 'ISO-8859-1' );
//                        to -------^         ^-------- from

Alternatively, iconv can be used for the same purpose; its arguments are ordered differently (from, to, string).
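To put the conversion step together, here is a rough sketch that assumes you already know the source charset and want UTF-8 out. The name to_utf8() is made up for this example; it prefers mb_convert_encoding and only falls back to iconv when the mbstring extension is not loaded.

```php
<?php
// Hypothetical helper (illustration only): convert a fetched page to UTF-8,
// given the charset identified for the source website.
function to_utf8(string $page, string $sourceCharset): string
{
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        return $page; // already in the target encoding, nothing to do
    }

    if (function_exists('mb_convert_encoding')) {
        // Argument order: string, TO encoding, FROM encoding
        return mb_convert_encoding($page, 'UTF-8', $sourceCharset);
    }

    // Argument order: FROM encoding, TO encoding, string
    $converted = iconv($sourceCharset, 'UTF-8', $page);

    // iconv returns false on failure; keep the original bytes in that case
    return $converted === false ? $page : $converted;
}

// Possible usage in the script from the question (assuming ISO-8859-1):
//   $page = to_utf8(curl_exec($curl), 'ISO-8859-1');
//   echo $page;
```

If you echo the converted page back to a browser, it also helps to send header('Content-Type: text/html; charset=UTF-8'); before any output, so the browser does not fall back to a different encoding.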

Paolo