First, you have to identify the character encoding of the source website.
Choose a page and download it. In a terminal, type:
$ curl -D headers.txt -o page.html http://www.example.com/index.html
The response headers are saved to headers.txt,
while the page's HTML source is stored in page.html.
Inspect both files with a text editor and search for Content-Type;
you should find an indication of the character encoding in at least one of them.
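The same lookup can be scripted. The following is a minimal sketch in PHP, assuming the headers and the page body are already available as strings (for example via file_get_contents). The function name findDeclaredCharset and the sample inputs are hypothetical, and the regular expression only covers the common charset=... form.

```php
<?php
// Hypothetical helper: returns the charset named in a Content-Type header
// or in an HTML <meta> tag, or null when none is declared.
function findDeclaredCharset(string $text): ?string
{
    // Matches charset=utf-8, charset="utf-8" and charset='utf-8'.
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $text, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Sample inputs (made up for illustration).
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=ISO-8859-1\r\n";
$html    = '<html><head><meta charset="utf-8"></head><body></body></html>';

// Prefer the HTTP header, fall back to the page source.
$charset = findDeclaredCharset($headers) ?? findDeclaredCharset($html);
echo $charset, "\n"; // prints ISO-8859-1
```

When the header and the page source disagree, as in this sample, the sketch trusts the HTTP header; that is a design choice of the example, not a rule every site follows.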
If you're not successful, you can use the file command
to try to "guess" the character encoding by inspecting the file contents (-I is the macOS spelling of the option; GNU file on Linux uses -i instead):
$ file -I page.html
The output looks like this:
page.html: text/plain; charset=iso-8859-1
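A similar guess can be made from PHP. This is only a sketch under a strong assumption: the page is either valid UTF-8 or in a single-byte fallback encoding that you pick yourself. The function name guessEncoding and its default fallback are hypothetical; mb_check_encoding comes from the mbstring extension.

```php
<?php
// Hypothetical helper: guesses between UTF-8 and a caller-chosen fallback.
// Byte sequences that are not valid UTF-8 are assumed to be in $fallback.
function guessEncoding(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    return mb_check_encoding($bytes, 'UTF-8') ? 'UTF-8' : $fallback;
}

echo guessEncoding("caf\xC3\xA9"), "\n"; // "café" as UTF-8 bytes: prints UTF-8
echo guessEncoding("caf\xE9"), "\n";     // "café" as ISO-8859-1 bytes: prints ISO-8859-1
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of UTF-8. Like file, this is a heuristic and cannot tell one single-byte encoding from another.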
Second, you have to decide (or find out) what the destination character set is:
Are you storing the web page in a text file? What is the expected character encoding of that file?
Are you parsing the web page within PHP in order to extract some data you are interested in?
Are you serving the web page (in whole or in part) back on your own website? What is the character encoding of that website?
Let's assume (for example) you want to end up with Unicode characters encoded as UTF-8.
Finally, improve your PHP script so that it performs the proper charset conversion after the page is retrieved with $page = curl_exec($curl);
You may use mb_convert_encoding:
$page = mb_convert_encoding( $page, 'UTF-8', 'ISO-8859-1' );
//                               to ---^       ^--- from
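Put together, the fetch and the conversion might look like the sketch below. The helper names pageToUtf8 and fetchAsUtf8 are hypothetical, and the source encoding is assumed to be ISO-8859-1, as in the file output earlier; substitute whatever you found in the first step.

```php
<?php
// Hypothetical helper: converts a page from a known source charset to UTF-8.
// mb_convert_encoding takes the target encoding first, then the source.
function pageToUtf8(string $page, string $sourceCharset): string
{
    return mb_convert_encoding($page, 'UTF-8', $sourceCharset);
}

// Hypothetical helper: downloads a URL and returns its body as UTF-8.
function fetchAsUtf8(string $url, string $sourceCharset): string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $page = curl_exec($curl);
    if ($page === false) {
        throw new RuntimeException('Download failed: ' . curl_error($curl));
    }
    return pageToUtf8($page, $sourceCharset);
}

// Offline check with literal bytes: 0xE9 is "é" in ISO-8859-1.
echo bin2hex(pageToUtf8("caf\xE9", 'ISO-8859-1')), "\n"; // prints 636166c3a9
```

A call such as fetchAsUtf8('http://www.example.com/index.html', 'ISO-8859-1') would then return the page ready to be stored or parsed as UTF-8.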
Alternatively, iconv can be used for the same purpose.
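A minimal sketch of the iconv variant, again assuming an ISO-8859-1 source and a UTF-8 target. Note that iconv takes its arguments in the opposite order to mb_convert_encoding: source encoding first, then target. The sample string is made up for illustration.

```php
<?php
// Sample input: "café" encoded as ISO-8859-1 (0xE9 is "é").
$page = "caf\xE9";

// iconv(from, to, string) returns false on failure.
$converted = iconv('ISO-8859-1', 'UTF-8', $page);
if ($converted === false) {
    throw new RuntimeException('Conversion failed');
}

echo bin2hex($converted), "\n"; // prints 636166c3a9
```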