This is a useful question. Here is the solution that works for me on Windows 10 with PHP 7; it may help anyone who still has trouble with UTF-8 conversion.
Here are my steps. The PHP script that calls the function below (here utfsave.php) must itself be saved in UTF-8 encoding, which is easily done with a conversion in UltraEdit.
In utfsave.php, we define a function that calls PHP's fopen($filename, "wb"): the file is opened in write mode (w) and, importantly, in binary mode (b).
<?php
//
// UTF-8 encoding:
//
// fnc001: save a string to a file as UTF-8.
// The resulting file is UTF-8 only if $strContent itself is UTF-8
// (with French accents, Chinese ideograms, etc.).
//
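function entSaveAsUtf8($strContent, $filename) {
    // "wb": write mode plus binary mode, so the bytes are written unchanged
    $fp = fopen($filename, "wb");
    if ($fp === false) {
        return false; // the file could not be opened
    }
    $written = fwrite($fp, $strContent);
    fclose($fp);
    return $written !== false;
}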
//
// 1. write a UTF-8 string on the fly into a UTF-8 file:
//
$strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";
$filename = "utf8text.txt";
entSaveAsUtf8($strContent, $filename);
//
// 2. convert a CP936 (ANSI/OEM, Simplified Chinese GBK) file into a UTF-8 file
//
// CP936: <https://en.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)>
// GBK: <https://en.wikipedia.org/wiki/GBK_(character_encoding)>
//
$strContent = file_get_contents("cp936gbktext.txt");
$strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");
$filename = "utf8text2.txt";
entSaveAsUtf8($strContent, $filename);
?>
The content of the source file cp936gbktext.txt:
>>Get-Content cp936gbktext.txt
My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)
Running utfsave.php on Windows 10 with PHP creates the files utf8text.txt and utf8text2.txt, both saved in UTF-8 format.
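To confirm the result without opening an editor, a check along the following lines may help. It is only a sketch: isValidUtf8File is a hypothetical helper name, not part of the script above, and it relies on the mbstring extension that the script already uses.

```php
<?php
// Hypothetical helper (not part of the original script).
// Returns true when the file at $path contains only well-formed UTF-8.
function isValidUtf8File(string $path): bool {
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        return false; // the file could not be read
    }
    return mb_check_encoding($bytes, "UTF-8");
}
```

Calling isValidUtf8File("utf8text2.txt") after the script has run should tell you whether the conversion produced well-formed UTF-8. Keep in mind that a pure ASCII file also passes this check, so it confirms validity, not that any conversion took place.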
With this method, no BOM (byte order mark) is required. Adding a BOM is a poor solution because it causes trouble elsewhere, for example when sourcing an SQL file into MySQL.
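If a file produced by another tool already starts with a BOM, it can be removed before saving. The sketch below assumes a hypothetical helper named stripUtf8Bom; the three bytes EF BB BF are the UTF-8 encoding of the byte order mark.

```php
<?php
// Hypothetical helper (not part of the original script).
// Removes a leading UTF-8 BOM (bytes EF BB BF) if one is present.
function stripUtf8Bom(string $text): string {
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        // The cast guards against substr() returning false on older PHP versions.
        return (string) substr($text, 3);
    }
    return $text;
}
```

The cleaned string can then be passed to entSaveAsUtf8() as usual.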
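It's worth noting that I could not get file_put_contents($filename, utf8_encode($mystring)); to work for this purpose. utf8_encode() only converts from ISO-8859-1 to UTF-8, so it garbles input that is already UTF-8 or that uses another encoding such as GBK.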
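One likely reason for that failure is double encoding. The sketch below is an illustration rather than a diagnosis of the exact case: it uses mb_convert_encoding() with ISO-8859-1 as the source, which is the conversion utf8_encode() performs, on a string that is already UTF-8.

```php
<?php
// "été" written as UTF-8 bytes: each é is C3 A9.
$utf8 = "\xC3\xA9t\xC3\xA9";

// Treat those bytes as ISO-8859-1 and convert again, as utf8_encode() would.
$double = mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1");

echo bin2hex($utf8), "\n";   // c3a974c3a9
echo bin2hex($double), "\n"; // c383c2a974c383c2a9
```

Each byte of the original é is re-encoded as a separate character, which is why the text shows up as "Ã©t&Atilde;©"-style garbage in a UTF-8 viewer.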
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If you don't know the encoding of the source file, you can list the encodings that PHP supports:
print_r(mb_list_encodings());
This gives a list like this:
Array
(
[0] => pass
[1] => wchar
[2] => byte2be
[3] => byte2le
[4] => byte4be
[5] => byte4le
[6] => BASE64
[7] => UUENCODE
[8] => HTML-ENTITIES
[9] => Quoted-Printable
[10] => 7bit
[11] => 8bit
[12] => UCS-4
[13] => UCS-4BE
[14] => UCS-4LE
[15] => UCS-2
[16] => UCS-2BE
[17] => UCS-2LE
[18] => UTF-32
[19] => UTF-32BE
[20] => UTF-32LE
[21] => UTF-16
[22] => UTF-16BE
[23] => UTF-16LE
[24] => UTF-8
[25] => UTF-7
[26] => UTF7-IMAP
[27] => ASCII
[28] => EUC-JP
[29] => SJIS
[30] => eucJP-win
[31] => EUC-JP-2004
[32] => SJIS-win
[33] => SJIS-Mobile#DOCOMO
[34] => SJIS-Mobile#KDDI
[35] => SJIS-Mobile#SOFTBANK
[36] => SJIS-mac
[37] => SJIS-2004
[38] => UTF-8-Mobile#DOCOMO
[39] => UTF-8-Mobile#KDDI-A
[40] => UTF-8-Mobile#KDDI-B
[41] => UTF-8-Mobile#SOFTBANK
[42] => CP932
[43] => CP51932
[44] => JIS
[45] => ISO-2022-JP
[46] => ISO-2022-JP-MS
[47] => GB18030
[48] => Windows-1252
[49] => Windows-1254
[50] => ISO-8859-1
[51] => ISO-8859-2
[52] => ISO-8859-3
[53] => ISO-8859-4
[54] => ISO-8859-5
[55] => ISO-8859-6
[56] => ISO-8859-7
[57] => ISO-8859-8
[58] => ISO-8859-9
[59] => ISO-8859-10
[60] => ISO-8859-13
[61] => ISO-8859-14
[62] => ISO-8859-15
[63] => ISO-8859-16
[64] => EUC-CN
[65] => CP936
[66] => HZ
[67] => EUC-TW
[68] => BIG-5
[69] => CP950
[70] => EUC-KR
[71] => UHC
[72] => ISO-2022-KR
[73] => Windows-1251
[74] => CP866
[75] => KOI8-R
[76] => KOI8-U
[77] => ArmSCII-8
[78] => CP850
[79] => JIS-ms
[80] => ISO-2022-JP-2004
[81] => ISO-2022-JP-MOBILE#KDDI
[82] => CP50220
[83] => CP50220raw
[84] => CP50221
[85] => CP50222
)
If you cannot guess the encoding, try the candidates one by one, because mb_detect_encoding() does not do this job easily.
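Trying candidates one by one can be scripted. The following is a minimal sketch, assuming a hypothetical helper named tryEncodings and a candidate list taken from the names printed above; it skips encodings for which the bytes are not well-formed and converts the rest so you can inspect them by eye.

```php
<?php
// Hypothetical helper (not part of the original script).
// Decodes $bytes with each candidate encoding and returns the UTF-8
// results keyed by encoding name.
function tryEncodings(string $bytes, array $candidates): array {
    $results = [];
    foreach ($candidates as $encoding) {
        // Skip candidates for which the bytes are not well-formed.
        if (!mb_check_encoding($bytes, $encoding)) {
            continue;
        }
        $results[$encoding] = mb_convert_encoding($bytes, "UTF-8", $encoding);
    }
    return $results;
}

// Sample input: build GBK bytes from a UTF-8 literal, then decode them again.
$sample = mb_convert_encoding("鱼肉酒菜", "CP936", "UTF-8");
foreach (tryEncodings($sample, ["UTF-8", "CP936", "BIG-5"]) as $encoding => $text) {
    echo $encoding, ": ", $text, "\n";
}
```

In practice you would replace $sample with file_get_contents() on the unknown file. Several candidates may survive the check, so the final choice still depends on which output reads correctly.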