The solution
I spent several hours on this and finally found the correct encodings. strftime() does not return a UTF-8 string; it returns bytes in whatever encoding the active locale uses. For details, have a look at the bottom of this answer. I ended up with a formatTime() function that returns the formatted time in the correct encoding (UTF-8 for me).
function formatTime($format, $language = null, $timestamp = null){
    // remember the locale that is active now, so it can be restored at the end
    // (setlocale() with new values returns the NEW locale, not the previous one)
    $previous_locale = setlocale(LC_TIME, 0);
    switch($language){
        case 'chinese':
            setlocale(LC_TIME, 'zh_CN.utf8', 'zh_CN.UTF-8', 'zh_CN', 'zh');
            break;
        case 'hungarian':
            // 'hu' is the Hungarian language code ('hr' would select Croatian)
            setlocale(LC_TIME, 'hu_HU.utf8', 'hu_HU.UTF-8', 'hu_HU', 'hu');
            break;
        case 'russian':
            setlocale(LC_TIME, 'ru_RU.utf8', 'ru_RU.UTF-8', 'ru_RU', 'ru');
            break;
        case 'german':
            setlocale(LC_TIME, 'de_DE.utf8', 'de_DE.UTF-8', 'de_DE', 'de');
            break;
        case 'french':
            setlocale(LC_TIME, 'fr_FR.utf8', 'fr_FR.UTF-8', 'fr_FR', 'fr');
            break;
        case 'polish':
            setlocale(LC_TIME, 'pl_PL.utf8', 'pl_PL.UTF-8', 'pl_PL', 'pl');
            break;
        case 'turkish':
            setlocale(LC_TIME, 'tr_TR.utf8', 'tr_TR.UTF-8', 'tr_TR', 'tr');
            break;
        case 'english':
            setlocale(LC_TIME, 'en_GB.utf8', 'en_GB.UTF-8', 'en_GB', 'en');
            break;
        // ...
        default: break;
    }
    if(!is_numeric($timestamp)){
        $datetime = strftime($format);
    }
    else{
        $datetime = strftime($format, $timestamp);
    }
    $current_locale = strtolower(setlocale(LC_TIME, 0));
    // strpos() takes the haystack first and the needle second
    if(($pos = strpos($current_locale, "utf")) === false || strpos($current_locale, "8", $pos) === false){
        // UTF-8 locale is not used, the encodings are found out with the code shown below
        $locale_default_encodings = array(
            "german" => "ISO-8859-1",
            "french" => "ISO-8859-1",
            "polish" => "ISO-8859-2",
            "turkish" => "ISO-8859-9",
            // The test below reports "Windows-1252" for 'hungarian', but that run used the
            // 'hr' (Croatian) locale. php.net recommends ISO-8859-2 for Hungarian (*).
            // Windows-1252 is based on ISO-8859-1, not ISO-8859-2, so check this on your system.
            "hungarian" => "ISO-8859-2",
            "chinese" => "CP936",
            // Caution: the raw bytes c4e5eae0e1f0fc from the test below read "Декабрь" when
            // decoded as Windows-1251; if KOI8-R gives garbage on your system, try Windows-1251.
            "russian" => "KOI8-R"
        );
        $target_encoding = mb_internal_encoding(); // or "UTF-8" or whatever
        if(isset($locale_default_encodings[$language])){
            $datetime = mb_convert_encoding(
                $datetime,
                $target_encoding,
                $locale_default_encodings[$language]
            );
        }
        else{
            // try to avoid this case: without a source encoding, mb_convert_encoding()
            // assumes the internal encoding as the source
            $datetime = mb_convert_encoding($datetime, $target_encoding);
        }
    }
    // restore the locale that was active before this call
    setlocale(LC_TIME, $previous_locale);
    return $datetime;
}
(*): http://php.net/manual/de/function.strftime.php#94399
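The UTF-8 locale check inside formatTime() is easy to get wrong, because strpos() expects the haystack first and the needle second. Here is a minimal sketch of that check on its own. The helper name isUtf8Locale() is hypothetical and not part of the function above, and the sketch assumes that locale names spell the codeset as "utf8" or "utf-8":

```php
<?php
// Hypothetical helper (an assumption, not part of the original function):
// reports whether a locale name, as returned by setlocale(LC_TIME, 0),
// names a UTF-8 codeset.
function isUtf8Locale($locale_name){
    // matches "utf8" and "utf-8" in any letter case, e.g. "de_DE.UTF-8"
    return preg_match('/utf-?8/i', $locale_name) === 1;
}

var_dump(isUtf8Locale('de_DE.utf8'));          // bool(true)
var_dump(isUtf8Locale('de_DE.UTF-8'));         // bool(true)
var_dump(isUtf8Locale('de_DE.ISO-8859-1'));    // bool(false)
var_dump(isUtf8Locale('German_Germany.1252')); // bool(false)
```

If the check returns false, the string from strftime() still has to be converted to the target encoding.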
The long long way
I checked the result of strftime("%B"), the full month name, for each specific language. I looked up the correct translation for each of my languages and then the UTF-8 hex values of the letters in that translation.
Then I iterate through all the encodings that are supported by PHP. I convert the result of strftime() from the currently iterated encoding to UTF-8 and compare its hex values with the hex values of the manual translation, which are also UTF-8. If they match, the result of strftime() is encoded in the currently iterated encoding.
I chose hex values because they are definitely the same and do not depend on the internal encoding: they are plain ASCII strings (or even numbers in PHP).
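The comparison described above can be sketched on its own with a fixed byte string instead of a real strftime() call. The helper name matchingEncodings() and the sample bytes are assumptions made for illustration; the candidate list is deliberately short:

```php
<?php
// Hypothetical helper: returns every candidate encoding under which the raw
// bytes, once converted to UTF-8, match the expected UTF-8 hex string.
function matchingEncodings($raw, $expected_utf8_hex, array $candidates){
    $matches = array();
    foreach($candidates as $source_encoding){
        $converted = mb_convert_encoding($raw, "UTF-8", $source_encoding);
        // bin2hex() gives an ASCII string, so the comparison does not depend
        // on any internal encoding
        if(bin2hex($converted) === $expected_utf8_hex){
            $matches[] = $source_encoding;
        }
    }
    return $matches;
}

// "März" with the a-umlaut stored as the single byte 0xE4: a made-up sample
// standing in for what strftime("%B") might return under a non-UTF-8 locale
$raw = "M\xe4rz";
// 4dc3a4727a is the UTF-8 hex of "März"
print_r(matchingEncodings($raw, "4dc3a4727a", array("ISO-8859-1", "KOI8-R", "UTF-8")));
```

With these three candidates only ISO-8859-1 matches. Several related encodings can match the same bytes, so the result is a list rather than a single name.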
This gives me the following output (the code is posted below):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<h1>Detecting the font encoding of <code>strftime()</code>
</h1>
<h2>hungarian</h2>
<p>
<code>strftime()</code> for March for language hungarian. Expected hex: <code>6fc5be756a616b</code>, converted expected hex to string: <code>ožujak</code>
</p>
<table>
<tr>
<td>initial return value</td>
<td>oߵjak</td>
<td>6f9e756a616b</td>
</tr>
<tr>
<td colspan='3'>Encodings that deliver the correct result:</td>
</tr>
<tr style='background: green;'>
<td>Windows-1252</td>
<td>ožujak</td>
<td>6fc5be756a616b</td>
</tr>
</table>
<h2>chinese</h2>
<p>
<code>strftime()</code> for December for language chinese. Expected hex: <code>e58d81e4ba8ce69c88</code>, converted expected hex to string: <code>十二月</code>
</p>
<table>
<tr>
<td>initial return value</td>
<td>ʮՂ</td>
<td>caaeb6fed4c2</td>
</tr>
<tr>
<td colspan='3'>Encodings that deliver the correct result:</td>
</tr>
<tr style='background: green;'>
<td>EUC-CN</td>
<td>十二月</td>
<td>e58d81e4ba8ce69c88</td>
</tr>
<tr style='background: green;'>
<td>CP936</td>
<td>十二月</td>
<td>e58d81e4ba8ce69c88</td>
</tr>
<tr style='background: green;'>
<td>GB18030</td>
<td>十二月</td>
<td>e58d81e4ba8ce69c88</td>
</tr>
</table>
<h2>russian</h2>
<p>
<code>strftime()</code> for December for language russian. Expected hex: <code>d0b4d095d099d0aed090d09fd0ad</code>, converted expected hex to string: <code>дЕЙЮАПЭ</code>
</p>
<table>
<tr>
<td>initial return value</td>
<td>ť롡</td>
<td>c4e5eae0e1f0fc</td>
</tr>
<tr>
<td colspan='3'>Encodings that deliver the correct result:</td>
</tr>
<tr style='background: green;'>
<td>KOI8-R</td>
<td>дЕЙЮАПЭ</td>
<td>d0b4d095d099d0aed090d09fd0ad</td>
</tr>
<tr style='background: green;'>
<td>KOI8-U</td>
<td>дЕЙЮАПЭ</td>
<td>d0b4d095d099d0aed090d09fd0ad</td>
</tr>
</table>
</body>
</html>
Note that this HTML is encoded in UTF-8. Still, the result returned by the strftime() function is wrong! This has nothing to do with the browser or editor encoding as pointed out in the comments.
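One way to confirm that the problem is in the bytes themselves and not in the display is to test the raw hex values from the "initial return value" rows for UTF-8 validity. This is a small sketch under that assumption; the helper name isValidUtf8Hex() is hypothetical:

```php
<?php
// Hypothetical helper: decodes a hex string into bytes and checks whether
// those bytes form valid UTF-8.
function isValidUtf8Hex($hex){
    return mb_check_encoding(hex2bin($hex), "UTF-8");
}

// raw hex values copied from the "initial return value" rows of the output
$raw_hex = array(
    "hungarian" => "6f9e756a616b",
    "chinese"   => "caaeb6fed4c2",
    "russian"   => "c4e5eae0e1f0fc",
);
foreach($raw_hex as $language => $hex){
    print($language.": ".(isValidUtf8Hex($hex) ? "valid" : "not valid")." UTF-8\n");
}
```

None of the three byte sequences is valid UTF-8, so no browser or editor setting could render them correctly as UTF-8.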
$encodings = array(
"UCS-4",
"UCS-4BE",
"UCS-4LE",
"UCS-2",
"UCS-2BE",
"UCS-2LE",
"UTF-32",
"UTF-32BE",
"UTF-32LE",
"UTF-16",
"UTF-16BE",
"UTF-16LE",
"UTF-7",
"UTF7-IMAP",
"UTF-8",
"ASCII",
"EUC-JP",
"SJIS",
"eucJP-win",
"SJIS-win",
"ISO-2022-JP",
"ISO-2022-JP-MS",
"CP932",
"CP51932",
"SJIS-mac",
"SJIS-Mobile#DOCOMO",
"SJIS-Mobile#KDDI",
"SJIS-Mobile#SOFTBANK",
"UTF-8-Mobile#DOCOMO",
"UTF-8-Mobile#KDDI-A",
"UTF-8-Mobile#KDDI-B",
"UTF-8-Mobile#SOFTBANK",
"ISO-2022-JP-MOBILE#KDDI",
"JIS",
"JIS-ms",
"CP50220",
"CP50220raw",
"CP50221",
"CP50222",
"ISO-8859-1",
"ISO-8859-2",
"ISO-8859-3",
"ISO-8859-4",
"ISO-8859-5",
"ISO-8859-6",
"ISO-8859-7",
"ISO-8859-8",
"ISO-8859-9",
"ISO-8859-10",
"ISO-8859-13",
"ISO-8859-14",
"ISO-8859-15",
"ISO-8859-16",
"byte2be",
"byte2le",
"byte4be",
"byte4le",
"BASE64",
"HTML-ENTITIES",
"7bit",
"8bit",
"EUC-CN",
"CP936",
"GB18030",
"HZ",
"EUC-TW",
"CP950",
"BIG-5",
"EUC-KR",
"UHC",
"ISO-2022-KR",
"Windows-1251",
"Windows-1252",
"CP866",
"KOI8-R",
"KOI8-U",
"ArmSCII-8"
);
$show_wrong_encodings = false;
$internal_encoding = "UTF-8";
mb_internal_encoding($internal_encoding);
$languages = array(
// name of the language => hex in UTF-8 and timestamp to check
"german" => array("4dc3a4727a", 1520343439), // march
"french" => array("64c3a963656d627265", 1544103703), // december
"polish" => array("677275647a6965c584", 1544103703), // december
"turkish" => array("4172616cc4b16b", 1544103703), // december
"hungarian" => array("6fc5be756a616b", 1520343439), // march
"chinese" => array("e58d81e4ba8ce69c88", 1544103703), // december
"russian" => array("d0b4d095d099d0aed090d09fd0ad", 1544103703) // december
);
$format = "%B"; // print full month name
print("<h1>Detecting the font encoding of <code>strftime()</code></h1>\n");
foreach($languages as $language => $data){
    // the hex value in UTF-8, this is the target value
    $hex = $data[0];
    // the timestamp to check
    $timestamp = $data[1];
    print(
        "<h2>".$language."</h2>\n".
        "<p>".
        "<code>strftime()</code> for ".formatTime("%B", "english", $timestamp)." ".
        "for language ".$language.". Expected hex: <code>".$hex."</code>, converted expected ".
        "hex to string: <code>".tostring($hex)."</code>".
        "</p>\n"
    );
    // this is a different formatTime() function than mentioned above, it is defined after this
    // foreach
    $string = formatTime("%B", $language, $timestamp);
    print("<table>\n");
    print("<tr>\n".
        "\t<td>initial return value</td>\n".
        "\t<td>".$string."</td>\n".
        "\t<td>".tohex($string)."</td>\n".
        "</tr>\n\n".
        "<tr><td colspan='3'>Encodings that deliver the correct result:</td></tr>"
    );
    foreach($encodings as $source_encoding){
        $converted = mb_convert_encoding($string, $internal_encoding, $source_encoding);
        $converted_hex = tohex($converted);
        $style = "";
        if($converted_hex == $hex){
            $style = "background: green";
        }
        elseif(!$show_wrong_encodings){
            $style = "display: none";
        }
        print("<tr style='".$style.";'>\n".
            "\t<td>".$source_encoding."</td>\n".
            "\t<td>".$converted."</td>\n".
            "\t<td>".$converted_hex."</td>\n".
            "</tr>\n"
        );
    }
    print("</table>");
}
function tohex($string){
    return implode(unpack("H*", $string));
}
function tostring($hex){
    return pack("H*", $hex);
}
function formatTime($format, $language, $timestamp){
    // remember the locale that is active now, so it can be restored at the end
    $previous_locale = setlocale(LC_TIME, 0);
    switch($language){
        case 'chinese':
            setlocale(LC_TIME, 'zh_CN.utf8', 'zh_CN.UTF-8', 'zh_CN', 'zh');
            break;
        case 'hungarian':
            // note: 'hr' is the Croatian language code, which is why the output
            // above shows the Croatian month name "ožujak"; kept to match that run
            setlocale(LC_TIME, 'hu_HU.utf8', 'hu_HU.UTF-8', 'hu_HU', 'hr');
            break;
        case 'russian':
            setlocale(LC_TIME, 'ru_RU.utf8', 'ru_RU.UTF-8', 'ru_RU', 'ru');
            break;
        case 'german':
            setlocale(LC_TIME, 'de_DE.utf8', 'de_DE.UTF-8', 'de_DE', 'de');
            break;
        case 'french':
            setlocale(LC_TIME, 'fr_FR.utf8', 'fr_FR.UTF-8', 'fr_FR', 'fr');
            break;
        case 'polish':
            setlocale(LC_TIME, 'pl_PL.utf8', 'pl_PL.UTF-8', 'pl_PL', 'pl');
            break;
        case 'turkish':
            setlocale(LC_TIME, 'tr_TR.utf8', 'tr_TR.UTF-8', 'tr_TR', 'tr');
            break;
        // ...
        default:
            setlocale(LC_TIME, 'en_GB.utf8', 'en_GB.UTF-8', 'en_GB', 'en');
            break;
    }
    // return the raw, unconverted bytes so that their encoding can be examined
    $datetime = strftime($format, $timestamp);
    // restore the locale that was active before this call
    setlocale(LC_TIME, $previous_locale);
    return $datetime;
}