OK, I have been struggling with this one for a while. I have thousands of files with wrong characters in their names. They were extracted incorrectly by the server from a zip file, and the server converted the names in this manner:
The original file name (for example) is
QQ图片20160314173435.jpg
but the file on the server now looks like
QQ#U56fe#U724720160314173435.jpg
where
图 = #U56fe
and
片 = #U7247
All the files contain the same two characters; only the numbering differs.
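Each `#Uxxxx` sequence appears to be the character's Unicode code point written in hex (图 is U+56FE and 片 is U+7247). If so, the names are not in a wrong encoding at all; they are plain text containing escapes, which would explain why iconv and mb_convert_encoding change nothing useful. A minimal sketch, assuming every escape is exactly `#U` plus four hex digits; `decode_u_escapes` is a hypothetical helper name. It builds a numeric character reference such as `&#x56fe;` for each match and lets `html_entity_decode` turn it into UTF-8:

```php
<?php
/**
 * Hypothetical helper: replace every "#Uxxxx" escape (assumed to be a
 * four-digit hex Unicode code point) with the matching UTF-8 character.
 */
function decode_u_escapes(string $name): string
{
    $decoded = preg_replace_callback(
        '/#U([0-9a-fA-F]{4})/',
        function (array $m): string {
            // "&#x56fe;" is a numeric character reference; decoding it
            // yields the UTF-8 bytes for that code point.
            return html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES, 'UTF-8');
        },
        $name
    );
    return $decoded ?? $name; // preg_replace_callback returns null on error
}

echo decode_u_escapes('QQ#U56fe#U724720160314173435.jpg'), PHP_EOL;
// prints QQ图片20160314173435.jpg
```

Names without escapes are returned unchanged, so the function can be applied to every file name safely.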
I have tried every function I can think of, including the iconv family, the mb_ family, str_replace, and even html_entity_decode/htmlentities, etc.
Each one either did not work or produced other strange characters.
My code so far is:
```php
<?php
// iconv_set_encoding('input_encoding','GB18030');
// print_r($enc);
if ($handle = opendir('./')) {
    while (false !== ($fileName = readdir($handle))) {
        $ext = pathinfo($fileName, PATHINFO_EXTENSION);
        echo $ext . PHP_EOL;
        if ($ext == 'jpg') {
            echo "========" . mb_detect_encoding($fileName) . PHP_EOL . "\r\n";
            $newName = mb_convert_encoding($fileName, "UTF-8", mb_detect_encoding($fileName));
            // $newName = str_replace("#","\\",$fileName);
            // $newName = str_replace("#U56fe",iconv("UTF-8","GB2312","图"),$newName);
            // $newName = html_entity_decode($newName,ENT_NOQUOTES,"GB2312");
            // $newName = urlencode($newName);
            // $newName = urldecode($newName);
            //
            // Tried //GB2312 // GB18030
            // $newName = iconv(mb_detect_encoding($newName, mb_detect_order(), true), "GB18030", $newName);
            // echo $newName .PHP_EOL;
            // $newName = iconv("UTF-8", "GB18030", $fileName);
            // $newName = iconv("GB18030", "UTF-8", $fileName);
            // $newName = iconv("ISO-8859-9//TRANSLIT", "UTF-8", $fileName);
            // echo $newName .PHP_EOL;
            // $newName = mb_convert_encoding($fileName, 'UTF-8', 'HTML-ENTITIES');
            // tried both copy and rename+unlink
            // rename($fileName, $newName);
            copy($fileName, $newName);
        }
    }
    closedir($handle);
}
```
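Plugged into a directory loop like the one above, the idea would be to decode each name and rename only the files that actually contain an escape. A sketch, assuming the files sit in one directory and the filesystem accepts UTF-8 names (as typical Linux setups such as the CentOS server do); `decode_u_escapes` and `rename_escaped_files` are hypothetical helper names, and existing targets are skipped rather than overwritten:

```php
<?php
// Hypothetical helper: "#Uxxxx" (four hex digits) -> UTF-8 character.
function decode_u_escapes(string $name): string
{
    $decoded = preg_replace_callback(
        '/#U([0-9a-fA-F]{4})/',
        function (array $m): string {
            return html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES, 'UTF-8');
        },
        $name
    );
    return $decoded ?? $name; // preg_replace_callback returns null on error
}

/**
 * Hypothetical helper: rename every .jpg in $dir whose name contains
 * "#Uxxxx" escapes. Returns [old name => new name] for each successful rename.
 */
function rename_escaped_files(string $dir): array
{
    $renamed = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $renamed; // directory could not be read
    }
    foreach ($entries as $fileName) {
        // Same filter as the original loop, but case-insensitive.
        if (strtolower(pathinfo($fileName, PATHINFO_EXTENSION)) !== 'jpg') {
            continue;
        }
        $newName = decode_u_escapes($fileName);
        if ($newName === $fileName) {
            continue; // no escapes, nothing to do
        }
        $from = $dir . DIRECTORY_SEPARATOR . $fileName;
        $to   = $dir . DIRECTORY_SEPARATOR . $newName;
        if (file_exists($to)) {
            continue; // never clobber an existing file
        }
        if (rename($from, $to)) {
            $renamed[$fileName] = $newName;
        }
    }
    return $renamed;
}

print_r(rename_escaped_files('.'));
```

Because this uses `rename()` rather than `copy()`, the escaped originals disappear, so trying it on a copy of the directory first seems wise. On Windows, PHP releases before 7.1 pass file names to the OS in the system's ANSI codepage rather than UTF-8, so the same call may still produce garbled names under Win7/XAMPP with an older PHP; that may be part of why results differ between machines.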
I left some of the failed attempts in to show what I have already tried, but I actually tried even more (including iconv_set_encoding at the beginning).
I have tried the script both locally (Win7 / XAMPP) and on the live server (CentOS / cPanel).
After so many failures I am not even sure whether the names are ASCII, UTF-8, or some Unicode substitution represented in UTF-8.
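One way to settle this, rather than relying on mb_detect_encoding's guess, is to inspect the raw bytes PHP actually receives for each name. A minimal diagnostic sketch; `describe_name_bytes` is a hypothetical helper name. It reports a hex dump, whether every byte is below 0x80 (pure ASCII), and whether the bytes form valid UTF-8:

```php
<?php
/**
 * Hypothetical helper: describe the raw bytes of a file name.
 */
function describe_name_bytes(string $name): array
{
    return [
        'hex'   => bin2hex($name),
        // No byte in 0x80-0xFF means the name is plain ASCII.
        'ascii' => preg_match('/[\x80-\xFF]/', $name) === 0,
        // With the /u modifier preg_match fails (returns false) on invalid UTF-8.
        'utf8'  => preg_match('//u', $name) === 1,
    ];
}

// In practice, call it on each $fileName returned by readdir().
print_r(describe_name_bytes('QQ#U56fe#U724720160314173435.jpg'));
```

If `ascii` comes out true for the names on disk, the files literally contain the text `#U56fe`, and no encoding conversion can turn that into 图; only a text substitution can.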
Note that the problem is not creating new files or folders; that I can do without any trouble. The problem is renaming existing files, and only with PHP. Any other method of renaming works.
The strange thing is that I tested the same script on another local machine (Ubuntu) and it worked well there, which of course suggests that some OS or PHP setting is responsible. But how?
Also, there must be some way to tell a script which codepage/encoding to use and to change it dynamically.