I am trying to create some directories with Unicode names on Windows. The names display correctly in the browser, but when the directories are created, the names are converted into garbage text.

I have tried encoding conversions and removing special characters.

// Read the tab-separated file; each line becomes one array element.
$lines = file("unicode.csv", FILE_IGNORE_NEW_LINES);
if ($lines === false) {
    die("Unable to open file!");
}
echo '<table border="1">';
foreach ($lines as $k => $v) {
    $parts = preg_split('/[\t]/', $v);
    echo '<tr>';
    foreach ($parts as $key => $val) {
        if ($key == 0) {
            // Strip path separators so the first column can be used as a directory name.
            $dir = str_replace(["/", "\\"], "", $val);
            $encode = mb_detect_encoding($dir, mb_detect_order(), false);
            // Note: converting UTF-8 to UTF-8 leaves valid input unchanged.
            $dir = mb_convert_encoding($dir, 'UTF-8', 'UTF-8');
            echo '<td>'.$dir.'</td><td>'.$encode.'</td>';
            // The mode must be an octal integer, not a string (it is ignored on Windows).
            $result = mkdir($dir, 0777);
        }
        echo '<td>'.$val.'</td>';
    }
    echo '</tr>';
}
echo '</table>';

The expected result is that the directory names are readable, as they are in the UTF-8 source. (Screenshot: expected results, HTML output of the script.)

Instead, the directory names turn out as garbage text. (Screenshot: garbage directory names.)

  • Does Windows support Hindi? – developerCK May 14 '19 at 07:51
  • @developerCK Yes, it does. I can create a folder in Hindi from Windows Explorer. – Gaurav Saxena May 14 '19 at 07:52
  • I think on an NTFS formatted drive, you should use UTF-16 instead of UTF-8, at least according to https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as – 04FS May 14 '19 at 08:02
  • @04FS still not working. Warning: mkdir() expects parameter 1 to be a valid path, string given Converted Text : �� 7 M * M 0 > . 0 @ B ( ? / 0 9 > 8 M B 2 > 5 ( @ * 0 ? 7 M 7 & + H > , > & 7 M – Gaurav Saxena May 14 '19 at 08:07
  • I can give you remote access to my system if you want to experiment. – Gaurav Saxena May 14 '19 at 08:08
  • Based on your results, it looks like PHP `mkdir` does not transcode from UTF-8 to native Windows UTF-16LE in order to call [W]ide-character `CreateDirectoryW`. It probably just calls C `mkdir`. This naively passes bytes to `CreateDirectoryA`, which decodes the UTF-8 name using the system [A]NSI encoding (e.g. codepage 1252). Starting with Windows 10, we can set [A]NSI to UTF-8 in the system locale configuration. This change requires a reboot. – Eryk Sun May 14 '19 at 10:19
  • @eryksun Thanks a ton. Your suggestion worked like magic. I changed the system Locale configuration and it is working fine. – Gaurav Saxena May 14 '19 at 12:21

1 Answer

Thanks to @eryksun, whose comment solved it:

Based on your results, it looks like PHP `mkdir` does not transcode from UTF-8 to native Windows UTF-16LE in order to call [W]ide-character `CreateDirectoryW`. It probably just calls C `mkdir`. This naively passes bytes to `CreateDirectoryA`, which decodes the UTF-8 name using the system [A]NSI encoding (e.g. codepage 1252). Starting with Windows 10, we can set [A]NSI to UTF-8 in the system locale configuration. This change requires a reboot.
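
To see why the names come out as garbage, here is a minimal sketch that simulates the misreading described above: UTF-8 bytes interpreted one byte at a time as Windows-1252. It assumes the mbstring extension is loaded. `misdecodeAsCp1252` is a hypothetical helper name used only for illustration, and the sample name is arbitrary. The last part assumes PHP 7.1 or later on Windows, where `sapi_windows_cp_get()` is available, and is skipped elsewhere.

```php
<?php
// Illustration only: mimic how an ANSI (Windows-1252) API would read UTF-8 bytes.
// misdecodeAsCp1252() is a hypothetical helper, not part of PHP or Windows.
function misdecodeAsCp1252(string $utf8Name): string
{
    // Treat every byte of the input as one Windows-1252 character,
    // then re-encode the result as UTF-8 so it can be displayed.
    return mb_convert_encoding($utf8Name, 'UTF-8', 'Windows-1252');
}

$name = "caf\xC3\xA9";                    // "café" encoded as UTF-8
echo bin2hex($name), PHP_EOL;             // 636166c3a9
echo misdecodeAsCp1252($name), PHP_EOL;   // cafÃ©

// Assumption: PHP 7.1+ on Windows. Report the active ANSI code page;
// 65001 would mean the system locale is already set to UTF-8.
if (function_exists('sapi_windows_cp_get')) {
    echo 'ANSI code page: ', sapi_windows_cp_get('ansi'), PHP_EOL;
}
```

The two bytes of "é" become two separate characters, which is the same kind of corruption seen in the directory names. Once the system ANSI code page is UTF-8, the bytes are decoded as intended and the names survive.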