I use PHP's ZipArchive class to extract a .zip file. It works fine for English names, but causes problems with names in my local language (Thai).

I use iconv('utf-8', 'windows-874', $zip->getNameIndex($i)) to convert from UTF-8 to Thai (windows-874). It works for the folder and file names, but not for the files extracted from the .zip, where it causes this error:

iconv(): Detected an illegal character in input string

Can anyone please tell me what the problem is here?

My PHP code

$file = iconv('utf-8', 'windows-874', $_GET['File']);
$path = iconv('utf-8', 'windows-874', $_GET['Path']);

$zip = new ZipArchive;
if ($zip->open($file) === TRUE) {
    // convert to Thai language
    for($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        //echo iconv("charset zip file", "windows-874", $name);
        //$zip->extractTo($path,$name); -> this problem
    }
    $zip->close();
    echo json_encode('unZip!!!');
} else {
    echo json_encode('Failed');
}

After I extract the zipped file, the file name is not the one I set for it.

This is the name I tried to set:

Here is my zipped file:

https://www.dropbox.com/s/9f4j04lkvsyuy63/test.zip?dl=0

UPDATE
I tried unzipping the file on Windows XP; it works fine there, but not on Windows 7.

Muhammad Hassaan
  • Nobody can tell unless you show what the string actually contains. (ZIP files aren't exactly renowned for Unicode support. Most tools just use local charsets as is.) – mario Jul 05 '15 at 15:02
  • Where is the code you use for actually extracting the content? Could you upload a minimal example of a zip file that gives the error somewhere (after adding the code you use to extract its contents)? – MatsLindh Jul 05 '15 at 16:09
  • But isn't it better to work with Thai in `utf8` anyway? What is the need to convert it to `windows-874`? – Protomen Jul 05 '15 at 16:15
  • My directory name is in Thai. $zip->open($file) fails with it, but it works for me once I convert it to windows-874. – Veerapat Boonvanich Jul 05 '15 at 16:20
  • You have not provided enough information. Most likely this is not a direct problem with the language; it sounds like your file is not a valid zip. This happens all the time in PHP because of an extra bit in the zip binary, and things like that. Are you downloading via web from a server? Run the following command on Linux: zip -T filename.zip and tell us the output. – Neo Jul 06 '15 at 20:55
  • I stumbled onto a project that fixes many of the encoding issues with file names. I think [this](https://github.com/julp/ImprovedZipArchive) might help. – fsacer Jul 07 '15 at 14:44
  • Sorry, I've been very busy, and I changed the file and directory names to English. Thank you very much for the help. – Veerapat Boonvanich Aug 31 '15 at 14:55
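
Several comments above ask what the entry names actually contain. A minimal sketch for checking that might look like the following; `describeEntryName` is a hypothetical helper (not part of ZipArchive), and the two sample strings are illustrative bytes, not names taken from the asker's archive. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: report the raw bytes of one archive entry name
// and whether those bytes are well-formed UTF-8.
function describeEntryName(string $name): array
{
    return [
        'hex'        => bin2hex($name),                    // raw bytes as hex
        'valid_utf8' => mb_check_encoding($name, 'UTF-8'), // well-formed UTF-8?
    ];
}

// Illustrative input: the Thai letter KO KAI in two encodings.
$utf8  = "\xE0\xB8\x81"; // UTF-8 bytes
$cp874 = "\xA1";         // windows-874 byte

var_dump(describeEntryName($utf8));
var_dump(describeEntryName($cp874));
```

Calling it on `$zip->getNameIndex($i)` for each entry would show whether the names are already UTF-8. If they are not, passing them to iconv() with 'utf-8' as the source charset is one plausible cause of the "illegal character" error.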

1 Answer

You could try mb_detect_encoding() to help with this; see the code below. You may need to extend this code if the path has the same problem. A loop would handle that.

$file = iconv('utf-8', 'windows-874', $_GET['File']);
$path = iconv('utf-8', 'windows-874', $_GET['Path']);

$zip = new ZipArchive;
if ($zip->open($file) === TRUE) {
    // detect each entry name's encoding and convert the name to UTF-8
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $order = mb_detect_order();
        // strict mode: returns FALSE if no encoding in $order matches
        $encoding = mb_detect_encoding($name, $order, true);
        if (FALSE === $encoding) {
            throw new UnexpectedValueException(
                sprintf(
                    'Unable to detect input encoding with mb_detect_encoding, order was: %s',
                    print_r($order, true)
                )
            );
        }
        // reuse the strictly detected encoding; //IGNORE drops unconvertible characters
        $stringUtf8 = iconv($encoding, 'UTF-8//IGNORE', $name);
        $zip->extractTo($path, $stringUtf8);
    }
    $zip->close();
    echo json_encode('unZip!!!');
} else {
    echo json_encode('Failed');
}
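
One caveat worth noting: the second argument of extractTo() selects entries by the name stored in the archive, so passing a converted name may not match any entry. A hedged alternative is to read each entry by index and write it out under the converted name yourself. In the sketch below, `extractWithConvertedNames` and its `$toCharset` parameter are hypothetical, and treating valid UTF-8 names as UTF-8 is an assumption about the archive, not something the question confirms.

```php
<?php
// Hypothetical helper: write every entry to $destDir under a converted name.
// $toCharset is the charset the local filesystem expects (an assumption,
// e.g. 'windows-874' on a Thai Windows system). Returns the number of files written.
function extractWithConvertedNames(ZipArchive $zip, string $destDir, string $toCharset): int
{
    $written = 0;
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false) {
            continue;
        }
        // Assumption: convert only names that are well-formed UTF-8.
        $converted = $name;
        if (mb_check_encoding($name, 'UTF-8')) {
            $result = iconv('UTF-8', $toCharset . '//IGNORE', $name);
            if ($result !== false) {
                $converted = $result;
            }
        }
        // Crude guard against entries that try to escape $destDir.
        if (strpos($converted, '..') !== false) {
            continue;
        }
        $target = rtrim($destDir, '/\\') . DIRECTORY_SEPARATOR . $converted;
        if (substr($name, -1) === '/') {
            // Directory entry: create it and move on.
            if (!is_dir($target)) {
                mkdir($target, 0777, true);
            }
            continue;
        }
        $dir = dirname($target);
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        // Read the entry by index, so the stored name never has to match.
        $data = $zip->getFromIndex($i);
        if ($data !== false && file_put_contents($target, $data) !== false) {
            $written++;
        }
    }
    return $written;
}
```

With the variables from the code above, the call inside the `if ($zip->open($file) === TRUE)` branch would be `extractWithConvertedNames($zip, $path, 'windows-874');`. Note that getFromIndex() loads each entry fully into memory, which may not suit very large files.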
Mark Hurd
Tech Savant