9

For example, I have a file named проба.xml and I am unable to open it from a PHP script.

If I save the PHP script as UTF-8, then all the text in the script is UTF-8, so when I pass this to file_get_contents:

$fname = "проба.xml";
file_get_contents($fname);

I get an error that the file does not exist. The reason is that on Windows (XP) all file names with non-Latin characters are stored as Unicode (UTF-16). OK, so I tried this:

$fname = "проба.xml";
$res = mb_convert_encoding($fname,'UTF-8','UTF-16');
file_get_contents($res);

But the error persists, since file_get_contents cannot accept Unicode strings...

Any suggestions?

Deduplicator
Darko Miletic
  • Is this code current? You didn't switch $fname with $res in file_get_contents, or was that just a typo? – ryanday Jun 10 '09 at 19:37
  • This is my typo. I did actually switch the values. – Darko Miletic Jun 10 '09 at 22:02
  • I got to my XP system and tried your code. I saved the PHP file as Unicode, copy/pasted what you wrote, and I can read the file (same filename). What encoding is your source file saved in? – ryanday Jun 11 '09 at 00:56
  • It's not the file content that is the problem. It is the file name. If a file name contains non-ASCII characters, on Windows it is saved as a Unicode file name, not as Unicode file content. – Darko Miletic Jun 11 '09 at 13:16
  • My source file is saved as UTF-8. I also tried ISO-8859-1 and it's the same. The error persists. – Darko Miletic Jun 11 '09 at 13:16
  • OK, I saved the PHP file as Unicode and PHP refuses to execute it. What web server are you using? – Darko Miletic Jun 11 '09 at 13:22

3 Answers

11

UPDATE (July 13 '17)

PHP 7.1 and above supports Unicode filenames on Windows out of the box (see https://www.php.net/manual/en/migration71.windows-support.php). PHP's filesystem APIs accept and return filenames according to default_charset, which is UTF-8 by default.

Refer to bug fix here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
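
A minimal sketch of how a script might check for this before relying on it. It assumes the 7.1 threshold given in the PHP migration notes, and `hasNativeUnicodePaths` is a hypothetical helper name, not something PHP provides:

```php
<?php
// Hypothetical helper: true when PHP's own filesystem functions are expected
// to handle UTF-8 filenames on Windows (PHP 7.1+ with a UTF-8 default_charset).
function hasNativeUnicodePaths(string $phpVersion, string $charset): bool
{
    return version_compare($phpVersion, '7.1.0', '>=')
        && strcasecmp($charset, 'UTF-8') === 0;
}

if (hasNativeUnicodePaths(PHP_VERSION, (string) ini_get('default_charset'))) {
    // The script itself must be saved as UTF-8 for this literal to match.
    // @ suppresses the warning; the result is false if the file cannot be read.
    $contents = @file_get_contents('проба.xml');
}
```

On older versions the workarounds further down still apply.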


UPDATE (Jan 29 '15)

If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio and refer to files via the wfio:// protocol.

file_get_contents("wfio://你好.xml");
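
If the same script also has to run where the extension may not be loaded, something along these lines could guard the call. This is only a sketch: `wfioPath` is a hypothetical helper, and it assumes the extension registers its stream wrapper under the name `wfio` (which the wfio:// prefix suggests):

```php
<?php
// Hypothetical helper: prefix the path with wfio:// only when that stream
// wrapper is present in the given wrapper list.
function wfioPath(string $utf8Path, array $wrappers): string
{
    return in_array('wfio', $wrappers, true) ? 'wfio://' . $utf8Path : $utf8Path;
}

// stream_get_wrappers() lists the wrappers registered in the running PHP.
$path = wfioPath('你好.xml', stream_get_wrappers());
$contents = @file_get_contents($path); // false if the file cannot be opened
```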

Original Answer

PHP on Windows uses the Legacy "ANSI APIs" exclusively for local file access, which means PHP uses the System Locale instead of Unicode.

To access files whose filenames contain Unicode, you must convert the filename to the encoding specified by the current System Locale. If the filename contains characters that are not representable in that encoding, you're out of luck (Update: See section above for a solution). scandir will return gibberish for these files, and passing the string back to fopen and equivalents will fail.

To find the right encoding to use, you can get the system locale by calling <?=setlocale(LC_CTYPE, '0')?> and looking up the Code Page Identifier (the number after the .) in the MSDN article https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.

For example, if the function returns Chinese (Traditional)_HKG.950, this means that code page 950 is in use and the filename should be converted to the Big5 encoding. In that case, your code will have to be as follows, if your file is saved in UTF-8 (preferably without a BOM):
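
As a rough sketch, the code page number can be pulled out of the locale string like this. `codePageFromLocale` is a hypothetical helper name, not a PHP built-in, and it assumes the Windows-style `Language_Region.codepage` format described above:

```php
<?php
// Hypothetical helper: return the digits after the final "." in a locale
// string, or null when the string does not end in a numeric code page.
function codePageFromLocale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? $m[1] : null;
}

// Passing '0' only queries the current setting; it does not change the locale.
$current  = setlocale(LC_CTYPE, '0');
$codePage = is_string($current) ? codePageFromLocale($current) : null;
```

On non-Windows systems the locale string usually has no numeric suffix, so `$codePage` ends up null there.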

$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);

or as follows if you directly save the file as Big-5:

$fname = "你好.xml";
file_get_contents($fname);
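
To generalize the first variant, here is a minimal sketch assuming the iconv extension is available and that it recognises code page names of the form `CP950` or `CP1251` (that depends on the iconv implementation PHP was built with). `toCodePage` is a hypothetical helper. How unrepresentable characters are handled also varies by implementation: many report a failure, but some may substitute a placeholder instead.

```php
<?php
// Hypothetical helper: convert a UTF-8 filename to the given Windows code page.
// Returns null when iconv reports a failure (for example an unknown code page,
// or a character that it refuses to convert).
function toCodePage(string $utf8Name, string $codePage): ?string
{
    $converted = @iconv('UTF-8', 'CP' . $codePage, $utf8Name);
    return $converted === false ? null : $converted;
}

$fname = toCodePage('你好.xml', '950');
if ($fname !== null) {
    $contents = @file_get_contents($fname); // false if the file cannot be opened
}
```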
Henry
  • Hi, can you tell me how to extract a .zip file that contains files whose names are in UTF-8? – Minh Le Jul 07 '15 at 08:48
  • @Henry Chan Sorry to tell you that support began in PHP 7.1, not 7.0. See https://www.php.net/manual/en/migration71.windows-support.php for more details. This code ```var_dump(file_get_contents('ภาษาไทย.html')); $h = fopen('ภาษาไทย php ' . PHP_VERSION . '-fopen.txt', 'w'); fclose($h); var_dump(file_put_contents('ภาษาไทย php ' . PHP_VERSION . '.txt', '', FILE_APPEND));``` runs perfectly on PHP 7.1 or newer but not on 7.0 or older. – vee Aug 07 '20 at 13:41
0

You could try:

  • getting the string for the filename from a directory listing using opendir and readdir
  • passing that string to file_get_contents to see if that will work, or
  • try getting the content of the file using fopen, fread and fclose
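
Roughly, the steps above could look like the sketch below. `readAllFiles` is a hypothetical helper, and there is no guarantee this works for names outside the system code page; the other answer reports that directory listings return gibberish for those files.

```php
<?php
// Hypothetical helper: read every regular file in $dir using the names
// exactly as readdir() returns them, via fopen/fread/fclose.
function readAllFiles(string $dir): array
{
    $result = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $result;
    }
    while (($entry = readdir($handle)) !== false) {
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }
        $fh = @fopen($path, 'rb');
        if ($fh === false) {
            continue; // the name came back from readdir but could not be opened
        }
        $size = filesize($path);
        $result[$entry] = $size > 0 ? fread($fh, $size) : '';
        fclose($fh);
    }
    closedir($handle);
    return $result;
}
```

Calling `readAllFiles('.')` returns an array mapping each file name to its content, so you can see which names PHP was actually able to open.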

Hope this helps!

ylebre
0

These are conclusions so far:

  1. PHP 5 cannot open a file whose name contains Unicode characters unless the filename in the source is Unicode.
  2. PHP 5 (at least on Windows XP) is not able to process PHP source saved as Unicode.

Thus the conclusion is that this is not doable in PHP 5.

Darko Miletic
  • 1
    PHP can open a filename with non-ASCII characters only if all the characters are in the Windows installation's default code page. It can deal with string literals containing non-ASCII characters; it just uses the direct bytes, so how that works will depend on what encoding you saved the source file in, in your text editor. The encoding that many Windows text editors inaccurately call “Unicode” is in fact UTF-16LE, which, being non-ASCII-compatible, PHP can't deal with. See [this question](http://stackoverflow.com/q/482342/18936) for background. – bobince Nov 04 '10 at 00:51