I'm writing a file manager and need to scan directories and rename files whose names may contain multibyte characters. I'm working on it locally on Windows/Apache with PHP 5.3.8, with the following file names in a directory:
- filename.jpg
- имяфайла.jpg
- file件name.jpg
- פילענאַמע.jpg
- 文件名.jpg
Testing on a live UNIX server worked fine. Testing locally on Windows, glob('./path/*') returns only the first one, filename.jpg. Using scandir(), the correct number of files is returned at least, but I get names like ?????????.jpg (note: those are regular question marks, not the � character).
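One way to confirm that the names really contain literal question marks (byte 0x3F), and are not just being displayed badly, is to dump the raw bytes of each name that scandir() returns. The following is only a minimal sketch; dump_name_bytes() is a hypothetical helper name introduced here for illustration.

```php
<?php
// Hypothetical diagnostic helper (not part of the original test script):
// maps each entry name returned by scandir() to its raw bytes in hex.
function dump_name_bytes($dir)
{
    $out = array();
    foreach (scandir($dir) as $name) {
        // Skip the current and parent directory entries.
        if ($name === '.' || $name === '..') {
            continue;
        }
        $out[$name] = bin2hex($name);
    }
    return $out;
}
```

Calling print_r(dump_name_bytes('./uploads/')) would show a name made entirely of ASCII question marks as a run of 3f bytes before the extension.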
I'll end up needing to write a "search" feature that searches recursively through the entire tree for filenames matching a pattern or having a certain file extension. I assumed glob() would be the right tool for that, rather than scanning all the files and doing the pattern matching and array building in the application code. I'm open to alternate suggestions if need be.
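For reference, the application-code alternative to glob() mentioned above could look roughly like this. It is a minimal sketch assuming the SPL iterators and fnmatch() are available (both are in PHP 5.3); find_files() is a hypothetical helper name. It only shows the shape of the approach and does not by itself address the file name encoding problem.

```php
<?php
// Hypothetical helper: recursively collect files under $root whose
// base name matches a shell-style pattern such as '*.jpg'.
function find_files($root, $pattern)
{
    $matches = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Match against the file name only, not the full path.
        if ($file->isFile() && fnmatch($pattern, $file->getFilename())) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches); // stable, predictable ordering for display
    return $matches;
}
```

For example, find_files('./uploads', '*.jpg') would return the paths of every .jpg file under ./uploads, including those in subdirectories.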
Assuming this was a common problem, I immediately searched Google and Stack Overflow and found nothing even related. Is this a Windows issue? PHP shortcoming? What's the solution: is there anything I can do?
Addendum: Not sure how related this is, but file_exists() is also returning FALSE for these files when I pass in the full absolute path (the PHP file itself is saved in Notepad++ as UTF-8 without BOM). I'm certain the path is correct, as neighboring files without multibyte characters return TRUE.
EDIT: glob() can find a file named filename-äöü.jpg. Previously I had AddDefaultCharset utf-8 in my .htaccess file, which I hadn't considered before. With that line in place, filename-äöü.jpg was printing as filename-���.jpg. The only effect of removing that .htaccess line seems to be that this file name now prints normally.
I've deleted the .htaccess file completely, and this is my actual test script in its entirety (I changed a couple of file names from the original post):
print_r(scandir('./uploads/'));
print_r(glob('./uploads/*'));
Output locally on Windows:
Array
(
[0] => .
[1] => ..
[2] => ??? ?????.jpg
[3] => ???.jpg
[4] => ?????????.jpg
[5] => filename-äöü.jpg
[6] => filename.jpg
[7] => test?test.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
)
Output on remote UNIX server:
Array
(
[0] => .
[1] => ..
[2] => filename-äöü.jpg
[3] => filename.jpg
[4] => test이test.jpg
[5] => имя файла.jpg
[6] => פילענאַמע.jpg
[7] => 文件名.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
[2] => ./uploads/test이test.jpg
[3] => ./uploads/имя файла.jpg
[4] => ./uploads/פילענאַמע.jpg
[5] => ./uploads/文件名.jpg
)
Since this is a different server, the configuration could differ regardless of platform, so I'm not sure what to think, and I can't fully pin it on Windows yet (it could be my PHP installation, ini settings, or Apache config). Any ideas?