
I'm working on a project that does a substantial amount of pathname manipulation. It runs fine on Mac OS X, but to my astonishment there is no way to call functions like realpath with UTF-8 encoded strings on Windows. This is because setlocale on Windows doesn't support UTF-8:

PHP: setlocale in Windows 7

http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

http://www.phpwact.org/php/i18n/utf-8

I can write wrapper functions that call utf8_encode and utf8_decode internally, but PHP has so many file functions that wrapping them all would be a considerable burden.

The project is also open source, so users may perceive it as broken when they pass my strings (which are all UTF-8, since the project is web-oriented) to file functions in their own code.

Is there a library that provides UTF-8 versions of the major PHP file functions? I'd prefer a self-contained directory over an extension so that I can bundle it with my code.

I'm concerned that if I can't find a library like this, it may be the Achilles' heel that makes PHP unsuitable for cross-platform development for me. I'm also curious how other languages like Ruby, Python and C++ get around this issue, if they do at all. If they don't, this may just be another strike against Windows and I will have to figure out another workaround.

Zack Morris
  • Do you have to support non-ansi characters in your path names? If not, no need to bother at all. – ToBe Nov 06 '13 at 14:27
  • Yes because I want to support international characters like ü and ñ. – Zack Morris Nov 06 '13 at 20:24
  • Same problem. Never found a solution. Also I'm not sure running utf8_encode and re-encoding can work. If I remember well at the time of testing I lost some accents in filenames. See [comments here](http://www.php.net/manual/en/function.utf8-encode.php) – yuri Nov 06 '13 at 20:58
  • You want to support international characters in file names? I'm pretty sure you will run into loads of other OS-specific and unavoidable problems, even if you solve your PHP-related troubles. You should really reconsider this idea. If you have to do it, it would be a good design decision to wrap all file access functions in your own utility class that manages calls to the file functions. This would also let you become independent of platform and OS version in no time, and it gives you one single place for your name mangling. – ToBe Nov 07 '13 at 09:30
  • It's sad but PHP [does not use the Win32 API functions that support multi-byte paths](http://stackoverflow.com/a/2950046/13508). However, `utf8_encode()` will not fix anything: Windows file systems do not use ISO-8859-1. – Álvaro González Nov 07 '13 at 09:46
  • A few more links I found on this issue: http://stackoverflow.com/questions/6634832/file-exists-and-file-get-contents-fail-on-a-file-which-is-named-output?rq=1 http://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as – Zack Morris Nov 08 '13 at 18:59

1 Answer


The Windows setlocale() function (not PHP's, but the actual C library function) does not support multi-byte encodings. So you're right, it is not possible to set a UTF-8 locale on Windows.

You don't need to define new functions, however. Write a stream wrapper. Stream wrappers can be used by most file-related functions, even internal ones like loading a document in an XSLT template.
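As a rough illustration of the stream wrapper idea, here is a minimal sketch rather than a complete solution. The class name `Utf8PathWrapper`, the `utf8://` scheme, and the `$targetEncoding` setting are all hypothetical names chosen for this example. It assumes that converting the path with iconv() to the system's codepage is enough for the underlying file functions, and it implements only the handful of wrapper methods needed for reading, writing, stat and unlink.

```php
<?php
// Hypothetical sketch: the class name, the "utf8" scheme and $targetEncoding
// are illustrative assumptions, not part of PHP or of any library.
class Utf8PathWrapper
{
    // Encoding the local file API expects. Assumption: on Windows this would
    // be set to the active codepage (e.g. 'CP1252'); 'UTF-8' means no conversion.
    public static $targetEncoding = 'UTF-8';

    public $context;   // PHP assigns the stream context here, if one is passed
    private $handle;   // underlying plain-file handle

    // Strip the "utf8://" prefix and convert the rest to the target encoding.
    public static function toLocalPath($url)
    {
        $path = substr($url, strlen('utf8://'));
        if (self::$targetEncoding === 'UTF-8') {
            return $path;
        }
        // iconv() returns false if a character cannot be represented
        return iconv('UTF-8', self::$targetEncoding, $path);
    }

    public function stream_open($url, $mode, $options, &$openedPath)
    {
        $local = self::toLocalPath($url);
        if ($local === false) {
            return false;
        }
        $this->handle = fopen($local, $mode);
        return $this->handle !== false;
    }

    public function stream_read($count)  { return fread($this->handle, $count); }
    public function stream_write($data)  { return fwrite($this->handle, $data); }
    public function stream_eof()         { return feof($this->handle); }
    public function stream_stat()        { return fstat($this->handle); }
    public function stream_close()       { fclose($this->handle); }

    // Called by file_exists(), is_file(), filesize() and similar functions.
    public function url_stat($url, $flags)
    {
        $local = self::toLocalPath($url);
        if ($local === false) {
            return false;
        }
        // Honour the "quiet" flag so file_exists() does not emit warnings
        return ($flags & STREAM_URL_STAT_QUIET) ? @stat($local) : stat($local);
    }

    public function unlink($url)
    {
        $local = self::toLocalPath($url);
        return $local !== false && unlink($local);
    }
}

stream_wrapper_register('utf8', 'Utf8PathWrapper');
```

With this in place, a call such as `file_get_contents('utf8://C:/data/grüße.txt')` is routed through the wrapper. Because the sketch registers its own scheme, paths without the `utf8://` prefix still go to PHP's built-in file handling. Other operations (rename, mkdir, opendir and so on) would need their corresponding wrapper methods added in the same way.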

ThW
  • OK, thanks, this is the type of answer that I was looking for. My plan is to retrieve the current locale with $locale = setlocale(LC_CTYPE, "0") and write a stream wrapper that uses iconv() internally with the current codepage, similar to this answer: http://stackoverflow.com/a/6810167/539149 My only question is whether functions like file_get_contents($path) will call through to the wrapper even if "file://" is not prepended to the path. I'm hopeful it will work but will report back here if it doesn't. – Zack Morris Nov 08 '13 at 18:54