
I am using fopen() to create files whose names are based on the user's input. In most cases that input will be Cyrillic. I want to be able to see the file names on my computer, but apparently they aren't in the right encoding, and my OS (Windows 10) displays something like this: "ЙосиС.txt".

Windows uses UTF-16, so I tried to convert the encoding of the variable where the name is stored to UTF-16, but I got errors when using fopen, fwrite and fclose.

This is the code:

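<?php
if (isset($_POST["submit"])) {
    $name = $_POST["name"];          // user input, usually Cyrillic (UTF-8)
    // $string holds the text to write; assumed to be defined elsewhere (not shown)
    $file = fopen("$name.txt", "a");
    fwrite($file, $string);
    fclose($file);
}
?>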
Azanyr
  • use `iconv()` to convert encoding. – frz3993 Mar 22 '16 at 14:49
  • `fopen()` functions work binary safe, they don't interpret bytes (encode). When you say you got errors using fopen etc, what were those? – Daniel W. Mar 22 '16 at 14:51
  • Warning: fopen() expects parameter 1 to be a valid path, string given in ... – Azanyr Mar 22 '16 at 14:52
  • That error is not related to encoding. It means that `fopen()` did not find the file because a wrong path was specified. – Marcos Pérez Gude Mar 22 '16 at 14:55
  • @frz3993, do I have to install iconv? Is it an extension? – Azanyr Mar 22 '16 at 14:55
  • @MarcosPérezGude But I am creating the file, not trying to open an existing one, and if I remove the line where I change the encoding, fopen works. – Azanyr Mar 22 '16 at 14:56
  • It's interesting, NTFS stores filenames as UTF-16 yet `fopen` doesn't like it. Unfortunately I have no windows box right here to further test it. – Daniel W. Mar 22 '16 at 14:59
  • Since when does Windows use `utf-16` instead of `iso-8859-1`???? – Marcos Pérez Gude Mar 22 '16 at 15:01
  • @Marcos [since a year or fifteen](https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows)? – CodeCaster Mar 22 '16 at 15:03
  • This is only for filesystem, I guess. When I edit a file in Windows, I still need to check if UTF-XX is enabled in the file instead of iso-8859-1. – Marcos Pérez Gude Mar 22 '16 at 15:06
  • @Marcos yes, filename encodings have very little to do with the encoding of the actual file's contents. – CodeCaster Mar 22 '16 at 15:07
  • @Marcos that is a senseless Windows-bash. The encoding used for a file system has nothing to do with the encoding of the actual files, on any OS. It is perfectly possible to store Unicode files (UTF-8, UTF-16 and so on) on a file system that only supports ANSI filenames. To a filesystem, every file consists of mere bytes, it doesn't care what those bytes represent. – CodeCaster Mar 22 '16 at 15:09
  • I understand your position, but I never had any problems with filenames on UNIX systems, even with Cyrillic characters (I have worked on Russian and Greek projects in the past). I don't use `fopen()` for this task, I use `file_get_contents()`, but the filenames often contained special characters and I never had a problem. When I switch to Windows to develop, though, I run into a lot of problems caused by Windows inconsistencies. – Marcos Pérez Gude Mar 22 '16 at 15:13
  • @Marcos alright, but PHP is to blame for that. Windows has had perfect Unicode support for quite some years now. – CodeCaster Mar 22 '16 at 15:14
  • Yes, I agree, maybe the PHP guys can do this better. I hope PHP 7 will be a revolution in coding. Again, thank you for the clear information you are sharing here. – Marcos Pérez Gude Mar 22 '16 at 15:15

1 Answer

It's true that Windows and NTFS use UTF-16 for filenames, so you can read and write files with Unicode characters in their name.

However, you need to call the appropriate function in order to leverage Unicode: _wfopen() (C runtime) or CreateFileW() (Windows API). See "What encoding are filenames in NTFS stored as?".

PHP's fopen() does not call either of those functions; it uses the plain old ANSI fopen(). Apparently PHP is not compiled with the _UNICODE constant, which makes the C runtime's generic-text mappings resolve to the wide-character variants such as _wfopen() (see also "How to open file in PHP that has unicode characters in its name?" and "glob() can't find file names with multibyte characters on Windows?").
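
The comments mention a "fopen() expects parameter 1 to be a valid path" warning after converting the name to UTF-16. A plausible explanation (an assumption, since the question doesn't show the conversion code): in UTF-16 every ASCII character, such as a space, a digit or the ".txt" extension, is stored as its byte followed by a 0x00 byte, and PHP's filesystem functions reject paths that contain NUL bytes. This sketch uses mb_convert_encoding() from the mbstring extension and a hypothetical name "Йосиф" to show the bytes involved:

```php
<?php
// Sketch: the bytes of a filename after converting it to UTF-16 in PHP.
// "Йосиф" is a hypothetical example name; requires the mbstring extension.
$utf16 = mb_convert_encoding('Йосиф.txt', 'UTF-16LE', 'UTF-8');

// Each ASCII character (here ".txt") becomes its byte followed by 0x00.
echo bin2hex($utf16), "\n";

// PHP's filesystem functions refuse paths containing NUL bytes, which is
// what the "expects parameter 1 to be a valid path" warning reports.
var_dump(strpos($utf16, "\0") !== false);
```

Even without NUL bytes, the ANSI fopen() would interpret the UTF-16 bytes as ANSI text, so the file name would still come out garbled.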

See below for a couple of possible solutions.

Database

A database solution: write the Unicode name in a table, and use the primary key of the table as your filename.
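
A minimal sketch of the database approach, assuming the pdo_sqlite extension; the table name `files`, its columns and the helper function are hypothetical. The stored file only ever gets an ASCII name (the row id), and the Unicode name is looked up again when it needs to be displayed:

```php
<?php
// Sketch: keep the user's Unicode name in a table and name the file after the row id.
// Assumes pdo_sqlite; the `files` table and storeNameAndGetFilename() are hypothetical.
function storeNameAndGetFilename(PDO $db, string $unicodeName): string
{
    $stmt = $db->prepare('INSERT INTO files (display_name) VALUES (?)');
    $stmt->execute([$unicodeName]);
    // The auto-increment id is plain ASCII digits, which the ANSI fopen() handles fine.
    return $db->lastInsertId() . '.txt';
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE files (id INTEGER PRIMARY KEY AUTOINCREMENT, display_name TEXT NOT NULL)');

$filename = storeNameAndGetFilename($db, 'Йосиф'); // "1.txt" for the first row

// When listing files, map the id in the filename back to the original name.
$stmt = $db->prepare('SELECT display_name FROM files WHERE id = ?');
$stmt->execute([(int) basename($filename, '.txt')]);
$displayName = $stmt->fetchColumn();
echo $displayName, "\n";
```

In the question's handler, `$filename` would then be passed to fopen() instead of `"$name.txt"`.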

Transliteration

You could also use transliteration (as explained in "PHP: How to create unicode filenames"), which replaces the Unicode characters that aren't available in the target character set with similar-looking ones. See php.net/iconv:

setlocale(LC_CTYPE, 'en_US.UTF-8'); // with glibc, //TRANSLIT results depend on the locale
$filename = iconv('UTF-8', 'ASCII//TRANSLIT', "Žluťoučký kůň");
// "Zlutoucky kun"

Note that this can cause collisions, as multiple different Unicode characters can be transliterated to the same ASCII character sequence.
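
One way to avoid those collisions (a swapped-in technique, not taken from the linked answers) is to keep the transliterated text for readability and append a short hash of the original name. A sketch, where the function name and the 8-character md5 suffix are illustrative choices:

```php
<?php
// Sketch: transliterate for readability, then append a short hash of the original
// name so that two names that transliterate identically still get different files.
// transliteratedFilename() and the 8-character md5 suffix are hypothetical choices.
// On glibc, call setlocale(LC_CTYPE, ...) with a UTF-8 locale first for better output.
function transliteratedFilename(string $name): string
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $name);
    if ($ascii === false) {
        $ascii = ''; // some iconv implementations do not support //TRANSLIT
    }
    // Keep only characters that are safe in a filename; everything else becomes "_".
    $ascii = preg_replace('/[^A-Za-z0-9_-]+/', '_', $ascii);
    return $ascii . '-' . substr(md5($name), 0, 8) . '.txt';
}

echo transliteratedFilename('Žluťoučký kůň'), "\n";
```

The hash only makes collisions unlikely, not impossible; the database approach is the one that guarantees unique names.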

Percent-encoding

Another suggestion, as found in "How do I use filesystem functions in PHP, using UTF-8 strings?", is to urlencode the filename (note that you shouldn't pass user input to the filesystem unchecked like this; doing so allows users to overwrite system files):

$name = urlencode($_POST["name"]) . ".txt";
$file = fopen($name, "a");
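
Because urlencode() is reversible, the original name can be recovered for display with urldecode(). A short sketch, using a hypothetical input "Йосиф":

```php
<?php
// Sketch: percent-encode the user's name for the filename and decode it back for display.
// urlencode() leaves only ASCII letters, digits and "-_.%+" in its output, so the
// ANSI fopen() on Windows receives a plain ASCII path.
$original = 'Йосиф';                       // hypothetical user input
$filename = urlencode($original) . '.txt'; // "%D0%99%D0%BE%D1%81%D0%B8%D1%84.txt"

// Later, e.g. when listing the directory, recover the display name from the filename.
$displayName = urldecode(basename($filename, '.txt'));
echo $displayName, "\n";
```

The trade-off is that the names shown in Windows Explorer stay unreadable; only your own application sees the decoded form.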

Recompile PHP with Windows Unicode support

If your end goal is to write files with Unicode file names without changing any code, you'll have to compile PHP yourself on Windows with the _UNICODE constant defined, using a Microsoft compiler, and hope it works. I suspect it won't.

Most viable: WFIO

Alternatively, you can follow the suggestion from "How to open file in PHP that has unicode characters in its name?": use the WFIO extension and refer to files via the wfio:// protocol.

file_get_contents("wfio://你好.xml");
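
Applied to the question's code, that might look like the sketch below. It assumes the third-party WFIO extension is installed and registers the `wfio` stream wrapper for UTF-8 paths; the helper names are hypothetical. The name is also validated first, since the question passes raw user input to the filesystem:

```php
<?php
// Sketch: the question's code rewritten to use the wfio:// wrapper.
// Assumes the WFIO extension is loaded; wfioPathFor() and appendToUnicodeFile()
// are hypothetical helpers, not part of the extension.
function wfioPathFor(string $name): string
{
    // Reject empty names, path separators and characters Windows forbids in filenames.
    if ($name === '' || preg_match('~[<>:"/\\\\|?*\x00-\x1F]~', $name)) {
        throw new InvalidArgumentException('Invalid file name');
    }
    return 'wfio://' . $name . '.txt';
}

function appendToUnicodeFile(string $name, string $contents): void
{
    if (!in_array('wfio', stream_get_wrappers(), true)) {
        throw new RuntimeException('The WFIO extension is not loaded');
    }
    $file = fopen(wfioPathFor($name), 'a');
    if ($file === false) {
        throw new RuntimeException('Could not open file');
    }
    fwrite($file, $contents);
    fclose($file);
}
```

With the extension loaded, the question's handler would call `appendToUnicodeFile($_POST["name"], $string)` instead of the fopen/fwrite/fclose sequence.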
CodeCaster