
I'm having a problem getting my PHP download script to work with letters outside the English alphabet, such as "æøå". Files whose names include these letters can't be found, and I suspect there is some kind of encoding problem. The files are stored on a Windows machine running XAMPP.

$getFile = $_SESSION['base'].$_GET['file'];
$getFile = mb_convert_encoding($getFile, "UTF-8");

if (file_exists($getFile)) { // Retrieves the file at path $getFile
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="'.basename($getFile).'"');
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    header('Content-Length: ' . filesize($getFile));
    readfile($getFile);
    exit;
}

The string assigned to $getFile can look like this: "files/projects/Abrahallen/administrasjon/Exempel på admin.txt"

So when a file name with special characters is requested, file_exists() does not find the file, and if I comment out the if statement I get these error messages:

Array ( [file] => /Exempel på admin.txt ) files/projects/Abrahallen/administrasjon/Exempel på admin.txt
Warning: filesize(): stat failed for files/projects/Abrahallen/administrasjon/Exempel på admin.txt in C:\xampp\htdocs\files.php on line 16

Warning: readfile(files/projects/Abrahallen/administrasjon/Exempel på admin.txt): failed to open stream: No such file or directory in C:\xampp\htdocs\files.php on line 17

Johngear
  • And what exactly is your problem? Please be more specific, and add error messages if there are any. – Corubba Dec 30 '15 at 15:34
  • Well, the script cannot find files with letters like "æøå", but it will find other files that do not contain these characters – Johngear Dec 30 '15 at 15:42

2 Answers


Try using realpath when you are generating the file path.

Like:

$getFile = $_SESSION['base'].$_GET['file'];
$getFile = realpath($getFile);
// This may or may not be needed... 
$getFile = mb_convert_encoding($getFile, "UTF-8");
segFault
  • That did not work. I know that the path to the file is correct, since the script can download all other files in that directory that do not contain special characters. – Johngear Dec 30 '15 at 16:11
  • The answer to [this question](http://stackoverflow.com/questions/1580475/file-name-with-special-characters-like-%C3%A9-not-found) may help you. If you are storing these files via scripts on the server, then you should consider encoding the file names as suggested in the answer linked. – segFault Dec 30 '15 at 16:27
$getFile = $_SESSION['base'].$_GET['file'];

Firstly, this is dangerous. Filenames can include sequences like .. which will escape the base directory, allowing access to any file on the server that PHP can read, not just those in the base directory. This filepath needs strong validation.
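As a rough sketch of what such validation could look like (not the only way to do it): resolve the requested path with realpath() and refuse anything that ends up outside the base directory. The helper name resolveInsideBase is made up for this example, and on Windows realpath() is subject to the same file name encoding issues as the other filesystem functions.

```php
<?php
/**
 * Hypothetical helper: returns the canonical path of $relative inside
 * $baseDir, or null if the file does not exist or escapes $baseDir.
 */
function resolveInsideBase(string $baseDir, string $relative): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // the base directory itself is missing
    }

    $full = realpath($base . DIRECTORY_SEPARATOR . $relative);
    if ($full === false) {
        return null; // the requested path does not exist
    }

    // Require the resolved path to start with "<base><separator>", so that
    // ".." sequences and sibling directories such as "<base>-old" are rejected.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($full, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    return is_file($full) ? $full : null;
}
```

The download script would then call something like resolveInsideBase($_SESSION['base'], $_GET['file']) and send a 404 when it gets null back.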

$getFile = mb_convert_encoding($getFile, "UTF-8");

This is probably not the right thing. You are converting a string to UTF-8 from the internal_encoding. This may be UTF-8 (in which case the call does nothing), or it might be environment-defined (in which case it's unreliable and will break when you deploy to a different server). Whenever the conversion does change something, you end up with a different string from the one you put in, and it won't match what's on the filesystem, hence the file is not found.
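To make the effect visible, here is a small illustration. It passes the source encoding explicitly, and ISO-8859-1 is only an assumed stand-in for a non-UTF-8 internal encoding; your server's actual setting may differ.

```php
<?php
// Requires the mbstring extension (the same one the question's code uses).
$name = "p\xC3\xA5"; // "på", already encoded as UTF-8

// Assumption for illustration: the source encoding is ISO-8859-1. Each UTF-8
// byte is then treated as its own Latin-1 character and encoded a second time.
$converted = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

echo bin2hex($name), "\n";      // 70c3a5
echo bin2hex($converted), "\n"; // 70c383c2a5
```

The second byte sequence is a different file name as far as the filesystem is concerned.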

So get rid of this line and you'll be treating the file parameter as a plain series of bytes. If you are generating the links to your script yourself (eg using scandir() to list files and create links to them by appending '?file='.urlencode($filename)) then this will be fine.
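A minimal sketch of that link generation, assuming the download script is reachable as files.php (the name that appears in the question's error messages); buildDownloadLinks is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: builds one download URL per regular file in $dir.
 */
function buildDownloadLinks(string $dir, string $script = 'files.php'): array
{
    $links = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $links; // directory could not be read
    }

    foreach ($entries as $entry) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $entry)) {
            continue; // skips ".", ".." and subdirectories
        }
        // urlencode() percent-encodes the raw bytes, so the script later
        // receives exactly the bytes that scandir() returned.
        $links[] = $script . '?file=' . urlencode($entry);
    }

    return $links;
}
```

Remember to pass each link through htmlspecialchars() when writing it into an HTML page.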

Well, mostly fine. If the script is deployed on a Linux or OS X server you'll be able to access all filenames this way. However on a Windows server, the filesystem is natively Unicode, and when you access it using a byte string (like PHP and other applications using the standard C stdio interfaces do), Windows converts those bytes to Unicode using the ‘ANSI’ code page, which is always some awful legacy locale-specific encoding and never UTF-8.

So on a Western (ANSI code page 1252) Windows installation you'll be able to access Exempel på admin.txt, but you won't be able to get to Příklady admin.txt due to the non-Western character in it. What's more, the meaning of the URLs can change when you move the service to a different server. For example if you went from a Windows server to a Linux server, or a Western Windows server to a Chinese one, then the implicit encoding of the file parameter would change and old links with non-ASCII characters in would break.
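If you decide to live with that limitation, one workaround is to convert the UTF-8 name to the ANSI code page yourself before touching the filesystem. The sketch below assumes that the request parameter is UTF-8, that the server's ANSI code page is 1252, and that your PHP build really does pass byte strings through the ANSI code page as described above; toAnsi1252 is a made-up helper name.

```php
<?php
/**
 * Hypothetical helper: converts a UTF-8 file name to Windows-1252 bytes.
 * Returns null when the name does not survive a round trip, i.e. it contains
 * characters that code page 1252 cannot represent.
 */
function toAnsi1252(string $utf8Name): ?string
{
    $ansi = mb_convert_encoding($utf8Name, 'Windows-1252', 'UTF-8');
    $back = mb_convert_encoding($ansi, 'UTF-8', 'Windows-1252');

    return $back === $utf8Name ? $ansi : null;
}

$ok  = toAnsi1252("Exempel p\xC3\xA5 admin.txt");       // "å" becomes byte 0xE5
$bad = toAnsi1252("P\xC5\x99\xC3\xADklady admin.txt");  // null, "ř" is not in 1252
```

The converted string is only for filesystem calls such as file_exists() and readfile(); keep the original UTF-8 name for anything you send back to the browser.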

A better way of handling it is to treat the parameter as always being UTF-8, and to access the filesystem using Windows's own Unicode-native functions instead of the C standard library. Unfortunately PHP doesn't have the ability to call these functions built-in, so this is tricky to do.

In general, accessing local filenames from a PHP script is really hard to do safely and if there is any way you can avoid it, you should. For example if you get to write the filenames yourself (rather than serving an existing directory of files) then you can apply your own ad-hoc encoding (eg hex-encoded-UTF-8) to avoid tricky characters. Or use file IDs stored in a database.
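For instance, the hex-encoded-UTF-8 idea could look roughly like this. Both function names are hypothetical, and note that the stored name is twice as long as the original, which matters for very long file names.

```php
<?php
// Hypothetical helpers: store each file under an ASCII-only name derived from
// its UTF-8 display name, and recover the display name later.
function encodeStoredName(string $utf8Name): string
{
    return bin2hex($utf8Name); // only 0-9 and a-f, safe on any filesystem
}

function decodeStoredName(string $stored): ?string
{
    // Accept only a non-empty, even-length, lowercase hex string.
    if (preg_match('/\A(?:[0-9a-f]{2})+\z/', $stored) !== 1) {
        return null;
    }

    return hex2bin($stored);
}
```

Because the stored name can only contain hex digits, it also cannot contain .. or path separators.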

header('Content-Disposition: attachment; filename="'.basename($getFile).'"');

Getting this parameter right is also a load of pain. See this question for details.
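One commonly used approach, sketched here on the assumption that the file name is UTF-8, is to send a plain ASCII fallback in filename plus the RFC 6266 filename* form with the percent-encoded UTF-8 name. contentDispositionHeader is a hypothetical helper, and browser behaviour still varies, so treat this as a starting point.

```php
<?php
/**
 * Hypothetical helper: builds a Content-Disposition header line for a
 * UTF-8 file name.
 */
function contentDispositionHeader(string $utf8Name): string
{
    // ASCII fallback: replace every byte outside printable ASCII, plus the
    // quote and backslash that would break the quoted-string, with "_".
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/', '_', $utf8Name);

    // filename* carries the real name as percent-encoded UTF-8.
    return 'Content-Disposition: attachment; filename="' . $fallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($utf8Name);
}
```

In the download script this would replace the existing line with header(contentDispositionHeader(basename($getFile))), provided the name held in $getFile is UTF-8 at that point.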

bobince
  • Thanks for a detailed answer. Can you recommend any other way to give users with different access levels access, through a web interface, to directories that are restricted for everybody who does not have the appropriate permissions? Because it seems PHP lacks ways of doing this securely and conveniently. – Johngear Jan 02 '16 at 10:10
  • Yeah at that point you will want to be storing the list of users and what stores they have access to in a database, providing a login interface, and checking authorisation in the database before allowing the `readfile` to go ahead. – bobince Jan 02 '16 at 11:08
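
A very rough sketch of that check, combined with the file-ID idea from the answer. Plain arrays stand in for the database tables here, and every name and value (user names, file ID, stored path, authorisedPath) is hypothetical.

```php
<?php
// Stand-ins for database tables; all values are made up for illustration.
$files = [
    17 => ['path' => 'files/projects/Abrahallen/17.bin', 'project' => 'Abrahallen'],
];
$access = [
    'alice' => ['Abrahallen'], // projects each user may download from
    'bob'   => [],
];

/**
 * Returns the stored path if $user may download file $fileId, else null.
 */
function authorisedPath(array $files, array $access, string $user, int $fileId): ?string
{
    if (!isset($files[$fileId])) {
        return null; // unknown file ID
    }

    $allowed = $access[$user] ?? [];
    if (!in_array($files[$fileId]['project'], $allowed, true)) {
        return null; // user has no access to this project
    }

    return $files[$fileId]['path'];
}
```

Only when authorisedPath() returns a path for the logged-in user would the script go on to send the headers and call readfile().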