I have a form where users can upload files to the system. Since all users are Persian-speaking, some of the file names are likely to be in Persian as well.

I use a very simple method to rename the files as they are uploaded, but the encoding is lost when the file is saved under its new name: if the original name contains Persian letters, the name of the saved file consists entirely of unreadable characters.

Here's my code:

if ($_FILES['replyattachment']['name'] != '') {
    $msg_attachment = explode(".", $_FILES['replyattachment']['name']);
    $org_name = iconv('UTF-8', 'UTF-8', $msg_attachment[0]);
    $attachment_ext = $msg_attachment[1];

    $num = rand(1000, 1999);
    $num += rand(2000, 3000);
    $attachment_name = time() + $num;

    $attached_file_name = $org_name.'_'.$attachment_name.'.'.$attachment_ext;

    move_uploaded_file($_FILES['replyattachment']['tmp_name'], ATTACHMENTS.$attached_file_name);
}

I had never used or even heard of the iconv function before I ran into this issue. I assumed I should pass "UTF-8" as both in_charset and out_charset, since the original name might contain UTF-8 characters and I want the newly generated name to be UTF-8 encoded too.
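For reference, here is a small sketch of what that call does in isolation. Converting a valid UTF-8 string from UTF-8 to UTF-8 should return the same bytes, so iconv on its own cannot change how the name is written to disk. The sample name is a made-up value, written as byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// Hypothetical sample name: the UTF-8 bytes of a five-letter Persian word.
$name = "\xDA\xAF\xD8\xB2\xD8\xA7\xD8\xB1\xD8\xB4";

// Same charset on both sides: nothing to convert.
$converted = iconv('UTF-8', 'UTF-8', $name);

// Compare the two strings byte for byte.
var_dump($converted === $name);
echo bin2hex($name), PHP_EOL;
echo bin2hex($converted), PHP_EOL;
```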

I read somewhere that the encoding of the source file that contains the code matters as well. For the record, the file itself (messages.php) is saved in UTF-8.

What am I doing wrong with the iconv function?

Thanks in advance,

EDIT:

I searched a bit more and found something. As suggested by a user in another post, I tried using the code page for Arabic/Persian characters, which is Windows-1256.

It was also suggested to use rawurlencode when saving the name to the database and rawurldecode when reading it back. So this is what my code became:

$org_name = iconv('UTF-8', 'Windows-1256', $msg_attachment[0]);
$attached_file_name = rawurlencode($org_name.'_'.$attachment_name.'.'.$attachment_ext);

Later on, when I need to build the link to the file, I use the rawurldecode function.
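To illustrate the encode/decode step on its own, here is a minimal sketch. It assumes a made-up UTF-8 file name and leaves out the Windows-1256 conversion; the point is only that the stored name becomes a run of percent-escapes and that rawurldecode restores the original bytes.

```php
<?php
// Hypothetical file name: four Persian letters (UTF-8 bytes) plus ".txt".
$original = "\xD9\x81\xD8\xA7\xDB\x8C\xD9\x84.txt";

// Every non-ASCII byte becomes a %XX escape; the dot is left as-is.
$encoded = rawurlencode($original);
echo $encoded, PHP_EOL;

// Decoding gives back exactly the bytes we started with.
$decoded = rawurldecode($encoded);
var_dump($decoded === $original);
```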

The problem is that, although this works perfectly, the whole process changes the name of the file on disk. I insist on keeping the original name.

As far as I know, there is only one way to store an uploaded file with PHP, which is of course move_uploaded_file. So how do PHP-based CMSs like WordPress do this without any trouble?

FINAL EDIT:

After days of experimenting with various PHP functions and digging through numerous forums and websites, I finally found a solution!

WFIO is the solution: a PHP extension that you can download via this link and enable in your php.ini file. Visit these two pages as well to learn more about its usage:

  • Does the server use Windows? You suggest it but you don't say explicitly and it's the most vital piece of info (PHP does not use the Unicode flavours of the Win32 APIs that are needed for this task, though I believe they fixed that somewhere on PHP/7.x). – Álvaro González Feb 01 '18 at 09:58
  • I'm testing all this on localhost, and my OS is Win10 and I'm using PHP 5.x. – Arash Barazandeh Feb 01 '18 at 10:13
  • With PHP/5 there's definitely nothing you can do. Let me check if I can find a similar question. – Álvaro González Feb 01 '18 at 10:16
  • See also https://stackoverflow.com/questions/24454911/utf-8-php-win7-is-there-a-solution-now-to-save-utf-8-filenames-on-win-7-usin – Álvaro González Feb 01 '18 at 10:19
  • Thanks for posting that link. But how does WordPress do that job perfectly on much lower versions? WP v3.8.x, for example, has no trouble uploading files with Persian names, and PHP 7 had not been released back then. – Arash Barazandeh Feb 01 '18 at 10:29
  • An added problem is that there are many actors involved: file system, file explorer, PHP, web server, browser... Sometimes stuff doesn't really work but appears to: e.g. you try to store `產` (`E7 94 A2` in UTF-8) and get three single-byte characters instead, but when you read it back the last tool that handles the name just concatenates the bytes and you get the original name back. – Álvaro González Feb 01 '18 at 10:36
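
The effect described in the last comment can be reproduced in a few lines. This is only a sketch: it uses ISO-8859-1 as a stand-in for whatever single-byte code page the system really applies (an assumption, since the thread does not say which one is in play). The three UTF-8 bytes of `產` are misread as three separate characters, and reversing the misreading hands back the original bytes, which is why the name can look broken on disk and still come out right at the end.

```php
<?php
// The UTF-8 bytes of the character from the comment above: E7 94 A2.
$utf8 = "\xE7\x94\xA2";

// Misread each byte as one ISO-8859-1 character (stand-in code page, an assumption).
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo bin2hex($misread), PHP_EOL; // three characters, now two bytes each

// Undo the misreading: the original three bytes come back.
$roundTrip = iconv('UTF-8', 'ISO-8859-1', $misread);
var_dump($roundTrip === $utf8);
```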
