10

I have a PHP application.

I allow users to upload files to my web application.

Question: What's the best way for me to sanitize the file names of the uploaded documents $_FILES["filename"]["tmp_name"] in PHP?

UPDATE:

Can I take an MD5 of the uploaded filename and use that as the newly assigned filename? If so, how do I do that in PHP?

frooyo
  • 1,863
  • 3
  • 19
  • 21
  • Can you give a clear definition of 'sanitize'? As in for MySQL? A URL? – fredley Sep 20 '10 at 22:37
  • I'm uploading the files to my web server. The files can be images, documents, etc. I don't want filename collisions. And I don't want people to try to upload filenames that might be disallow on my filesystem. – frooyo Sep 20 '10 at 22:39
  • You should also be aware of what kind of files you allow them to upload. You don't want someone to be able to upload things like html/javascript files. – Moses Sep 20 '10 at 23:05
  • @Moses: He might want to let people upload HTML or EXE files. However he should handle them more carefully, eg. don't serve HTML as text/html but as plain/text. – Crozin Sep 20 '10 at 23:27
  • 1
    Going off of what Moses said, you should definitely watch out for the content in the files. For example, if someone uploads a PHP file, it could contain malicious code that allowed further exploits, distributed a virus, or more. A way to prevent this would be to check the extensions: php, asp, etc should not be allowed. – Kranu Sep 20 '10 at 23:28

6 Answers6

5

I bet that you also store some information about the file in the database. If this is correct, then you can use the primary key (ID) as a filename on your server and preserve the original filename in the database. This gives you greater flexibility, because you can manipulate the metadata without renaming the actual file.

Adam Byrtek
  • 12,011
  • 2
  • 32
  • 32
2

I would just run a simple regex that replaces any non alphanumeric characters with an underscore (or just remove these character altogether). Make sure you preserve the extension of course.

If you want to go a bit further, you could use magic mime extension to ensure the file is the same format that the extension says it is.

EDIT: To avoid filename collisions in a directory, you could append a md5 of users IP + current time to the filename.

Sam Day
  • 1,713
  • 9
  • 17
2

To avoid filename collision just check whether given or generated filename doesn't already exists:

do {
   // Generate filename, eg.:
   $filename = md5(uniqid()) . $fileExtension;
} while (file_exists($filename));

That gives you 100% sure that the filename is unique. Using md5 (or any other hash algorithm) ensures you that the filename is secure - and easy to handle.

Crozin
  • 43,890
  • 13
  • 88
  • 135
  • 3
    This makes no sense. You're taking the MD5 of the uniqid function. Why? – frooyo Sep 20 '10 at 23:45
  • That's only example. You can use original filename etc. The point is that you should check in loop whether generated filename is already in use. If so regenerate filename. – Crozin Sep 20 '10 at 23:54
  • 4
    Taking md5() of uniqid() makes no sense. Taking md5() of the original filename (i.e. a constant value) in a loop makes even less sense. :-) – Andras Nemeth Apr 27 '13 at 21:18
  • 1
    Of course, taking md5 out of uniqid does makes sense! uniqid is very long string containing dashes, md5 converts it to the sequence of alphanumeric characters, which is a lot more pleasing to see. Even taking md5 out of `time() . $filename` makes a lot of sense - you're generating the alphanumeric filenames guaranteed to be unique! – hijarian Jan 10 '14 at 07:10
  • @Crozin Sorry for necroposting, but why not md5(time()), but md5(uniqid())? I understand it's maybe a matter of taste but anyway? – hijarian Jan 10 '14 at 07:13
  • `time()` has 1 second resolution, while `uniqid()` (which uses `microtime()`) has 1 milisecond resolutioin. This means that if you'd use `time()` in case of collision your loop would do the same task for 1 second - it would be pointless. – Crozin Jan 10 '14 at 07:16
  • Just reading through this thread i'v become smarter. I happen to not pay attention to such finer details as noticing the time performance difference between uniqid() and time(). Next time instead of just going with the highest rated answer I'll read though all answers and following comments. – Victor Ighalo Apr 19 '19 at 06:37
  • @wyz1 You may have misunderstood Crozin's comment (or I have misunderstood yours :p). The difference between time() and uniqid() is not a matter of performance but a matter of precision. With time(), you have duplicates if two files are uploaded in the same second. With uniqid(), you have duplicates only if two files are uploaded in the same 1/1000000th of a second. Besides, uniqid() has an optional parameter to add more entropy to the result and make it even safer. – Thibault Witzig Apr 23 '19 at 15:24
1

Ciao, this function also removes all the points and then I create the clean string with the extension.

function sanitaze_upload_file($data)
{
    $imgName   = $data;
    $indexOFF  = strrpos($imgName, '.');
    $nameFile  = substr($imgName, 0,$indexOFF);
    $extension = substr($imgName, $indexOFF);
    $clean     = preg_replace("([^\w\s\d\-_~,;\[\]\(\)])", "", 
    $nameFile);
    $NAMEFILE  = str_replace(' ', '', $clean).$extension;
    return $NAMEFILE;
}
the_martux
  • 885
  • 2
  • 7
  • 18
0

Instead of sanitizing filenames specified by the user, use any other unique identifier for that photo and store that as the filename. I prefer using user ID's which are numeric and always unique.

move_uploaded_file($_FILES["tmp_name"],"/home/yourname/".$user_id));

You can then fetch the image from any location (say, S3 or even your own server) by just knowing the user's ID. You don't even need a attribute in your database to store image URL's.

Gaurav Gupta
  • 5,380
  • 2
  • 29
  • 36
-2

If you're not against losing the actual filenames, what I usually do is create a hash of the filename and set the filename to that, if whatever you're developing has loads of pictures being uploaded it helps avoid conflicts where two filenames are named alike and overwrites occur.

hash('md5', $_FILES["filename"]["tmp_name"]);
jduren
  • 630
  • 5
  • 11
  • Do you mean: **hash('md5', $_FILES["filename"]["tmp_name"] )** ? – frooyo Sep 20 '10 at 22:44
  • Yes, the first argument you pass is the type of hash you would like to create and the second argument is the string you want to create the hash from. PHP Hash - http://www.php.net/manual/en/function.hash.php – jduren Sep 20 '10 at 22:46
  • I also would suggest adding Sam Days practice as well where you could add the current time to the filename before hashing, that would create an even more unique filename. – jduren Sep 20 '10 at 22:49
  • 2
    If you're going the 'just have a (pseudo)random filename'-route, using the `tempnam()`-function will automatically solve race-conditions. – Wrikken Sep 20 '10 at 22:58
  • 1
    You'd definitely need to add something before hashing. Hashing the filename alone won't prevent name collisions, because the same filename will always produce the same hash. – Ben Dunlap Sep 20 '10 at 23:01
  • @jduren, please update your anser to be: **hash('md5', $_FILES["filename"]["tmp_name"] )**. If you update the answer, I'll mark is a "accepted" – frooyo Sep 20 '10 at 23:06
  • @Wrikken tempnam() actually creates a new file altogether doesn't it? Using a hashed unique string then just renaming the file would be more ideal. – jduren Sep 20 '10 at 23:15
  • No, that's the tmp filename PHP uses to when a user upload a file to the server. – frooyo Sep 20 '10 at 23:16
  • @user401839 No, Wrikken mentioned a tempnam() function in his comment (4th comment down on my answer) and I am wondering in terms of performance, is using tempnam() which creates a new file altogether better versus just renaming the file with a hash of the original filename with the time or some other unique data appended. – jduren Sep 20 '10 at 23:21
  • @jduren: performance difference is negligible, avoiding race conditions & filename-collisions isn't, and before you've got a solid implementation of those `tempnam()` should be already done. – Wrikken Sep 20 '10 at 23:24
  • @Wikkan any chance you can provide a useage example where you would use tempnam() to deal with this question for instance. I read a little off the PHP Man page for the tempnam() function and just don't see how tempnam is useful in this instance. As far as I can tell it creates a a temporary file with the option to set a prefix. But it creates it with the .tmp extension then returns the filename? Creating an entire new file just to get a unique string seems a bit overkill but I may be interpreting it incorrectly. Example? – jduren Sep 21 '10 at 00:42
  • (1) You will actually use the file as the new file, the target for the upload (that the file isn't temporary as `tempnam` will have you think doesn't matter, and don't let that throw you of) (2) No single upload every overwrites an existing one, even if some_hash(filename) is equal (3) No 2 simultaneous upload will ever claim the same filename, even though they are simultaneous and result in the same hash. And yes, the chance for (2) or (3) to occur are slim, however, given a combination of enough visitors & time, it will happen someday, and then you're glad you coded to make it no problem. – Wrikken Sep 21 '10 at 09:19