2

The whole idea: I need to be sure that a file isn't saved more than once, and that I don't lose any file, because if two different files get the same MD5 the second one will not be saved. (My goal is to avoid saving the same file twice on the hard disk.)

In other words, if one user uploads an image and another user later uploads the same image, I don't want to save the second copy, because it already exists on the hard disk. All of this is because I need to save disk space. This is my code, and it works fine:

$targetFolder = '/test/uploadify/uploads'; // Relative to the root
$tempFile = $_FILES['Filedata']['tmp_name'];
$targetPath = $_SERVER['DOCUMENT_ROOT'] . $targetFolder;

// Name the stored file after the MD5 hash of its contents
$myhash = md5_file($tempFile);
$temp = explode(".", $_FILES['Filedata']['name']);
$extension = end($temp);

$targetFile = rtrim($targetPath,'/') . '/' .$myhash.'.'.$extension;

// Validate the file type
$fileTypes = array('jpg','jpeg','gif','png'); // File extensions
if (!in_array($extension,$fileTypes)) {
    echo 'Invalid file type.';
}
elseif (file_exists($targetFile)) {
    // Same hash and extension already stored: skip the second copy
    echo 'exist';
}
else {
    move_uploaded_file($tempFile,$targetFile);
}

Thanks to all of you.

Ammar Qasem
  • YES, I do this!, but take a look at http://stackoverflow.com/questions/862346/how-do-i-assess-the-hash-collision-probability – coma Apr 13 '14 at 09:46
  • If you really want to be safe against file name clashes, you should use the tool that was made for that job: PHP's [tempnam](http://www.php.net/manual/en/function.tempnam.php) function. You can read what it does in detail if you google a little bit. – N.B. Apr 13 '14 at 09:49

4 Answers

3

Well, of course you can do this. In fact, this is how I avoid file duplication (and I mean not having two files with the same content, not just a silly name collision).

If you are worried about collisions, then you might take a look at sha1_file:

http://es1.php.net/manual/en/function.sha1-file.php

What are the chances that two messages have the same MD5 digest and the same SHA1 digest?

I've been using the md5 approach the way you are suggesting here for image galleries and it works just fine.

Another thing to keep in mind is the time it takes to calculate the hash: the more complex the hash, the more time it needs. That only matters when processing really big batches, though.
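
To make the approach above concrete, here is a minimal sketch that names the stored file after its sha1_file hash and reports whether that name is already taken. The function name contentAddressedPath and its return shape are hypothetical (they do not come from the question), and the sketch assumes the extension has already been validated by the caller.

```php
<?php
// Hypothetical helper (not from the thread): build a content-addressed path
// and report whether a file with that hash is already stored.
function contentAddressedPath(string $tmpFile, string $targetDir, string $extension): array
{
    // sha1_file returns 40 hex characters; md5_file would return 32.
    $hash = sha1_file($tmpFile);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $tmpFile");
    }
    $path = rtrim($targetDir, '/') . '/' . $hash . '.' . $extension;
    return ['path' => $path, 'exists' => file_exists($path)];
}

// Usage inside the upload handler (assumes $targetPath and $extension
// are set as in the question):
// $info = contentAddressedPath($_FILES['Filedata']['tmp_name'], $targetPath, $extension);
// if (!$info['exists']) {
//     move_uploaded_file($_FILES['Filedata']['tmp_name'], $info['path']);
// }
```

Because the name depends only on the content, a second upload of the same image maps to the same path and can simply be skipped.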

Community
coma
  • Thanks, my friend, but the chance still exists ... Have you tried also checking the file size after calculating the MD5? If a file exists with the same MD5 and the same size, it is the same file and I don't save it; if the MD5 is the same but the size is different, it is another file (this is the collision case) ... I think that solves the problem – Ammar Qasem Apr 15 '14 at 12:34
  • The only way to be sure that two files are identical is to compare them byte by byte, so in practice everyone uses checksums to validate downloaded files, even Git: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html – coma Apr 15 '14 at 12:36
  • @AmmarQasem http://stackoverflow.com/questions/10738866/will-md5file-contents-as-string-equal-md5-file-path-to-file – coma Apr 15 '14 at 12:39
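
For anyone who wants the certainty discussed in the comments above, a byte-by-byte check could be run only when a file with the same hash already exists. The following is a rough sketch under that assumption; filesAreIdentical is a hypothetical helper name, and the 8 KiB chunk size is an arbitrary choice.

```php
<?php
// Hypothetical helper: true only when both files have identical contents.
function filesAreIdentical(string $a, string $b): bool
{
    clearstatcache();
    // Files of different sizes can never be identical, so check that first.
    if (filesize($a) !== filesize($b)) {
        return false;
    }
    $fa = fopen($a, 'rb');
    $fb = fopen($b, 'rb');
    if ($fa === false || $fb === false) {
        if ($fa !== false) { fclose($fa); }
        if ($fb !== false) { fclose($fb); }
        throw new RuntimeException('Cannot open files for comparison');
    }
    try {
        while (!feof($fa)) {
            // Compare in 8 KiB chunks to keep memory use flat for large images.
            if (fread($fa, 8192) !== fread($fb, 8192)) {
                return false;
            }
        }
        return true;
    } finally {
        fclose($fa);
        fclose($fb);
    }
}
```

If the hash matches an existing file but this check returns false, that is a genuine collision and the new upload needs a different name instead of being dropped.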
1

If I understand your question correctly, your goal is just to generate unique file names. If so, there is no point in reinventing the wheel: every hash function with a fixed output length is going to have collisions, so just use the built-in tempnam function.

Manual states:

Creates a file with a unique filename, with access permission set to 0600, in the specified directory. If the directory does not exist, tempnam() may generate a file in the system's temporary directory, and return the full path to that file, including its name.

The following should work well enough:

$targetDirectory = $_SERVER['DOCUMENT_ROOT'] . '/test/uploadify/uploads';
$uploadedFile = $_FILES['Filedata']['tmp_name'];
$targetFile = tempnam($targetDirectory, '');
move_uploaded_file($uploadedFile, $targetFile);
Mikk
0

The chance is very small, but it is there. You can read more here and here

I suggest you add a salt to the end of the filename to make it practically impossible for files to conflict (you should compute the salt with a separate md5 call, though):

$salt = md5(round(microtime(true) * 1000));
$hash = md5_file($_FILES['Filedata']['tmp_name']);
$targetFile = rtrim($targetPath,'/') . '/' .$hash.$salt.'.'.$extension;

You should then insert the filename in a database so you can access it later.

Jonan
  • Thanks, man ... I know there is a chance of collision, but I need to solve this to save space on my hard disk – Ammar Qasem Apr 13 '14 at 09:30
  • @AmmarQasem Did this help you or does my answer need to be improved? – Jonan Apr 13 '14 at 09:43
  • Your answer is good if I don't need to save space on the hard disk, because with your answer the same file will be saved twice when it is uploaded at different times, and that is what I don't want – Ammar Qasem Apr 13 '14 at 10:00
0

You could always add the system's current time in milliseconds to the filename. That plus the MD5 would be very unlikely to produce the same name twice.

Jesson Atherton