In my project I have an image linked from another domain:

<img src="http://www.somewhere.com/images/whatever-image.jpg">

I would like to save it locally under a name derived from its original src attribute, so that I can check later whether I have already saved this image. My question is: if I use md5() on the src attribute, is the result unique? Example:

$src = "http://www.somewhere.com/images/whatever-image.jpg";
if (file_exists('local_path' . md5($src))) {
    ...
}

Can I rely on md5() in this case? If not, how can this be done properly?

Linda
  • See http://stackoverflow.com/questions/4032209/is-md5-still-good-enough-to-uniquely-identify-files – Rob Apr 23 '13 at 15:55
  • Rob, yes, I know about the issues with md5(). I am still looking for a way to make the name really unique in my case. – Linda Apr 23 '13 at 16:02
  • ok - try http://php.net/manual/en/function.uniqid.php – Rob Apr 23 '13 at 16:04
  • If I go with a timestamp, I cannot recheck it later based on the src attribute, right? – Linda Apr 23 '13 at 16:07
  • Reading your other comments: you want to accept a URL string, create a unique ID from that string, and then be able to decipher it later? What happens if you get the same string again? – Rob Apr 23 '13 at 16:07
  • If there is a collision, the file will just be replaced. If the probability of a collision is tiny, say one in 1,000,000, it is no problem for me ;) But I do not know how tiny the probability is. – Linda Apr 23 '13 at 16:09
  • I think you will be ok - http://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions – Rob Apr 23 '13 at 16:11
  • Wonderful! Thank you! I wonder why I did not find that question while searching. – Linda Apr 23 '13 at 16:19

4 Answers

MD5 is quite safe to use in this case. The fast hashing time, which makes it unsafe for cryptographic purposes, is actually a bonus here.
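As a rough illustration of the idea, here is a minimal sketch that derives a local path from the MD5 of the src value. The helper name `localImagePath` and the cache directory `/tmp/image-cache` are assumptions made for this example, not part of the question; keeping the original file extension is likewise just a convenience.

```php
<?php
// Hypothetical helper: maps a remote image URL to a deterministic local path.
function localImagePath(string $src, string $cacheDir): string
{
    // Keep the original extension (if there is one) so the file stays recognizable.
    $path = parse_url($src, PHP_URL_PATH);
    $ext  = is_string($path) ? pathinfo($path, PATHINFO_EXTENSION) : '';

    // md5() always yields 32 hex characters, which are safe in a file name.
    $name = md5($src);

    return rtrim($cacheDir, '/') . '/' . $name . ($ext !== '' ? '.' . $ext : '');
}

$src   = 'http://www.somewhere.com/images/whatever-image.jpg';
$local = localImagePath($src, '/tmp/image-cache'); // assumed directory

if (!file_exists($local)) {
    // Not cached yet: download and save the image here,
    // e.g. with file_get_contents() and file_put_contents().
}
```

Because the same src always produces the same path, the later file_exists() check needs nothing but the src value itself.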

For the probability of collisions, see for example: How many random elements before MD5 produces collisions?
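To put a number on "tiny", the usual birthday approximation can be sketched as follows. It assumes the hash behaves like a uniform random function over 128 bits and that the URLs are not crafted by an attacker (deliberately constructed MD5 collisions are known to exist); the function name `collisionProbability` is made up for this example.

```php
<?php
// Approximate probability that at least two of $n distinct inputs share a hash
// value, for a hash with $bits output bits.
// Birthday approximation: p ≈ n(n - 1) / 2^(bits + 1), valid while p is small.
function collisionProbability(float $n, int $bits = 128): float
{
    return $n * ($n - 1) / (2.0 ** ($bits + 1));
}

// One million stored images, MD5 (128 bits):
printf("%.3e\n", collisionProbability(1e6));
```

For a million URLs the estimate is on the order of 10^-27, far below the one-in-a-million threshold mentioned in the comments.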

On the other hand, for your purpose it would be enough to simply strip from the src attribute value those characters that are not allowed in a file path, i.e.

$localFileName = str_replace(array('/', ':'), '', $src); // may need to strip '&', too

This way the file names are more human-readable and easier to process further if the need arises.
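One caveat with stripping characters is that two different URLs can end up with the same file name. The sketch below demonstrates this with made-up example.com URLs and shows rawurlencode() as one possible alternative that escapes characters instead of dropping them, so the original URL can be recovered; this is an illustration, not the answer's method, and very long URLs may still exceed the file system's name length limit.

```php
<?php
$a = 'http://example.com/a/b.jpg'; // hypothetical URLs for illustration
$b = 'http://example.com/ab.jpg';

// Stripping '/' and ':' maps both URLs to "httpexample.comab.jpg".
$strippedA = str_replace(['/', ':'], '', $a);
$strippedB = str_replace(['/', ':'], '', $b);
var_dump($strippedA === $strippedB); // bool(true)

// Escaping keeps the names distinct and reversible.
$name = rawurlencode($a);
var_dump(rawurldecode($name) === $a);      // bool(true)
var_dump($name === rawurlencode($b));      // bool(false)
```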

cypherabe

I think you can. A collision may still happen, but the probability is very low.

Sudo Reboot

Try sha1(); it is less likely to collide than md5(). However, unless you need to obfuscate the URL, you don't need these hashes at all. Simply save the string in a text field in your database, running urlencode() on it if needed. This way you can scale indefinitely without worrying about collisions.
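For comparison, here is a small sketch of the built-in hash functions applied to the src value from the question. The use of hash('sha256', ...) is an addition for illustration; the answer itself only mentions sha1().

```php
<?php
$src = 'http://www.somewhere.com/images/whatever-image.jpg';

$md5    = md5($src);            // 128-bit digest, 32 hex characters
$sha1   = sha1($src);           // 160-bit digest, 40 hex characters
$sha256 = hash('sha256', $src); // 256-bit digest, 64 hex characters

// Any of these can serve as a deterministic local file name.
echo strlen($md5), ' ', strlen($sha1), ' ', strlen($sha256), "\n"; // 32 40 64
```

A longer digest lowers the collision probability further at the cost of a longer file name.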

Bryan Potts

Add the current timestamp to the path. This will make sure that the path is unique.

dmnptr
  • And how can I check later whether the file exists if I don't know the timestamp under which it was saved? – Linda Apr 23 '13 at 16:04