I'm trying to create dynamic avatars for my website's users, something like Stack Overflow does. I have a PHP script that generates an image based on a string:

path/to/avatar.php?hash=string

I want to use the MD5 of each user's email as the name of their avatar (and as the string the PHP script generates the image from):

$email = $_GET['email'];
$hash  = md5($email);
copy("path/to/avatar.php?hash=$hash","path/img/$hash.jpg");

Now I want to be sure: can I use the MD5 of their emails as their avatars' names? In other words, can two different strings produce the same MD5 output, or is the output for two different strings always unique?

I don't know whether my question is clear. All I want to know is: is there any possibility that two different emails have the same MD5?

Shafizadeh

2 Answers

2

As the goal here is to use a hash for its uniqueness rather than its cryptographic strength, MD5 is acceptable, although I still wouldn't recommend it.

If you do settle on using MD5, hash a globally unique id that you control, along with a salt, rather than a user-supplied email address.

For example:

$salt = 'random string';
$hash = md5($salt . $userId);

However:

  • There is still a small chance of a collision (1 in 2^128 for any single pair of hashes, but because of the birthday paradox a collision becomes likely once roughly 2^64 hashes exist). Remember this is only a probability: two consecutive hashes, hash_n and hash_n+1, could still collide.
  • There is no reasonable way to determine the userId from the hash (I don't consider indexing 128-bit hashes so you can query them to be reasonable).
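To put rough numbers on the birthday paradox mentioned above, here is a minimal sketch using the standard approximation p ≈ 1 - exp(-n² / 2^(bits+1)). The function name `collisionProbability` is made up for illustration, and the estimate assumes MD5 outputs behave like uniformly random 128-bit values.

```php
<?php
// Approximate probability that at least two of $n random hashes collide,
// using the birthday approximation p ≈ 1 - exp(-n^2 / 2^(bits + 1)).
// Assumption: hash outputs are uniformly distributed over 2^$bits values.
function collisionProbability(float $n, int $bits = 128): float
{
    $exponent = -($n * $n) / pow(2.0, $bits + 1);
    // -expm1(x) equals 1 - exp(x) but keeps precision when x is tiny.
    return -expm1($exponent);
}

printf("%.3e\n", collisionProbability(1e9));           // one billion avatars
printf("%.3f\n", collisionProbability(pow(2.0, 64)));  // about 2^64 avatars
```

Under these assumptions, a site with a billion users is still many orders of magnitude away from an accidental collision; the probability only becomes noticeable near 2^64 hashes.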

You use Stack Overflow as an example.

User profiles on this site look like: http://stackoverflow.com/users/2805376/shafizadeh

So what is wrong with having avatar URLs like http://your_site/users/2805376/avatar.png ? The back-end storage could simply be /path/to/images/002/805/376.png

This guarantees a unique name, and provides you with a very simple and easy to work with way of storing, locating, and reversing the id assigned to images back to the user.
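A minimal sketch of that storage layout, assuming ids are zero-padded to nine digits and split into groups of three. The helper names `avatarPath` and `userIdFromAvatarPath` and the default base directory are hypothetical, not part of any existing API.

```php
<?php
// Hypothetical helper: map a numeric user id to a sharded avatar path.
function avatarPath(int $userId, string $baseDir = '/path/to/images'): string
{
    // Zero-pad to 9 digits, then split into groups of three:
    // 2805376 -> "002805376" -> "002/805/376"
    $padded = sprintf('%09d', $userId);
    return $baseDir . '/' . implode('/', str_split($padded, 3)) . '.png';
}

// Hypothetical inverse: recover the user id from a path built above.
function userIdFromAvatarPath(string $path, string $baseDir = '/path/to/images'): int
{
    // Strip the base directory (plus its trailing slash) and the ".png" suffix.
    $relative = substr($path, strlen($baseDir) + 1, -strlen('.png'));
    return (int) str_replace('/', '', $relative);
}

echo avatarPath(2805376), "\n";
echo userIdFromAvatarPath('/path/to/images/002/805/376.png'), "\n";
```

Splitting the id into directories keeps any single directory from holding more than a thousand entries, which is one common reason for this kind of layout.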

Leigh
  • Good explanations. Thx ... +1 – Shafizadeh Apr 25 '16 at 20:19
  • Also, about this sentence: *So what is wrong with having avatar urls like `http://your_site/users/2805376/avatar.png`*. SO uses the user's id as the name of the avatar, but I really don't know how it does that: on my website, a user who isn't registered yet doesn't have an id, and I need a name for the avatar before registering the user. – Shafizadeh Apr 25 '16 at 20:38
1

This is actually what Gravatar is doing (it was the standard way to get an avatar on Stack Overflow). Have a look at Gravatar's implementation.
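As a rough sketch of the Gravatar-style approach: as far as I know, Gravatar's classic scheme trims and lowercases the address before hashing, so that the same mailbox always maps to the same avatar. The function name `emailHash` is an illustrative assumption.

```php
<?php
// Hypothetical helper: normalise an email address, then hash it.
// Trimming and lowercasing ensures " User@Example.com " and
// "user@example.com" yield the same avatar name.
function emailHash(string $email): string
{
    return md5(strtolower(trim($email)));
}

// The hash can then be used as a file name or in a Gravatar-style URL.
$url = 'https://www.gravatar.com/avatar/' . emailHash(' User@Example.com ');
echo $url, "\n";
```

Without the normalisation step, two spellings of the same address would produce two different avatars.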

The chance of a collision is negligible in practice. It is difficult enough to intentionally forge two (binary) strings that result in the same MD5, and email addresses are restricted in both length and character set.

One problem with this approach is what Fred-ii- mentioned: because brute-forcing MD5 is so fast (about 100 giga MD5 hashes per second), somebody could try to recover the original email address, whose MD5 is now publicly visible. For short email addresses this would work in a reasonable time.

Using a UUID could be a good alternative to deriving the name from an email address. You can create such an id without database access and, for all practical purposes, be sure that you won't get a duplicate.
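One way to generate such an id in plain PHP (7.0 or later, for `random_bytes`) is a random version 4 UUID. This is a sketch only; the function name `uuidV4` is hypothetical, and a maintained UUID library would work just as well.

```php
<?php
// Hypothetical helper: build a random (version 4) UUID from 16 random bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);
    // Set the version field to 4 (random UUID).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field to the RFC 4122 layout (binary 10xx).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // 32 hex characters, formatted as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Example: name the avatar file before the user has a database id.
$avatarName = uuidV4() . '.jpg';
echo $avatarName, "\n";
```

Because the id does not depend on the email address, it reveals nothing about the user and can be assigned before registration is complete.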

martinstoeckli