51

I accidentally stopped hashing passwords before they were stored, so now my database has a mix of MD5-hashed passwords and unhashed passwords.

I want to loop through and hash the ones that are not MD5. Is it possible to check if a string is an MD5 hash?

NullPoiиteя
kmoney12
  • NullPointer's response is your best shot, but still, you can't be sure unless you're already allowing users to save a password that can be 32 characters long. – inhan Jan 13 '13 at 04:43
  • 7
    Off topic, but MD5 is considered "broken" for storing passwords due to the ease with which you can calculate all possible keys. Have a look at: http://stackoverflow.com/questions/4795385/how-do-you-use-bcrypt-for-hashing-passwords-in-php **and** http://stackoverflow.com/questions/1581610/how-can-i-store-my-users-passwords-safely **and** http://www.openwall.com/phpass/ – tsujp Jan 13 '13 at 04:50

2 Answers

115

You can check using the following function:

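function isValidMd5($md5 = '')
{
    // preg_match() returns int 1 on a match and 0 otherwise (false on error),
    // so test the result loosely, e.g. if (isValidMd5($value)) { ... }
    return preg_match('/^[a-f0-9]{32}$/', $md5);
}

// Prints 1: this string is md5('hello').
echo isValidMd5('5d41402abc4b2a76b9719d911017c592');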

An MD5 (message-digest algorithm) hash is typically written in text form as a 32-digit hexadecimal number.

This function checks that:

  1. It contains only hexadecimal characters (lowercase a-f and 0-9).
  2. It's 32 characters long.
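
For the loop described in the question, a minimal sketch could look like the following. The `$rows` array and its `id` and `password` keys are hypothetical stand-ins for whatever the real users table returns, and the check compares the `preg_match()` result to `1` so that it yields a strict boolean.

```php
<?php
// Format check from the answer, compared to 1 so the result is a bool.
function isValidMd5($md5 = '')
{
    return preg_match('/^[a-f0-9]{32}$/', $md5) === 1;
}

// Hypothetical rows; in practice these would be read from the database.
$rows = [
    ['id' => 1, 'password' => '5d41402abc4b2a76b9719d911017c592'], // already md5('hello')
    ['id' => 2, 'password' => 'hunter2'],                          // stored unhashed
];

foreach ($rows as &$row) {
    // Only hash values that do not already look like an MD5 digest.
    if (!isValidMd5($row['password'])) {
        $row['password'] = md5($row['password']);
        // A real script would write the new value back to the database here.
    }
}
unset($row); // drop the reference left behind by foreach
```

A plain password that happens to be 32 lowercase hex characters would be skipped by this loop, since the check only looks at the format.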
eisbehr
NullPoiиteя
  • 1
    The `!empty` check is entirely superfluous there. – deceze Jan 13 '13 at 04:39
  • 6
    Why do you need to check if it's empty? Won't it already return false if `preg_match()` does not match? – inhan Jan 13 '13 at 04:39
  • What's the accuracy on this? Can I trust it completely? Sorry don't know too much about regular expressions. – kmoney12 Jan 13 '13 at 04:41
  • 1
    It will return 0 if it doesn't match, synonymous with `false`. +vote @inhan – tsujp Jan 13 '13 at 04:42
  • 2
    @hellohellosharp yeah, documentation reads `Returns the hash as a 32-character hexadecimal number.` which means the value should consist of `0-9` and `a-f` characters only and it should be 32 characters long. – inhan Jan 13 '13 at 04:44
  • @hellohellosharp I only started regular expressions yesterday but this `preg_match('/^[a-f0-9]{32}$/'` is stating that a string of 32 characters long can only contain letters a-f (hex) and numbers 0-9. There are a few reasons why you can trust this, and cannot. 1. MD5 is 32 characters long, 2. MD5 contains only lowercase alpha characters a through f. Anyone can fake an MD5 hash, there is no surefire method to detect if text, information, is from a certain source unless you go into things like secure login cookies etc..., which is a completely different subject. – tsujp Jan 13 '13 at 04:45
  • 1
    @hellohellosharp Long story short, this method is fine for analyzing the **format of the string** you are checking. – tsujp Jan 13 '13 at 04:46
  • @hellohellosharp something important is, make sure your `MD5()` call does not include the second parameter that can be converted into `true` otherwise this function will not help you. Check the `raw_output` (2nd) parameter in [`MD5()` documentation](http://php.net/manual/en/function.md5.php). – inhan Jan 13 '13 at 04:48
  • 4
    @hellohellosharp Nobody seems to have answered your 'accuracy' comment yet: this function verifies that a string is in the format of an MD5 hash, it does not validate that it *is* an actual hashed password - that cannot be determined. It will also return true if one of your *unhashed* passwords meets the format requirements - but that is unlikely. – Jan Doggen Feb 26 '14 at 14:46
  • Thanks! I used it like so: `if (!preg_match('/^[a-f0-9]{32}$/', $md5)) {return false;}` (not for a password, lol!) – kyle Apr 10 '14 at 22:14
  • 1
    Why are you using an empty string as default parameter? – Jack Feb 15 '15 at 14:07
  • 1
    @NullPoiиteя you have a typo: _"Its contain only letter and digit (a-z,0-9)"_ should be _"(a-**f**,0-9)"_ – Cliff Burton Sep 05 '16 at 15:10
  • Worth noting is that `preg_match` might not return a boolean but an int, therefore your function will not work in e.g. `if (isValidMd5($hash) === true)`. Here is a faster version, which also overcomes the mentioned problem: http://sandbox.onlinephpfunctions.com/code/d3fd05890a818ca6bc720da1b535047d032784fe – xZero May 11 '18 at 00:23
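
Two points from the comments above can be checked directly: the `raw_output` parameter of `md5()` and the integer return value of `preg_match()`. The snippet below is only an illustrative sketch; the variable names are made up for the example.

```php
<?php
$hex = md5('hello');       // default: 32 lowercase hex characters
$raw = md5('hello', true); // raw_output = true: 16 raw bytes

var_dump(strlen($hex)); // int(32)
var_dump(strlen($raw)); // int(16)

$pattern = '/^[a-f0-9]{32}$/';

var_dump(preg_match($pattern, $hex)); // int(1)
var_dump(preg_match($pattern, $raw)); // int(0): raw bytes never fit the hex format

// preg_match() returns an int, so a strict comparison with true fails.
var_dump(preg_match($pattern, $hex) === true); // bool(false)
var_dump(preg_match($pattern, $hex) === 1);    // bool(true)
```

If the stored values were produced with `raw_output` enabled, a hex-format check like this cannot recognise them.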
46

A version that may be a bit faster:

function isValidMd5($md5 ='') {
  return strlen($md5) == 32 && ctype_xdigit($md5);
}
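
The two checks on this page do not accept exactly the same inputs. The sketch below compares them on a few sample strings; the function names `isValidMd5Regex` and `isValidMd5Ctype` are invented here only to keep the two versions apart.

```php
<?php
function isValidMd5Regex($md5 = '')
{
    return preg_match('/^[a-f0-9]{32}$/', $md5) === 1;
}

function isValidMd5Ctype($md5 = '')
{
    return strlen($md5) == 32 && ctype_xdigit($md5);
}

$samples = [
    'lowercase digest'  => '5d41402abc4b2a76b9719d911017c592', // md5('hello')
    'uppercase digest'  => '5D41402ABC4B2A76B9719D911017C592',
    'trailing newline'  => "5d41402abc4b2a76b9719d911017c592\n",
    'non-hex character' => '5d41402abc4b2a76b9719d911017c59g',
];

foreach ($samples as $label => $s) {
    printf(
        "%-18s regex=%-5s ctype=%s\n",
        $label,
        var_export(isValidMd5Regex($s), true),
        var_export(isValidMd5Ctype($s), true)
    );
}
```

Only the `ctype_xdigit` version accepts the uppercase digest, because the pattern lists lowercase letters only. Only the regex version accepts the string with a trailing newline, because `$` can match just before a final newline unless the `D` modifier is added.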
RaphaelH
  • 2
    +1, probably faster, and also more readable than the regex. It might do an early exit on the strlen call for most invalid strings, then I suspect the ctype_xdigit call is faster than running a regex engine. – laurent Apr 18 '14 at 01:57
  • Note that you can't be 100% sure whether a string that passes this test is an md5 hash or a plain-text password. It is unlikely, but someone may have a password that matches the md5 format. – l00k Apr 29 '14 at 08:06
  • 4
    @non: I do not agree with you. You can be 100% sure that the string is a valid md5 hash. Whether the string was intended to be a valid md5 hash is a whole other point. – RaphaelH Apr 29 '14 at 11:23
  • @RaphaelH you didn't understand me. "e4bfb280c702635cf71d46a0c8c33b96" may be a hashed (md5) password or just a plain password. You can't be sure :) – l00k Jun 30 '14 at 11:43
  • 10
    @non: Once again, "e4bfb280c702635cf71d46a0c8c33b96" is for 100 percent a valid md5 hash, you can't be sure if it was intended to be one. If it's your password, then it's both valid md5 and your plain password. – RaphaelH Jun 30 '14 at 13:45
  • 1
    Yes, you can't be 100% sure that it is an md5 string; to check that, you need to use: `!ctype_digit($md5) && ctype_lower($md5) && ctype_alpha($md5)` – VeeeneX May 08 '15 at 13:02
  • 1
    @VeeeneX Answer might be a little late, but what you suggest is just complete bs and I want to clarify that. I am assuming the representation of the md5 hash as hex string as everyone else here does. It can and mostly does contain digits, so !ctype_digit is wrong. It does not have to be lowercase, so ctype_lower is wrong, nothing wrong with uppercase hashes. It can't have characters beyond f, so ctype_alpha is wrong. – RaphaelH Jun 20 '17 at 19:25
  • Faster if and only if the string is shorter than 32 characters. If not, it involves unnecessary function call overhead, in this case strlen – xZero May 11 '18 at 00:18
  • 1
    @xZero This is not about 2 function calls versus 1 function call, it's about running a regex engine versus 2 rather lightweight function calls. On my machine the non-regex version runs about 75% faster; of course this depends on the input, which in my test case was randomly generated. See https://pastebin.com/BhpPpGyw to try it yourself – RaphaelH May 12 '18 at 13:12
  • @RaphaelH I completely agree with you. I was not trying to say that your approach is slower, although it sounded like that. I just thought it's worth noting circumstances it is faster with. – xZero May 28 '18 at 12:53
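
For anyone who wants to repeat the timing comparison discussed in the comments above, a rough sketch follows. It is not the script from the linked pastebin: the input data, the helper `timeIt`, and the function names are all assumptions made for this example, and the measured times will vary by machine and input.

```php
<?php
function isValidMd5Regex($md5 = '')
{
    return preg_match('/^[a-f0-9]{32}$/', $md5) === 1;
}

function isValidMd5Ctype($md5 = '')
{
    return strlen($md5) == 32 && ctype_xdigit($md5);
}

// Hypothetical test data: half valid digests, half short plain strings.
$inputs = [];
for ($i = 0; $i < 10000; $i++) {
    $inputs[] = ($i % 2 === 0) ? md5((string) $i) : 'password' . $i;
}

// Runs $fn over every input and returns [match count, elapsed seconds].
function timeIt(callable $fn, array $inputs)
{
    $start = microtime(true);
    $hits = 0;
    foreach ($inputs as $input) {
        if ($fn($input)) {
            $hits++;
        }
    }
    return [$hits, microtime(true) - $start];
}

list($regexHits, $regexSeconds) = timeIt('isValidMd5Regex', $inputs);
list($ctypeHits, $ctypeSeconds) = timeIt('isValidMd5Ctype', $inputs);

printf("regex: %d matches in %.4f s\n", $regexHits, $regexSeconds);
printf("ctype: %d matches in %.4f s\n", $ctypeHits, $ctypeSeconds);
```

Both versions should report the same number of matches on this data; only the timings are expected to differ.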