12

I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?

I have read in many places that the number in an MD5 hash is bigger than PHP can handle as an integer, so the conversion will be inaccurate. I want a short URL anyway and was not planning on using the whole hash, maybe just 8 characters of it.

So my question is how to get part of the number of an MD5 hash?

Also is it a bad idea to use only part of the MD5 hash?

Justin Johnson
Mark

9 Answers

9

I'm going to suggest something different here. Since you are only interested in a decimal chunk of the MD5 hash, why not use another short numeric hash such as CRC32 or Adler-32? Here is an example:

$hash = sprintf('%u', crc32('your string here'));

This produces an unsigned 32-bit checksum of your string: up to 10 decimal digits, or 8 hexadecimal digits.
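To make the size of that value concrete, here is a minimal sketch. The input string is only a placeholder, and the base 36 step is an optional extra that is not part of the original suggestion.

```php
<?php
// Placeholder input; any string works.
$input = 'your string here';

// Unsigned decimal form: 0 .. 4294967295, so at most 10 digits.
$decimal = sprintf('%u', crc32($input));

// Zero-padded hexadecimal form: always 8 characters.
$hex = sprintf('%08x', crc32($input));

// Optional: base 36 shortens a 32-bit value to at most 7 characters.
$short = base_convert($hex, 16, 36);

echo "$decimal $hex $short\n";
```

Keep in mind that 32 bits is a small space for a hash, so collisions show up much sooner than with a full MD5.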

EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.

EDIT (again): To work with arbitrary-length numbers you must use either the BCMath or the GMP extension. Here is a function, bc_base_convert() (user-defined, not a PHP built-in), that uses the BCMath extension and can convert between any bases from 2 to 62. You should use it like this:

echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash

and the inverse:

echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
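Since bc_base_convert() is not part of PHP, here is a rough sketch of what such a function could look like. The name bc_base_convert_sketch and the digit order (0-9, a-z, A-Z) are assumptions for illustration, not the linked implementation, and the BCMath extension must be installed.

```php
<?php
// Hypothetical stand-in for bc_base_convert(); requires the BCMath extension.
// Assumed digit order: 0-9, then a-z, then A-Z (so lowercase hex works as-is).
function bc_base_convert_sketch(string $value, int $from, int $to): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    if ($value === '' || $from < 2 || $from > 62 || $to < 2 || $to > 62) {
        throw new InvalidArgumentException('Empty value or base outside 2..62');
    }

    // Step 1: parse the input digits into an arbitrary-precision decimal string.
    $decimal = '0';
    foreach (str_split($value) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false || $digit >= $from) {
            throw new InvalidArgumentException("Invalid digit '$char' for base $from");
        }
        $decimal = bcadd(bcmul($decimal, (string) $from, 0), (string) $digit, 0);
    }

    // Step 2: repeatedly divide by the target base, collecting remainders.
    $result = '';
    do {
        $result = $alphabet[(int) bcmod($decimal, (string) $to)] . $result;
        $decimal = bcdiv($decimal, (string) $to, 0);
    } while (bccomp($decimal, '0', 0) > 0);

    return $result;
}
```

One caveat with this approach: leading zeros are not preserved, so when converting back to hex you may need str_pad($hex, 32, '0', STR_PAD_LEFT) to recover the full 32-character MD5.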

Hope it helps. =)

Alix Axel
  • Is it possible to work out what went into the hash? I am thinking that if I only ever show part of a hash, it must be more difficult to work out how it was generated... right? – Mark Dec 10 '09 at 11:32
  • Right, but then it wouldn't be a hash in the true sense of the word, and collisions become much more likely. – Alix Axel Dec 10 '09 at 11:50
4

If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions, especially if you're truncating the hash. If you implement an ID-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second will be 2, and so on. If you're using MySQL, just add an AUTO_INCREMENT column.

To make a short id:

//the basic example
$sid = base_convert($id, 10, 36);

//if you're going to be needing 64 bit numbers converted 
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);

To make a short id back into the base-10 id:

//the basic example
$id = base_convert($sid, 36, 10);

//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));

Hope this helps!

If you truly want base 62 (which base_convert can't do, and which GMP only supports as of PHP 5.3.2), check this out: http://snipplr.com/view/22246/base62-encode--decode/
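In case that link is unavailable, a base 62 encoder for integer IDs is short enough to sketch in plain PHP. The function names and the digit order (0-9, A-Z, a-z) below are my own assumptions, not taken from the linked snippet, and the sketch only handles IDs that fit in a native PHP integer.

```php
<?php
// Illustrative helpers; names and digit order are assumptions.
function base62_encode_id(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $sid = '';
    do {
        // Take the lowest base-62 digit, then shift the number down.
        $sid = $alphabet[$id % 62] . $sid;
        $id = intdiv($id, 62);
    } while ($id > 0);
    return $sid;
}

function base62_decode_id(string $sid): int
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $id = 0;
    foreach (str_split($sid) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Invalid base 62 digit '$char'");
        }
        // Accumulate most significant digit first.
        $id = $id * 62 + $digit;
    }
    return $id;
}
```

With this, an autoincrement ID maps to a short string and back without any extension.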

brianreavis
  • Sorry maybe I didn't make myself clear enough, the md5 is not functioning as an id... just as a way to stop a user guessing the next url... so the url is id=1&md5=dsf213sfe. Thank you anyway – Mark Dec 10 '09 at 11:29
1

You can do it like this (not all steps are in PHP; it's been a long time since I used it).

There's no risk in using only a few of the bits of an MD5 hash. The only thing that changes is the probability of collisions, which rises as you keep fewer bits.
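To get a feel for how much the collision risk grows, here is a small sketch using the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / (2·2^b)), for n items and b retained bits. The function name is made up for illustration, and the formula assumes the retained bits behave like uniformly random values.

```php
<?php
// Hypothetical helper: approximate probability that at least two of
// $items hashes collide when only $bits bits of each hash are kept.
function collision_probability(int $bits, int $items): float
{
    $space = 2.0 ** $bits;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -expm1(-$items * ($items - 1) / (2.0 * $space));
}

// 8 hex characters of an MD5 keep 32 bits.
printf("%.4f\n", collision_probability(32, 10000));
printf("%.4f\n", collision_probability(32, 100000));
// 16 hex characters keep 64 bits.
printf("%.12f\n", collision_probability(64, 100000));
```

For 32 bits the probability reaches about one half at roughly 77,000 items, which is why truncating to 8 hex characters only suits small collections.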

Community
Georg Schölly
1

There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.

Look here for the code of toBase62() and fromBase62().

http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html

The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string. You can omit it altogether or just copy the method over to your copy of StringUtils. Voilà.

akjoshi
0

You can do something like this,

$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash);

$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);
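The snippet above relies on a base62() function that PHP does not provide. A self-contained sketch might look like the following; the helper, its digit order, and the fixed-width padding are my additions, and it assumes 64-bit PHP so that the unpacked 32-bit values are non-negative.

```php
<?php
// Hypothetical helper: encode a non-negative integer in base 62.
function base62(int $number): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $out = '';
    do {
        $out = $alphabet[$number % 62] . $out;
        $number = intdiv($number, 62);
    } while ($number > 0);
    return $out;
}

function md5_base62_chunks(string $data): string
{
    // Four unsigned 32-bit integers keyed num1..num4 (machine byte order).
    $ints = unpack('L*num', md5($data, true));
    $out = '';
    foreach ($ints as $int) {
        // 62^6 > 2^32, so six characters always suffice; pad to a fixed width
        // so the four chunks can be split apart again unambiguously.
        $out .= str_pad(base62($int), 6, '0', STR_PAD_LEFT);
    }
    return $out;
}

echo md5_base62_chunks('The data to be hashed'), "\n";
```

The result is always 24 characters, slightly longer than the 22 a single big-number conversion would give, but it needs no extension.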
ZZ Coder
0

You may try base62x to get a safe and compatible encoded representation.

More information about base62x is available by searching for -base62x in -NatureDNS.

shell> ./base62x -n 16 -enc 16AF 
1Ql
shell> ./base62x -n 16 -dec 1Ql 
16AF

shell> ./base62x 
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string 
Version: 0.60 
Will
0

Here is an open-source Java library that converts MD5 strings to Base62 strings https://github.com/inder123/base62

Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4

Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc

The conversion is two-way, so you will get the original md5 back if you convert it back to md5:

Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6

Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4


inder
0

As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:

function base62hash($source, $chars = 22) {
  return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}

Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).

Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).
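Two details are worth watching when truncating. gmp_strval() drops leading zeros, so the full string is occasionally 21 characters rather than 22, and because 2^128 is less than 8 × 62^21, the first of 22 digits can only be 0 to 7. A variant that pads and keeps the trailing characters instead avoids both; the function name here is my own, and it needs the GMP extension.

```php
<?php
// Variant sketch: fixed-width output, truncated from the least significant end.
// Requires the GMP extension (base 62 needs PHP >= 5.3.2).
function base62hash_tail(string $source, int $chars = 22): string
{
    // Clamp the requested length to what a 128-bit value can provide.
    $chars = max(1, min(22, $chars));

    $full = gmp_strval(gmp_init(md5($source), 16), 62);

    // gmp_strval() omits leading zeros; pad so the length is always 22.
    $full = str_pad($full, 22, '0', STR_PAD_LEFT);

    // Keep the trailing digits, which are spread far more evenly
    // than the leading one.
    return substr($full, -$chars);
}

echo base62hash_tail('your url here', 8), "\n";
```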

Synchro
-1

You could use a slightly modified Base 64 with - and _ instead of + and /:

function base64_url_encode($str) {
    return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
    return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}

Additionally you could remove the trailing padding = characters.

And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in older versions of the manual, $binary as of PHP 8) to true:

$raw_md5 = md5($str, true);
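Putting those pieces together, a minimal sketch could look like this. The two function names are made up for the example; everything else uses the built-ins already mentioned.

```php
<?php
// Encode the 16 raw MD5 bytes as URL-safe Base64 without '=' padding.
function md5_base64url(string $str): string
{
    $encoded = strtr(base64_encode(md5($str, true)), '+/', '-_');
    return rtrim($encoded, '=');
}

// Reverse the encoding and return the familiar 32-character hex MD5.
function md5_from_base64url(string $token): string
{
    return bin2hex(base64_decode(strtr($token, '-_', '+/')));
}

$token = md5_base64url('your url here');
echo $token, "\n";
echo md5_from_base64url($token), "\n";
```

Sixteen bytes encode to 24 Base64 characters including two padding characters, so the token is 22 characters long, the same length as the base 62 form.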
Gumbo
  • check this http://stackoverflow.com/questions/352434/base-conversion-of-arbitrary-sized-numbers-php/1743486#1743486 – Alix Axel Dec 10 '09 at 11:47