
I am using the md5 function to hash a string into a 32-character hex string.

import hashlib
str_to_encode = 'this is a test string which I want to encode'
encoded = hashlib.md5(str_to_encode.encode('utf-8')).hexdigest()  # md5 needs bytes in Python 3

I want to be able to decode this string (i.e. encoded in the example above) back to its original value. I don't think this is possible using md5 (but if it is, please let me know), but is there a compression function I can use that will give me a 32-character string at the end but which can be reversed?

EDIT: The string being encoded is a URL, so it will be a couple of hundred characters at most, and in most cases a lot less.

Thanks

John
  • What is going on? This is like the fourth person who's asked about "decoding md5" in the last day or so – Michael Mrozek Jun 28 '10 at 14:05
  • Ha, I'm not that interested in decoding it, as I'm pretty certain that it can't be done. What I'm really interested in is a function which will encode a string into a 32-character string which can then be decoded back into its original form. Originally I didn't need to decode it and just needed it to be 32 characters long, which is why I used md5, but now I need to decode it as well. – John Jun 28 '10 at 14:09
  • MD5 is a hashing function. It cannot be used to compress and decompress a string. – liviucmg Jun 28 '10 at 14:17
  • Why not store the mapping in a database or file, and forego compression altogether? – wump Jun 29 '10 at 15:21
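wump's suggestion can be sketched in a few lines. This is a minimal illustration, assuming a plain Python dict as a hypothetical stand-in for the database table; the class name UrlStore and its methods are made up for the example. The 32-character key is still the MD5 hex digest, but "decoding" becomes a lookup rather than an inversion of the hash, so it only works for URLs that were stored first.

```python
import hashlib


class UrlStore:
    """Hypothetical in-memory stand-in for a database table mapping key -> URL."""

    def __init__(self) -> None:
        self._by_key = {}  # 32-char hex key -> original URL

    def put(self, url: str) -> str:
        # The MD5 hex digest is always 32 characters, whatever the URL length.
        key = hashlib.md5(url.encode("utf-8")).hexdigest()
        stored = self._by_key.setdefault(key, url)
        if stored != url:
            # Two different URLs with the same digest: vanishingly unlikely, but check.
            raise ValueError("MD5 collision between two stored URLs")
        return key

    def get(self, key: str) -> str:
        # "Decoding" is just a lookup; it raises KeyError for keys never stored.
        return self._by_key[key]


store = UrlStore()
key = store.put("http://example.com/some/long/path?x=1")
print(key, store.get(key))
```

The trade-off is that the mapping has to be kept somewhere, but the key length stays fixed no matter how long the URL is.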

2 Answers


You seem to want two things that can't coexist:

  • Any string of any length is converted to exactly 32 characters, even if it started as 4 GB
  • The encoded string is decodable without loss of information

There are only so many bits in an MD5 hash (128, which is why the hex digest is always 32 characters), so by the pigeonhole principle it's impossible to reverse it: infinitely many inputs map onto 2^128 possible outputs. If it were reversible you could use a hash to compress information infinitely. Furthermore, irreversibility is the main point of a hash; they're intended to be one-way functions. Encryption algorithms are reversible, but they need more bytes to store the ciphertext, since decodability means they must be collision-free (two plaintexts can't encode to the same ciphertext, or the decode function wouldn't know which plaintext to output given that ciphertext).
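The pigeonhole argument can be seen directly by shrinking the hash. The sketch below keeps only the first 4 hex characters of the MD5 digest (a deliberately tiny, hypothetical "hash" with 16**4 = 65,536 possible values), so among 65,537 distinct inputs at least two must share a value. The input strings are arbitrary made-up URLs. Full MD5 has the same problem, only with 2**128 slots instead of 65,536.

```python
import hashlib


def tiny_hash(s: str, n_hex: int = 4) -> str:
    """First n_hex hex characters of the MD5 digest: a deliberately small hash."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()[:n_hex]


def find_collision(n_hex: int = 4):
    """Return two different inputs with the same tiny_hash value.

    There are only 16**n_hex possible outputs, so by the pigeonhole principle
    trying 16**n_hex + 1 distinct inputs is guaranteed to produce a repeat.
    """
    seen = {}
    for i in range(16 ** n_hex + 1):
        s = f"http://example.com/{i}"
        h = tiny_hash(s, n_hex)
        if h in seen:
            return seen[h], s, h
        seen[h] = s
    raise AssertionError("unreachable: pigeonhole guarantees a collision")


first, second, shared = find_collision()
print(first, second, shared)

# The full digest is always 32 hex characters, however long the input is.
print(len(hashlib.md5(b"a").hexdigest()), len(hashlib.md5(b"a" * 10**6).hexdigest()))
```

Given only `shared`, there is no way to tell whether the input was `first` or `second`, which is exactly why a hash cannot be decoded.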

Michael Mrozek
  • The string is a URL, so it will never be that long. I don't know what the maximum length for a URL is, but I can't see it being more than a few hundred characters, if that helps. – John Jun 28 '10 at 14:13
  • Even so, you're not going to fit a couple hundred characters in a 32 (byte) length string. A 32-character UTF-32 string, maybe. – Tim Pietzcker Jun 28 '10 at 14:16
  • @John Technically there [is no limit](http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-an-url), but practically, insanely long URLs are rare. It sounds like you're looking for a compression algorithm then, rather than encryption or hashing -- is security a concern? I doubt there are any compression algorithms that are efficient enough, but I've not actually tried. – Michael Mrozek Jun 28 '10 at 14:16
  • Security is not a problem. Like you said, a compression algorithm is what I'm after. I used "encode" because I wasn't sure of the correct term. – John Jun 28 '10 at 14:20

It seems to me that you aren't looking for a hash or encryption, you are looking for compression. Try zlib and base64 encoding:

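import base64, zlib
s = 'Hello, world'
# compress the UTF-8 bytes, then base64-encode so the result is printable ASCII
encoded = base64.b64encode(zlib.compress(s.encode('utf-8'))).decode('ascii')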

The length of the encoded data will grow as the input grows, but it may work for you.
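The decoding direction is just the same two steps in reverse. This is a minimal Python 3 sketch; the helper names compress_url and decompress_url are hypothetical, and it uses URL-safe base64 (an assumption, so the result can itself be placed in a URL) rather than the standard alphabet. zlib adds a few bytes of header and checksum, so very short inputs can come out longer than they went in.

```python
import base64
import zlib


def compress_url(url: str) -> str:
    """Reversibly encode a URL: zlib-compress its UTF-8 bytes, then URL-safe base64."""
    packed = zlib.compress(url.encode("utf-8"), 9)  # 9 = best compression
    return base64.urlsafe_b64encode(packed).decode("ascii")


def decompress_url(token: str) -> str:
    """Invert compress_url: base64-decode, decompress, decode the UTF-8 bytes."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


short = "http://example.com/"
long_ = "http://example.com/search?q=" + "compression+" * 20
for url in (short, long_):
    token = compress_url(url)
    assert decompress_url(token) == url  # lossless round trip
    print(len(url), len(token))
```

Repetitive input like `long_` shrinks a lot, but the output length still depends on the input, so there is no guarantee of hitting exactly 32 characters.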

Even restricting yourself to URLs, there's no way to reversibly map them to 32-character strings; there are just too many possible URLs.
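A quick count makes this concrete. As an illustrative assumption, suppose the 32-character output may use any of the 95 printable ASCII characters, and suppose URLs use only the 66 "unreserved" characters from RFC 3986 (letters, digits and -._~). The sketch finds the shortest maximum URL length at which there are more possible URLs than 32-character strings; past that point two URLs must share an output, so no decoder could tell them apart.

```python
OUTPUT_ALPHABET = 95   # assumption: any printable ASCII char allowed in the output
URL_ALPHABET = 66      # RFC 3986 unreserved characters: A-Z a-z 0-9 - . _ ~
OUTPUT_LENGTH = 32


def break_even_length() -> int:
    """Smallest L such that URLs of length 0..L outnumber all 32-char outputs."""
    outputs = OUTPUT_ALPHABET ** OUTPUT_LENGTH
    total_urls = 0
    length = 0
    while True:
        total_urls += URL_ALPHABET ** length  # URLs of exactly this length
        if total_urls > outputs:
            return length
        length += 1


print(break_even_length())
```

Under these assumptions this prints 35, far below the couple of hundred characters a real URL can have.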

Ned Batchelder