3

Disclaimer: I understand that a hash is not supposed to be reversible.

I've seen many people ask if there is a way to "unhash" text that is already hashed. However, I am not seeing a straight answer. Most answers state that MD5 and SHA-1 are one-way hashing algorthims, and therefore irreversible. That's great and all, but it begs the question are all hashing algorithms one-way and irreversible?

Chris
  • 5,571
  • 2
  • 20
  • 32
Sara Fuerst
  • 5,688
  • 8
  • 43
  • 86
  • Possible answer: http://stackoverflow.com/questions/9262109/php-simplest-two-way-encryption – Felippe Duarte May 16 '16 at 20:37
  • Just google it. There are lookup tables for common hashing algorithms. – wogsland May 16 '16 at 20:38
  • 6
    reversible hashing is called `encryption` – Iłya Bursov May 16 '16 at 20:39
  • 3
    A "hash" is by definition supposed to be irreversible: https://en.wikipedia.org/wiki/Cryptographic_hash_function. – WillardSolutions May 16 '16 at 20:39
  • 1
    @EatPeanutButter Please read the first line of the question. . . – Sara Fuerst May 16 '16 at 20:41
  • 4
    @tibsar Even though you say you understand what a hash is, the rest of your question asks about the very thing that makes a hash a hash. So your disclaimer should answer your own question, right? – WillardSolutions May 16 '16 at 20:43
  • @EatPeanutButter then you clearly know the answer to the question, so write one. – Sara Fuerst May 16 '16 at 20:44
  • @EatPeanutButter Also, just because something is not *intended* to be reversible, does not mean that it actually isn't in practice – Sara Fuerst May 16 '16 at 20:47
  • @tibsar the only "reversible" hashing "in practice" uses lookup tables to compare hashes with known inputs. – WillardSolutions May 16 '16 at 20:55
  • @EatPeanutButter, then if it's "reversible" is it really a hash? It seems many people are saying that hash by definition is irreversible, yet others are saying that "Reversible hashing is... encryption". So which is correct? Is it true that any "reversible" hash can no longer be classified as a hash simply because it is reversible? – Sara Fuerst May 16 '16 at 20:58
  • 1
    @tibsar the main difference here - how long it will take to "reverse", in case of encryption algorithms, decryption/reversing takes reasonable time (comparable with time spent encrypting data), in case of hash - time to hash is much much lower. For example to generate hash you need several microseconds, to decrypt it you may need centuries on modern computer to generate and test all possible variants – Iłya Bursov May 16 '16 at 21:18

3 Answers3

5

A hash function is any function that can be used to map data of arbitrary size to data of fixed size. (source: Wikipedia)

Because the range of the input values is infinite and the number of possible distinct output values is finite, the function produces the same output for an infinite number of input values. This means a hash is a losing-information function.

Assuming one could "reverse" the hashing, they would get an infinite set of possible original values. It is still impossible to tell what was the value used to generate the hash.

In mathematical terms, a hash function is not injective and this property automatically makes it not invertible.

All of the above apply to any hash function, no matter what language or library provides it.

axiac
  • 68,258
  • 9
  • 99
  • 134
3

Not really. The one absolutely non-negotiable property of a hash function is it converts data of an arbitrary length to values of a fixed length. This means each possible result of your hashing function has infinitely many possible inputs that could produce it, making reversing the hash function to a single value impossible.

If you can place constraints on the length of your data input, then technically you could define a reversible hash function but I don't particularly see a use for it.

Chris
  • 5,571
  • 2
  • 20
  • 32
2

... are all hashing algorithms one-way and irreversible?

There are some real-world hash functions that can be reversed, such as the not-uncommon implementation of nominally hashing an 8, 16, 32 or 64-bit number by returning the input unchanged. Many C++ Standard Libraries, python and other languages do exactly that, as it's often good enough for use by hash tables keyed on the numbers - the extra potential for collisions must be weighed up against the time that would have been needed to generate a stronger hash, and indeed even the potential CPU-cache benefits of nearby keys hashing to nearby buckets.

That said, your question starts...

I've seen many people ask if there is a way to "unhash" text that is already hashed.

For very short amounts of text, such 8-character passwords, brute force attacks using dictionaries and mutation rules (e.g. "try a dictionary word followed by each character from space (ASCII 32) through tilda (127)", "try all combinations of replacing letters with similar-looking or -sounding numbers"...) can sometimes find the password likely used (though there's a small chance it's another password with the same hash value).

If the input wasn't based on a dictionary word or something else guessable, it's far less likely to be crackable.

For longer amounts of text, it's increasingly impractical to find any input with matching hash value, and massively less likely that any such input would actually be the one originally used to generate the hash (with longer inputs, more of them will - on average - map to any given hash value). Once the text input is dozens of times longer than the hash value, it's totally impractical (unless perhaps quantum computing develops significantly). (Note that Microsoft's C++ compiler's std::hash<std::string> only combines 10 characters evenly spaced along any string to form the hash value, so longer strings don't increase the quality of the hash, but on the other hand the hash only provides any insight at all into the max 10 characters chosen to form it).

Most answers state that MD5 and SHA-1 are one-way hashing algorthims, and therefore irreversible.

Hashes suitable for cryptographic use (as distinct from hash table use) - should inherently take a relatively long time to calculate (some goodly fraction of a second on likely hardware), so that the brute-force dictionary attacks mentioned above are prohibitively compute-intensive even for short textual strings. This helps make them practically irreversible. Even reasonable checksum-strength hash functions will be hard to reverse after there are more bytes of input than there are bytes in the hash value, rapidly becoming practically irreversible as the input gets larger and larger.

Tony Delroy
  • 102,968
  • 15
  • 177
  • 252