How does one go about reverse engineering an algorithm?

Question

I'm wondering how does one go about reversing an algorithm such as one for storing logins or pin codes.

Lets say I have an amount of data where:

7262627 -> ? -> 8172

5353773 -> ? -> 1132

etc. This is just an example. Or say a hex string that is tansformed into another.

&h8712 -> &h1283 or something like that.

How do I go about starting to figure out what that algorithm is? Where does one start?

Would you start trying different shifts, xors and hope something stands out? I'm sure there's a better way as this seems like stabbing in the dark.

Is it even practically possible to reverse engineer this kind of algorithm?

Sorry if this is a stupid question. Thanks for your help / pointers.

Why is this "not a real question"? It's not difficult to tell what is being asked here. The question is not ambiguous, vague, incomplete or rhetorical, although it possibly is overly-broad. It can be reasonably answered in its current form: especially since it's only asking how one would *start* cryptanalyzing a hash function. It doesn't take a textbook to answer. — Steve Jessop, Nov 13 '10 at 01:34

Steve Jessop · Answer 1 · 2010-11-13T02:06:06.330

There are a few things people try:

Get the source code, or disassemble an executable.
Guess, based on the hash functions other people use. For example, a hash consisting of 32 hex digits might well be one or more repetitions of MD5, and if you can get a single input/output pair then it is quite easy to confirm or refute this (although see "salt", below).
Statistically analyze a large number of pairs of inputs and outputs, looking for any kind of pattern or correlations, and relate those correlations to properties of known hash functions and/or possible operations that the designer of the system might have used. This is beyond the scope of a single technique, and into the realms of general cryptanalysis.
Ask the author. Secure systems don't usually rely on the secrecy of the hash algorithms they use (and don't usually stay secure long if they do). The examples you give are quite small, though, and secure hashing of passwords would always involve a salt, which yours apparently don't. So we might not be talking about the kind of system where the author is confident to do that.

In the case of a hash where the output is only 4 decimal digits, you can attack it simply by building a table of every possible 7 digit input, together with its hashed value. You can then reverse the table and you have your (one-to-many) de-hashing operation. You never need to know how the hash is actually calculated. How do you get the input/output pairs? Well, if an outsider can somehow specify a value to be hashed, and see the result, then you have what's called a "chosen plaintext", and an attack relying on that is a "chosen plaintext attack". So a 7 digit -> 4 digit hash would be very weak indeed if it was used in a way which allowed chosen plaintext attacks to generate a lot of input/output pairs. I realise that's just one example, but it's also just one example of a technique to reverse it.

Note that reverse engineering the hash, and actually reversing it, are two different things. You could figure out that I'm using SHA-256, but that wouldn't help you reverse it (i.e., given an output, work out the input value). Nobody knows how to fully reverse SHA-256, although of course there are always rainbow tables (see "salt", above) <conspiracy>At least nobody admits they do, so it's no use to you or me.</conspiracy>

score 3 · Answer 2 · answered Nov 13 '10 at 01:41

3

Probably, you can't. Suppose the transformation function is known, something like

function hash(text):
    return sha1("secret salt"+text)

But the "secret salt" is not known, and is cryptographically strong (a very large, random integer). You could never brute force the secret salt from even a very large number of plain-text, crypttext pairs.

In fact, if the precise hash function used was known to be one of two equally strong functions, you could never even get a good guess between which one was being used.

answered Nov 13 '10 at 01:41

SingleNegationElimination

151,563
33
264
304

+1, but your last claim is a bit off, I think. It's true of two perfectly strong hash functions, but a hash function could be a long way off from being practically reversible, and yet still be identifiable given a plausible amount of data, from statistical biases. Not sure what the current score is for SHA-1, in particular. It's trending towards being slightly weak in various ways. – Steve Jessop Nov 13 '10 at 02:01
@Steve Jessop: I'll certainly concede that much; As far as I know, there's no generalizable proof to show that a hash function has no mathematical weaknesses, but at the same time, there's also no generalizable way to detect and exploit them. Exploits hash functions at least as strong as SHA-1 probably always require specific knowledge of their weaknesses. – SingleNegationElimination Nov 13 '10 at 02:44
yes, the way you'd do it (or rather, the crypto community does it) is basically to run every statistical test you can think of on every hash you know. If a given hash shows a bias, then that bias can be used to tentatively identify it from output. So definitely you're using specific knowledge of that hash, or at least of a class of hashes including that hash. To distinguish your "two equally strong functions", you'd need to know of a particular bias in one of them that the other can be assumed not to share. – Steve Jessop Nov 13 '10 at 02:49

SimonJ · Answer 3 · 2010-11-13T02:31:09.833

Stabbing in the dark will drive you to insanity. There are some algorithms that, given current understanding, you couldn't hope to deduce the inner workings of between now and the [predicted] end of the universe without knowing the exact details (potentially including private keys or internal state). Of course, some of these algorithms are the foundations of modern cryptography.

If you know in advance that there's a pattern to be discovered though, there are sometimes ways of approaching this. For instance, if the dataset contains several input values that differ by 1, compare the corresponding output values:

7262627 -> 8172
7262628 -> 819
7262629 -> 1732
...
7262631 -> 3558

Here it's fairly clear (given a few minutes and a calculator) that when the input increases by 1, the output increases by 913 modulo 8266 (i.e. a simple linear congruential generator).

Differential cryptanalysis is a relatively modern technique used to analyse the strength of cryptographic block ciphers, relying on a similar but more complex idea for where the cipher algorithm is known, but it's assumed the private key isn't. Input blocks differing from each other by a single bit are considered and the effect of that bit is traced through the cipher to deduce how likely each output bit is to "flip" as a result.

Other ways of approaching this kind of problem would be to look at the extremes (maximum, minimum values), distribution (leading to frequency analysis), direction (do the numbers always increase? decrease?) and (if this is allowed) consider the context in which the data sets were found. For instance, some types of PIN codes always contain a repeated digit to make them easier to remember (I'm not saying a PIN code can necessarily be deduced from anything else - just that a repeated digit is one less digit to worry about!).

"It's fairly clear that when the input increases by 1, the output increases by 913 modulo 8266" - that's exactly what I was just about to say. Honest ;-) — Steve Jessop, Nov 13 '10 at 02:22

score 0 · Answer 4 · answered Nov 13 '10 at 01:39

0

Is it even practically possible to reverse engineer this kind of algorithm?

It is possible with a flawed algorithm and enough encrypted/unencrypted pairs, but a well designed algorithm can eliminate that possibility of doing it at all.

answered Nov 13 '10 at 01:39

John

3,904
7
31
43

How does one go about reverse engineering an algorithm?

4 Answers4

Linked

Related