We have a bunch of CRC32 hashes which would be really nice to know the input of. Some of them are short enough that brute-forcing them is feasible; others are not. The CRC32 algorithm being used is equal to the one in Python's binascii
(its implementation is spelled out at https://rosettacode.org/wiki/CRC-32#Python).
I've read these pages:
- http://www.danielvik.com/2010/10/calculating-reverse-crc.html
- http://www.danielvik.com/2013/07/rewinding-crc-calculating-crc-backwards.html
...and it seems to me that there's something hidden somewhere that can reduce the amount of effect to reverse these things that I just can't puzzle out.
The main reason I think we can do better than full brute force is that we know two things about the hash inputs:
- We know the length of every input. I believe this means that as soon as we find a guess that equals length and has a reversed CRC value of 0, that must be correct (might not help much though). Maybe there's some other property of length caused by the algorithm that could cut down on effort that I don't see.
- We also know that every input has a restricted character set of
[A-Za-z_0-9]
(only letters, numbers, and underscores). In addition, we know that numbers are rare, andA-Z
do not appear to ever be mixed witha-z
, so we can often get away with just[a-z_]
in 90% of cases.- Also, the majority of these are snake_case English words/phrases (e.g. "weight" or "turn_around"). So we can also filter out any guesses that contain "qxp" or such.
Since the above links discuss how you can add any 4 chars to an input and leave the hash unchanged, one idea I thought of is to brute force the last 4 chars (most of which are invalid because they're not in the restricted charset), filter out the ones that are clearly not right (because of illegal 3-letter English combos and such), and come up with a "short"list of potentially valid last 4 chars. Then we repeat until the whole thing has been whittled down, which should be faster than pure brute force. But I can't find the leap in logic to figure out how.
If I'm going off in the wrong direction to solve this, that would be good to know as well.