It is possible in a limited sense:
Non-cryptographically-secure hashes
You can do this with insecure hashes like the CRC family of checksums.
Maclean's gzip
quine
Caspian Maclean created a gzip
quine, which decompresses to itself. Since the Gzip format includes a CRC-32 checksum (see the spec here) of the uncompressed data, and the uncompressed data equals the file itself, this file contains its own hash. So it's possible, but Maclean doesn't specify the algorithm he used to generate it:
It's quite simple in theory, but the helper programs I used were on a hard disk that failed, and I haven't set up a new working linux system to run them on yet. Solving the checksum by hand in particular would be very tedious.
Cox's gzip
, tar.gz
, and ZIP quines
Russ Cox created 3 more quines in Gzip, tar.gz
, and ZIP formats, and wrote up in detail how he created them in an excellent article. The article covers how he embedded the checksum: brute force—
The second obstacle is that zip archives (and gzip files) record a CRC32 checksum of the uncompressed data. Since the uncompressed data is the zip archive, the data being checksummed includes the checksum itself. So we need to find a value x such that writing x into the checksum field causes the file to checksum to x. Recursion strikes back.
The CRC32 checksum computation interprets the entire file as a big number and computes the remainder when you divide that number by a specific constant using a specific kind of division. We could go through the effort of setting up the appropriate equations and solving for x. But frankly, we've already solved one nasty recursive puzzle today, and enough is enough. There are only four billion possibilities for x: we can write a program to try each in turn, until it finds one that works.
He also provides the code that generated the files.
(See also Zip-file that contains nothing but itself?)
Cryptographically-secure digests
With a cryptographically-secure hash function, this shouldn't be possible without either breaking the hash function (particularly, a secure digest should make it "infeasible to generate a message that has a given hash"), or applying brute force.
But these hashes are much longer than 32 bits, precisely in order to deter that sort of attack. So you can write a brute-force algorithm to do this, but unless you're extremely lucky you shouldn't expect it to finish before the universe ends.
MD5 is broken, so it might be easier
The MD5 algorithm is seriously broken, and a chosen-prefix collision attack is already practical (as used in the Flame malware's forged certificate; see http://www.cwi.nl/news/2012/cwi-cryptanalist-discovers-new-cryptographic-attack-variant-in-flame-spy-malware, http://arstechnica.com/security/2012/06/flame-crypto-breakthrough/). I don't know of what you want having actually been done, but there's a good chance it's possible. It's probably an open research question.
For example, this could be done using a chosen-prefix preimage attack, choosing the prefix equal to the desired hash, so that the hash would be embedded in the file. A
preimage attack is more difficult than collision attacks, but there has been some progress towards it. See Does any published research indicate that preimage attacks on MD5 are imminent?.
It might also be possible to find a fixed point for MD5; inserting a digest is essentially the same problem. For discussion, see md5sum a file that contain the sum itself?.
Related questions: