Are there algorithms for putting a digest into the file being digested?

Question

In other words, are there algorithms or libraries for this, or is it even possible to have a hash/digest of a file contained in the file being hashed/digested? This would be handy for obvious reasons, such as built-in digests of ISOs. I've tried googling things like "MD5 injection" and "digest in a file of a file." No luck (probably for good reason).

Not sure if it is even mathematically possible. It seems you'd be able to roll through the file, but then you'd have to brute-force the last bit (assuming the digest was the last thing in the file or object).

Thanks, Chenz

score 4 · Answer 1 · edited Oct 07 '21 at 08:57

It is possible in a limited sense:

Non-cryptographically-secure hashes

You can do this with insecure hashes like the CRC family of checksums.

Maclean's `gzip` quine

Caspian Maclean created a gzip quine, which decompresses to itself. Since the Gzip format includes a CRC-32 checksum (see the spec here) of the uncompressed data, and the uncompressed data equals the file itself, this file contains its own hash. So it's possible, but Maclean doesn't specify the algorithm he used to generate it:

It's quite simple in theory, but the helper programs I used were on a hard disk that failed, and I haven't set up a new working linux system to run them on yet. Solving the checksum by hand in particular would be very tedious.

Cox's `gzip`, `tar.gz`, and ZIP quines

Russ Cox created 3 more quines in Gzip, tar.gz, and ZIP formats, and wrote up in detail how he created them in an excellent article. The article covers how he embedded the checksum, namely by brute force:

The second obstacle is that zip archives (and gzip files) record a CRC32 checksum of the uncompressed data. Since the uncompressed data is the zip archive, the data being checksummed includes the checksum itself. So we need to find a value x such that writing x into the checksum field causes the file to checksum to x. Recursion strikes back.

The CRC32 checksum computation interprets the entire file as a big number and computes the remainder when you divide that number by a specific constant using a specific kind of division. We could go through the effort of setting up the appropriate equations and solving for x. But frankly, we've already solved one nasty recursive puzzle today, and enough is enough. There are only four billion possibilities for x: we can write a program to try each in turn, until it finds one that works.

He also provides the code that generated the files.
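The quote mentions "setting up the appropriate equations and solving for x" as the alternative to brute force. The following is a minimal sketch of that alternative, not Cox's code: it assumes the checksum is a 4-byte little-endian field at a known offset and relies on CRC-32 being affine over GF(2). The names `crc_with_field` and `solve_self_crc` are hypothetical, and the solver returns `None` if no fixed point exists for the given layout.

```python
import zlib

def crc_with_field(prefix: bytes, x: int, suffix: bytes) -> int:
    """CRC-32 of the file when the 4-byte field holds x (little-endian)."""
    return zlib.crc32(prefix + x.to_bytes(4, "little") + suffix)

def solve_self_crc(prefix: bytes, suffix: bytes):
    """Return x with crc_with_field(prefix, x, suffix) == x, or None."""
    base = crc_with_field(prefix, 0, suffix)
    # CRC-32 is affine over GF(2): f(x) = base ^ L(x), with L linear.
    # cols[i] is the effect on the CRC of setting bit i of the field.
    cols = [crc_with_field(prefix, 1 << i, suffix) ^ base for i in range(32)]
    # We need L(x) ^ x == base. Build one equation per output bit j:
    # bits 0..31 hold the coefficients of x, bit 32 the right-hand side.
    rows = []
    for j in range(32):
        row = ((base >> j) & 1) << 32
        for i in range(32):
            coeff = ((cols[i] >> j) & 1) ^ (1 if i == j else 0)
            row |= coeff << i
        rows.append(row)
    # Gauss-Jordan elimination over GF(2).
    pivots = []
    rank = 0
    for col in range(32):
        pivot = next((k for k in range(rank, 32) if (rows[k] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for k in range(32):
            if k != rank and (rows[k] >> col) & 1:
                rows[k] ^= rows[rank]
        pivots.append(col)
        rank += 1
    # A leftover row reading 0 == 1 means no fixed point exists.
    if any(rows[k] >> 32 for k in range(rank, 32)):
        return None
    x = 0
    for k, col in enumerate(pivots):
        if (rows[k] >> 32) & 1:
            x |= 1 << col  # free variables are left at 0
    return x
```

This needs 33 CRC computations instead of up to four billion. It only works because CRC-32 is linear in this sense; nothing comparable is known for a secure digest.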

(See also Zip-file that contains nothing but itself?)

Cryptographically-secure digests

With a cryptographically-secure hash function, this shouldn't be possible without either breaking the hash function (particularly, a secure digest should make it "infeasible to generate a message that has a given hash"), or applying brute force.

But these hashes are much longer than 32 bits, precisely in order to deter that sort of attack. So you can write a brute-force algorithm to do this, but unless you're extremely lucky you shouldn't expect it to finish before the universe ends.
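To put rough numbers on that, here is a back-of-the-envelope sketch. The rate of one billion tries per second and the age-of-universe figure are assumptions chosen for illustration, and exhausting the space does not guarantee that a self-referential value exists at all.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # rough figure, for comparison only

def years_to_exhaust(digest_bits: int, tries_per_second: float) -> float:
    """Years needed to try every possible digest value once."""
    return 2 ** digest_bits / tries_per_second / SECONDS_PER_YEAR

# Assumed rate: 1e9 candidate values per second.
for name, bits in [("CRC-32", 32), ("MD5", 128), ("SHA-1", 160)]:
    years = years_to_exhaust(bits, 1e9)
    ratio = years / AGE_OF_UNIVERSE_YEARS
    print(f"{name}: {years:.3g} years ({ratio:.3g} universe ages)")
```

At that assumed rate the 32-bit space is covered in a few seconds, while the 128-bit and 160-bit spaces are far beyond the age of the universe.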

MD5 is broken, so it might be easier

The MD5 algorithm is seriously broken, and a chosen-prefix collision attack is already practical (as used in the Flame malware's forged certificate; see http://www.cwi.nl/news/2012/cwi-cryptanalist-discovers-new-cryptographic-attack-variant-in-flame-spy-malware, http://arstechnica.com/security/2012/06/flame-crypto-breakthrough/). I'm not aware of anyone having actually embedded an MD5 digest in the file it digests, but there's a good chance it's possible. It's probably an open research question.

For example, this could be done using a chosen-prefix preimage attack, choosing the prefix equal to the desired hash, so that the hash would be embedded in the file. Preimage attacks are more difficult than collision attacks, but there has been some progress towards one. See Does any published research indicate that preimage attacks on MD5 are imminent?.

It might also be possible to find a fixed point for MD5; inserting a digest is essentially the same problem. For discussion, see md5sum a file that contain the sum itself?.

score 3 · Accepted Answer · answered Feb 03 '10 at 17:51

The only way to do this is if you define your file format so the hash only applies to the part of the file that doesn't contain the hash.

However, including the hash inside a file (like built into an ISO) defeats the whole security benefit of the hash. You need to get the hash from a different channel and compare it with your file.
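A minimal sketch of such a format, assuming a made-up layout in which a SHA-256 digest of the payload is appended as a fixed-size trailer (the names `seal` and `verify` are hypothetical). As noted above, this only catches accidental corruption: anyone who can modify the file can also recompute the trailer.

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes

def seal(payload: bytes) -> bytes:
    """Append a digest that covers only the payload, not the trailer."""
    return payload + hashlib.sha256(payload).digest()

def verify(blob: bytes) -> bool:
    """Recompute the digest over everything except the trailer."""
    if len(blob) < DIGEST_SIZE:
        return False
    payload, stored = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    return hmac.compare_digest(hashlib.sha256(payload).digest(), stored)

sealed = seal(b"example payload")
print(verify(sealed))                 # intact file
print(verify(b"X" + sealed[1:]))      # first payload byte altered
```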

score 0 · Answer 3 · answered Feb 03 '10 at 17:41

No, because that would mean that the hash would have to be a hash of itself, which is not possible.

kb.

eh... possible, but not cost effective because you'd have to somehow brute it. – Crazy Chenz Feb 03 '10 at 17:49
i can't see how it is possible without relying on finding a collision, but maybe that's what you mean by brute forcing? to find a scenario where hash(data+hash) and hash(hash) gives the same output? – kb. Feb 04 '10 at 07:28
@kb: Yes, it is possible, just the way you have put it. `for (x=0; x< 2**160; x++) {if SHA1(data || x) == x {print "Success" + x; break} } if (x==2**160) {print "Failure"}` pretty silly, huh? – President James K. Polk Feb 05 '10 at 22:17
haha yeah ok, i stand corrected then. thanks for not downvoting. ^__^ – kb. Feb 06 '10 at 07:34
Haha, someone actually took the effort to downvote this two and a half years later. I'm impressed. – kb. Oct 01 '12 at 10:29
