Is it possible to figure out how to decompress a file, knowing its first bytes?

Question

I downloaded a web page using curl, and the resulting file contains compressed HTML. I would like to decompress it.

I tried this Python code:

import gzip

# file_name is the path of the file saved by curl
with gzip.open(file_name, "rb") as f:
    file_content = f.read()

which results in the following error: gzip.BadGzipFile: Not a gzipped file (b'\x1f\xc2').

\x1f and \xc2 are the first two bytes of the file, as confirmed by:

with open(file_name, "rb") as f:
    first_bytes = f.read(12)  # only the first 12 bytes are needed
print(*first_bytes)  # prints each byte as a decimal integer

which prints the first twelve bytes of the file as decimal integers: 31 194 139 8 0 0 0 0 0 0 3 195 (31 and 194 are the decimal values of the 0x1F and 0xC2 shown in the error message).
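Comparing these bytes with the gzip header from RFC 1952 (ID1 ID2 = 1f 8b, CM = 08 for deflate, then FLG, a four-byte MTIME, XFL and OS, where 03 means Unix) suggests one possible explanation. It is only an assumption, nothing in the question confirms it: the file may be a gzip stream that was read as Latin-1 text and written back out as UTF-8. That conversion leaves bytes below 0x80 unchanged and turns every byte from 0x80 to 0xFF into two bytes starting with c2 or c3, so 8b becomes c2 8b. Undoing it gives 1f 8b 08 00 00 00 00 00 00 03, a complete gzip header with FLG, MTIME and XFL all zero, and the trailing 195 (0xC3) would be the first half of a re-encoded byte in the range 0xC0 to 0xFF. A minimal sketch that tests this hypothesis with the standard library (the helper names are hypothetical):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def undo_latin1_to_utf8(data: bytes) -> bytes:
    """Reverse one round of 'decode as Latin-1, then encode as UTF-8'.

    Assumption: each original byte became the UTF-8 encoding of the code point
    with the same number (U+0000 to U+00FF), so 0x00-0x7F are unchanged and
    0x80-0xFF became two bytes. Raises UnicodeDecodeError or
    UnicodeEncodeError if the data does not fit that pattern.
    """
    return data.decode("utf-8").encode("latin-1")


def gunzip_possibly_mangled(data: bytes) -> bytes:
    """Decompress gzip data, first undoing a suspected UTF-8 re-encoding."""
    if not data.startswith(GZIP_MAGIC):
        # Hypothesis (not confirmed): 1f c2 8b ... is 1f 8b ... re-encoded as UTF-8.
        data = undo_latin1_to_utf8(data)
    if not data.startswith(GZIP_MAGIC):
        raise ValueError("no gzip signature, even after undoing UTF-8 re-encoding")
    return gzip.decompress(data)


# Usage with the file from the question (file_name as in the code above):
# with open(file_name, "rb") as f:
#     html = gunzip_possibly_mangled(f.read())
```

If the hypothesis is wrong, undo_latin1_to_utf8 raises or the signature check still fails, so the sketch does not silently produce garbage.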

Do the first bytes give a hint as to which decompression method should be used? (I tried a few things with zlib.decompress, but none has worked so far.)

Edit: Running `file myCompressedFile` only outputs `data`.
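The `file` command identifies formats by comparing leading bytes against a database of known signatures. A minimal sketch of the same idea in Python, assuming only a handful of common formats matter; the table and the function name `sniff_compression` are hypothetical, not part of any library. zlib streams have no fixed magic number, so they are recognized by a header heuristic, and raw deflate or Brotli data cannot be recognized this way at all.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") of some compressed formats.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"\x1f\x9d", "compress (.Z)"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"PK\x03\x04", "zip"),
]


def sniff_compression(data: bytes) -> Optional[str]:
    """Guess the compression format from the first bytes, or return None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # zlib has no fixed magic number, only a constrained 2-byte header:
    # CM (low nibble of byte 0) is 8 for deflate, CINFO (high nibble) is at
    # most 7, and byte0 * 256 + byte1 is a multiple of 31. This is a heuristic.
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    # Raw deflate and Brotli streams carry no signature, so None is expected.
    return None
```

For the bytes in the question this returns None: 1f c2 is neither the gzip signature (1f 8b) nor the compress signature (1f 9d), which is consistent with `file` reporting only `data`.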

It's definitely *possible*; for instance, 7-Zip can attempt to unzip any file, regardless of its extension. Since it's open source you could theoretically see how it works and not have to re-invent the wheel. — Random Davis, Feb 03 '22 at 23:05
The file command doesn't seem to know what that is. It seems likely that it's a compressed file format, since there are a few that start with `1f`, such as compress and gzip. But I've never seen `1f c2`. — Mark Adler, Feb 04 '22 at 00:49
Since there is no standard that says all compressed file formats must start with some unique signature / sequence of bytes, I'd say no, it's not possible. The best you could do is check for the formats that do have one, so they're handled appropriately. — martineau, Feb 04 '22 at 01:03
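When no signature matches, a common fallback is trial decompression: pass the data to each available decompressor and keep the first result that does not raise. The sketch below assumes only Python's standard library (zlib, bz2, lzma); the names `CANDIDATES` and `try_decompress` are hypothetical. It tries the formats that carry headers and checksums first and raw deflate last, because raw deflate has neither and can occasionally accept data that is not deflate at all. Brotli, which servers may send as `Content-Encoding: br`, needs a third-party package and is not covered.

```python
import bz2
import lzma
import zlib
from typing import Callable, List, Optional, Tuple

# Standard-library decompressors to try, in order. wbits=47 (32 + 15) lets
# zlib auto-detect a zlib or gzip header; wbits=-15 means raw deflate.
CANDIDATES: List[Tuple[str, Callable[[bytes], bytes]]] = [
    ("gzip or zlib", lambda d: zlib.decompress(d, wbits=47)),
    ("bzip2", bz2.decompress),
    ("xz or lzma", lzma.decompress),  # default format also accepts legacy .lzma
    ("raw deflate", lambda d: zlib.decompress(d, wbits=-15)),  # weakest evidence
]

# Exceptions these functions raise on data that is not in their format.
DECODE_ERRORS = (zlib.error, OSError, ValueError, lzma.LZMAError)


def try_decompress(data: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (format name, decompressed bytes) for the first candidate that works."""
    for name, decompress in CANDIDATES:
        try:
            return name, decompress(data)
        except DECODE_ERRORS:
            continue  # not this format; try the next one
    return None
```

A success only shows that one decoder accepted the input; for raw deflate in particular, it is worth checking that the output looks like the expected content (here, HTML).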
