I have gzipped files on disk that I wish to stream to an HTTP client uncompressed. To do this I need to send a length header, then stream the uncompressed file to the client. I know the gzip format stores the original length of the uncompressed data, but as far as I can tell golang's "compress/gzip" package does not expose a way to read it. I've resorted to reading the whole decompressed file into a variable and taking its length, but this is grossly inefficient and wasteful of memory, especially on larger files.

Below is the code I've ended up using:

func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    path := "/path/to/thefile.gz"
    openfile, err := os.Open(path)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer openfile.Close()

    fz, err := gzip.NewReader(openfile)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer fz.Close()

    // Wastefully read the whole decompressed file into memory just to get its length.
    s, err := ioutil.ReadAll(fz)
    if err != nil {
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    body := bytes.NewReader(s)

    // Send the headers.
    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.Itoa(len(s))) // Send the decompressed length to the client.
    w.Header().Set("Content-Type", "text/csv")

    io.Copy(w, body) // "Copy" the file to the client.
}

What I would expect to be able to do instead is something like this:

func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    path := "/path/to/thefile.gz"
    openfile, err := os.Open(path)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer openfile.Close()

    fz, err := gzip.NewReader(openfile)
    if err != nil {
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprint(w, "404")
        return
    }
    defer fz.Close()

    // Send the headers.
    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.Itoa(fz.Length())) // Send length to client.
    w.Header().Set("Content-Type", "text/csv")

    io.Copy(w, fz) // Stream straight to the client.
}

Does anyone know how to get the uncompressed length for a gzipped file in golang?

Bravo Delta
  • You get the uncompressed length by uncompressing it. Why not use chunked encoding? – JimB Mar 20 '21 at 19:59
  • Why not use `Content-Encoding: gzip` and send the compressed file (and the length of the compressed file in the `Content-Length` header) ? – Erwin Bolwidt Mar 20 '21 at 22:28
  • @ErwinBolwidt - Doesn't that require the client to specifically allow it, or do all clients support that? – Bravo Delta Mar 21 '21 at 00:00
  • Depends on the client. Browsers have supported it for a decade. https://webmasters.stackexchange.com/questions/22217/which-browsers-handle-content-encoding-gzip-and-which-of-them-has-any-special – Erwin Bolwidt Mar 21 '21 at 01:09
  • The client will respond with what it supports. – Mark Adler Mar 21 '21 at 18:57
  • There are many other things you could do to improve things, depending on your needs. You could cache the length in a map in memory. You could encode it in the file name. You could cache the length on disk in a file `.meta`. Or a combination of these. Or you could decide, depending on your situation, "disk is cheap, CPU is more expensive" and just uncompress the files on disk. – Erwin Bolwidt Mar 21 '21 at 22:50
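
For reference, a minimal sketch of the Content-Encoding: gzip route suggested in the comments above: when the client advertises gzip in Accept-Encoding, the compressed bytes can be sent untouched, so the Content-Length is simply the size of the file on disk. The handler name is just illustrative, the Accept-Encoding check is deliberately naive, and the fallback path is left out.

import (
    "io"
    "net/http"
    "os"
    "strconv"
    "strings"
)

func DownloadCompressed(w http.ResponseWriter, r *http.Request) {
    // Naive check; a real handler would parse q-values and handle "x-gzip".
    if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
        // Client did not advertise gzip support; fall back to decompressing
        // server-side, e.g. as in the question's handler.
        http.Error(w, "gzip not accepted", http.StatusNotAcceptable)
        return
    }

    f, err := os.Open("/path/to/thefile.gz")
    if err != nil {
        http.NotFound(w, r)
        return
    }
    defer f.Close()

    fi, err := f.Stat()
    if err != nil {
        http.Error(w, "stat failed", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Encoding", "gzip")                         // body is the gzip file as-is
    w.Header().Set("Content-Length", strconv.FormatInt(fi.Size(), 10)) // compressed size
    w.Header().Set("Content-Type", "text/csv")

    io.Copy(w, f) // stream the compressed bytes untouched
}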

1 Answer

The gzip format might appear to provide the uncompressed length, but actually it does not. Unfortunately, the only reliable way to get the uncompressed length is to decompress the gzip stream. (You can just count the bytes, not saving the uncompressed data anywhere.)

See this answer for why.
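
For example, the count can be done with a copy to io.Discard (ioutil.Discard on Go before 1.16). A minimal sketch of that two-pass approach, decompressing once only to count, rewinding, then decompressing again while streaming; the handler shape and path are borrowed from the question, with error handling abbreviated:

import (
    "compress/gzip"
    "io"
    "net/http"
    "os"
    "strconv"
)

func DownloadHandler(w http.ResponseWriter, r *http.Request) {
    openfile, err := os.Open("/path/to/thefile.gz")
    if err != nil {
        http.NotFound(w, r)
        return
    }
    defer openfile.Close()

    // First pass: decompress only to count the bytes, discarding the data.
    fz, err := gzip.NewReader(openfile)
    if err != nil {
        http.NotFound(w, r)
        return
    }
    n, err := io.Copy(io.Discard, fz)
    fz.Close()
    if err != nil {
        http.Error(w, "bad gzip data", http.StatusInternalServerError)
        return
    }

    // Second pass: rewind the file and stream the decompressed data.
    if _, err := openfile.Seek(0, io.SeekStart); err != nil {
        http.Error(w, "seek failed", http.StatusInternalServerError)
        return
    }
    fz, err = gzip.NewReader(openfile)
    if err != nil {
        http.Error(w, "bad gzip data", http.StatusInternalServerError)
        return
    }
    defer fz.Close()

    w.Header().Set("Content-Disposition", "attachment; filename=test")
    w.Header().Set("Content-Length", strconv.FormatInt(n, 10)) // counted on the first pass
    w.Header().Set("Content-Type", "text/csv")

    io.Copy(w, fz)
}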

Mark Adler
  • Well at least that's an answer, not the desired answer, but an answer. – Bravo Delta Mar 20 '21 at 20:04
  • No. I think you'll have to do your wasteful thing. Which is only wasteful of memory. If memory is a problem, e.g. these are really big, then you can instead be wasteful in CPU by reading the gzip file twice. Once to count bytes, and the second time to send it out. – Mark Adler Mar 20 '21 at 21:36
  • For my purposes (i.e. tiny gzips containing the contents of a single file), your 'read the last 4 bytes little-endian' trick works fine. Thank you! – Jason Stewart Mar 01 '23 at 11:09
  • It's not that it's a "single file". All gzip files contain a single file. It's whether that single file was compressed to a single gzip _member_, or multiple gzip members. – Mark Adler Mar 01 '23 at 18:17
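
On that last point: when the file is known to be a single gzip member and the original data is under 4 GiB, the final four bytes are ISIZE, the uncompressed length modulo 2^32, stored little-endian. A sketch of reading it; the helper name uncompressedSize is just illustrative:

import (
    "encoding/binary"
    "fmt"
    "os"
)

// uncompressedSize reads ISIZE from the last four bytes of a gzip file.
// Only trustworthy for single-member gzip files whose original data is
// smaller than 4 GiB, since ISIZE is the length modulo 2^32.
func uncompressedSize(f *os.File) (uint32, error) {
    fi, err := f.Stat()
    if err != nil {
        return 0, err
    }
    if fi.Size() < 4 {
        return 0, fmt.Errorf("file too short to be gzip")
    }
    var buf [4]byte
    if _, err := f.ReadAt(buf[:], fi.Size()-4); err != nil {
        return 0, err
    }
    return binary.LittleEndian.Uint32(buf[:]), nil
}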