I'm trying to read a very large gzipped csv file in node.js. So far, I've been using zlib for this:

file.createReadStream().pipe(zlib.createGunzip())

is the stream I pass to Papa.parse. This works fine for most files, but it fails with a very large gzipped CSV file (250 MB, unzips to 1.2 GB), throwing this error:

Error: incorrect header check
     at Zlib.zlibOnError [as onerror] (zlib.js:180:17) {
   errno: -3,
   code: 'Z_DATA_ERROR'
 }

Originally I thought the size of the file caused the error, but now I'm not so sure; maybe the file was compressed using a different algorithm. The question "zlib.error: Error -3 while decompressing: incorrect header check" suggests passing either -zlib.Z_MAX_WINDOWBITS or zlib.Z_MAX_WINDOWBITS|16 to correct for that, but I tried it and that's not the problem.

mcv

  • Provide the first 20 bytes or so in hex in the question so we can see if it really is a gzip stream. – Mark Adler Jan 07 '21 at 17:20
  • Turns out it wasn't a gzip stream after all. There was a JSON file with the same name containing metadata about our gzip file, and we accidentally didn't specify the extension. It was pure luck that we got the correct file until recently; only in the last couple of days did we receive the JSON file instead. Simply specifying the extension fixed the problem. – mcv Jan 07 '21 at 18:34

1 Answer

Despite being absolutely sure we had a gzip stream, it turns out we didn't. We got this file from an AWS S3 bucket which contained a lot of versions of this file with different time stamps. For that reason, we selected files based on prefix and loaded only the most recent one.

However, the S3 bucket also contained json files with metadata about these files. It was pure luck that for so long we always got the gzip instead of the json, and recently that luck faltered. So where we always got a gzip file, this time we got a json instead.
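
To illustrate the fix (selecting on extension as well as prefix), here is a minimal sketch. The helper `pickLatest` and the sample keys are hypothetical; the only thing taken from S3 is the shape of a listing entry (`Key`, `LastModified`). Our real code differed, so treat this as one possible way to do it.

```javascript
// Hypothetical helper: pick the newest object whose key matches both the
// expected prefix AND the expected extension.
// `objects` mimics the entries of an S3 listing ({ Key, LastModified }).
function pickLatest(objects, prefix, extension) {
  const candidates = objects.filter(
    (o) => o.Key.startsWith(prefix) && o.Key.endsWith(extension)
  );
  if (candidates.length === 0) return undefined;
  // Keep whichever candidate has the most recent LastModified timestamp.
  return candidates.reduce((a, b) =>
    new Date(b.LastModified) > new Date(a.LastModified) ? b : a
  );
}

// Invented sample listing: the metadata JSON is written just after the gzip,
// so selecting on prefix alone would return the JSON file.
const listing = [
  { Key: 'exports/data-2021-01-05.csv.gz', LastModified: '2021-01-05T02:00:00Z' },
  { Key: 'exports/data-2021-01-06.csv.gz', LastModified: '2021-01-06T02:00:00Z' },
  { Key: 'exports/data-2021-01-06.json', LastModified: '2021-01-06T02:00:05Z' },
];

console.log(pickLatest(listing, 'exports/data-', '.csv.gz').Key);
```

Passing an empty extension reproduces our original bug: the most recent key wins regardless of its type.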

The header check error was entirely correct: the file we were looking at was not the gzip file we thought we had, so it didn't have the proper header.
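
A quick way to verify this yourself is to look at the first bytes of whatever you are about to gunzip: per RFC 1952, a gzip stream starts with 0x1f 0x8b. The sketch below is not what we ran at the time; `looksLikeGzip` and `headHex` are hypothetical helper names, and the JSON buffer is an invented stand-in for our metadata file.

```javascript
const zlib = require('zlib');

// A gzip stream always begins with the two magic bytes 0x1f 0x8b (RFC 1952).
function looksLikeGzip(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Hex dump of the first `count` bytes, e.g. to paste into a question.
function headHex(buffer, count = 20) {
  return buffer.subarray(0, count).toString('hex');
}

const realGzip = zlib.gzipSync('a,b\n1,2\n');  // genuine gzip data
const metadata = Buffer.from('{"rows": 2}');   // invented JSON stand-in

console.log(looksLikeGzip(realGzip), headHex(realGzip));
console.log(looksLikeGzip(metadata), headHex(metadata));

// Feeding the JSON to gunzip fails, because the header really is wrong.
try {
  zlib.gunzipSync(metadata);
} catch (err) {
  console.log(err.code, err.message);
}
```

With a stream, the same check can be applied to the first chunk before piping it into zlib.createGunzip().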

I'm leaving this answer here instead of removing the question, because someone who runs into this error in the future may be just as sure that they're gunzipping the correct file when they're actually not. Double-check which file you're loading.

mcv