
My GZipStream will only decompress the first line of the file. Extracting the contents via 7-zip works as expected and gives me the entire file contents. It also extracts as expected using gunzip on cygwin and linux, so I expect this is O/S specific (Windows 7).

I'm not certain how to go about troubleshooting this, so any tips on that would help me a great deal. It sounds very similar to this, but using SharpZipLib gives the same result.

Here's what I'm doing:

var inputFile = String.Format(@"{0}\{1}", inputDir, fileName);
var outputFile = String.Format(@"{0}\{1}.gz", inputDir, fileName);
var dcmpFile = String.Format(@"{0}\{1}", outputDir, fileName);

// Compress
using (var input = File.OpenRead(inputFile))
using (var fileOutput = File.Open(outputFile, FileMode.Append))
using (GZipStream gzOutput = new GZipStream(fileOutput, CompressionMode.Compress, true))
{
    input.CopyTo(gzOutput);
}

// Now, decompress
using (FileStream of = new FileStream(outputFile, FileMode.Open, FileAccess.Read))
using (GZipStream ogz = new GZipStream(of, CompressionMode.Decompress, false))
using (FileStream wf = new FileStream(dcmpFile, FileMode.Append, FileAccess.Write))
{
    ogz.CopyTo(wf); 
}
duckus
  • In case this is relevant, I should also add that when I'm building the file, I'm using Environment.NewLine to delimit each line. – duckus Jun 26 '12 at 19:26

1 Answer


Your output file only contains a single line (gzipped) - but it contains all of the text data other than the line breaks.

You're repeatedly calling ReadLine() which returns a line of text without the line break and converting that text to bytes. So if you had an input file which had:

abc
def
ghi

You'd end up with an output file which was the compressed version of

abcdefghi
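
The effect is easy to reproduce without any files or compression. This is only a minimal sketch: `LineBreakDemo` and `JoinWithoutLineBreaks` are made-up names for illustration, and a `StringReader` stands in for the `StreamReader` over the input file.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineBreakDemo
{
    // Mimics a ReadLine() loop that writes each line's text
    // without adding the line terminator back.
    public static string JoinWithoutLineBreaks(string text)
    {
        var result = new StringBuilder();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine() returns the line WITHOUT its terminator,
            // and null once the end of the text is reached.
            while ((line = reader.ReadLine()) != null)
            {
                result.Append(line);
            }
        }
        return result.ToString();
    }
}
```

Calling `LineBreakDemo.JoinWithoutLineBreaks("abc\r\ndef\r\nghi")` returns `"abcdefghi"`: the text survives, the line breaks do not.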

If you don't want that behaviour, why even go through a StreamReader in the first place? Just copy from the input FileStream straight to the GZipStream a block at a time, or use Stream.CopyTo if you're using .NET 4:

// Note how much simpler the code is using File.*
using (var input = File.OpenRead(inputFile))
using (var fileOutput = File.Open(outputFile, FileMode.Append))
using (GZipStream gzOutput = new GZipStream(fileOutput, CompressionMode.Compress, true))
{
    input.CopyTo(gzOutput);
}

Also note that appending to a compressed file is rarely a good idea, unless you've got some sort of special handling for multiple "chunks" within a single file.
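
If you don't need to append, one way to avoid the issue is to recreate the `.gz` file on every run. The following is a sketch under that assumption rather than a drop-in fix; `GzipRoundTrip`, `Compress` and `Decompress` are hypothetical names, and `File.Create` is used because it truncates an existing file instead of adding to it.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipRoundTrip
{
    // Compresses inputFile into outputFile, replacing any previous contents.
    public static void Compress(string inputFile, string outputFile)
    {
        using (var input = File.OpenRead(inputFile))
        using (var fileOutput = File.Create(outputFile)) // overwrite, never append
        using (var gzOutput = new GZipStream(fileOutput, CompressionMode.Compress))
        {
            input.CopyTo(gzOutput);
        }
    }

    // Decompresses gzFile into outputFile, again replacing any previous contents.
    public static void Decompress(string gzFile, string outputFile)
    {
        using (var fileInput = File.OpenRead(gzFile))
        using (var gzInput = new GZipStream(fileInput, CompressionMode.Decompress))
        using (var output = File.Create(outputFile))
        {
            gzInput.CopyTo(output);
        }
    }
}
```

With this shape, running `Compress` several times in a row still leaves a single gzip stream in the output file, so `Decompress` should give back exactly the bytes of the last input.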

Jon Skeet
  • It doesn't appear to be the case that the entire contents of the file are compressed into a single line. The problem file has only 630 characters, and the file contents are the same as the first line of the file that decompresses correctly. I will try the shortened snippet you suggest, though. – duckus Jun 26 '12 at 19:54
  • @duckus: That doesn't tie in with what I've seen. I just tried your code with a test file, and it behaved exactly as I expected. I don't suppose you've got a U+0000 character in there which is freaking out the viewer for the decompressed file? – Jon Skeet Jun 26 '12 at 19:59
  • In my source file, I'm delimiting my columns using char.ConvertFromUtf32(1) (Control-A), and delimiting each line using Environment.NewLine. I may try compressing the contents from a direct stream instead of loading them from the file, just to see if the same thing happens. – duckus Jun 26 '12 at 20:08
  • @duckus: If you can post a sample file on the web somewhere (just with dummy data) that would help. – Jon Skeet Jun 26 '12 at 20:09
  • Ok ... I just added a sample: https://github.com/sf-billops/misc-projects/tree/master/Misc – duckus Jun 26 '12 at 20:22
  • @duckus: Again, running your code, I've got a 4,326,415 byte input file, and a 4,332,167 byte output file. Have you kept the `.gz` file between tests? Because the way you're *appending* to that instead of replacing it could well be confusing you. (Really, don't append!) – Jon Skeet Jun 26 '12 at 20:36
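
A rough way to check whether a leftover `.gz` file is involved is to watch the output size across runs. This is a sketch only; `AppendDemo` and `AppendCompressed` are hypothetical names, and the method deliberately keeps the `FileMode.Append` behaviour from the question.

```csharp
using System.IO;
using System.IO.Compression;

public static class AppendDemo
{
    // Compresses inputFile onto the END of outputFile (FileMode.Append,
    // as in the question) and returns the resulting size of outputFile.
    public static long AppendCompressed(string inputFile, string outputFile)
    {
        using (var input = File.OpenRead(inputFile))
        using (var fileOutput = File.Open(outputFile, FileMode.Append))
        using (var gzOutput = new GZipStream(fileOutput, CompressionMode.Compress))
        {
            input.CopyTo(gzOutput);
        }
        // The file handles are closed here, so the length is final.
        return new FileInfo(outputFile).Length;
    }
}
```

If the `.gz` file is kept between tests, the returned size grows on every call, because each run adds another complete gzip stream after the existing ones. Whether a given decompressor reads past the first of those streams depends on the tool and version, which may explain different tools giving different results for the same file.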