
I'm using GZipStream to compress a string, and I've modified two different examples to see what works. The first code snippet, which is a heavily modified version of the example in the documentation, simply returns an empty string.

public static String CompressStringGzip(String uncompressed)
{
    String compressedString;
    // Convert the uncompressed source string to a stream stored in memory
    // and create the MemoryStream that will hold the compressed string
    using (MemoryStream inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)),
                        outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            inStream.CopyTo(compress);
            StreamReader reader = new StreamReader(outStream);
            compressedString = reader.ReadToEnd();
        }
    }
    return compressedString;
}

When I debug it, all I can tell is that nothing is read from reader, so compressedString is empty. However, the second method, which I modified from a CodeProject snippet, works:

public static String CompressStringGzip3(String uncompressed)
{
    //Transform string to byte array
    String compressedString;
    byte[] uncompressedByteArray = Encoding.Unicode.GetBytes(uncompressed);

    using (MemoryStream outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            compress.Write(uncompressedByteArray, 0, uncompressedByteArray.Length);
            compress.Close();
        }
        byte[] compressedByteArray = outStream.ToArray();
        StringBuilder compressedStringBuilder = new StringBuilder(compressedByteArray.Length);
        foreach (byte b in compressedByteArray)
            compressedStringBuilder.Append((char)b);
        compressedString = compressedStringBuilder.ToString();
    }
    return compressedString;
}

Why does the first snippet fail while the second one succeeds? The two are only slightly different, and I don't understand why those minor changes make the second one work. The sample string I'm using is SELECT * FROM foods f WHERE f.name = 'chicken';

Ricardo Altamirano
  • Anything to do with the position of the stream? Have you tried seeking the start of the stream in method 1 before reading it? – Charleh Jun 25 '12 at 15:36
  • I added `inStream.Seek(0L, SeekOrigin.Begin);` before the line: `inStream.CopyTo(compress);`, but the method still returns an empty string. – Ricardo Altamirano Jun 25 '12 at 15:46
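The stream-position hunch points in the right direction, but at the wrong stream: a MemoryStream created from a byte array already starts at position 0, whereas writing to a stream leaves its Position at the end. A minimal sketch (the class and method names are hypothetical) isolates that effect with plain text and no compression:

```csharp
using System.IO;
using System.Text;

// Sketch: a StreamReader over a MemoryStream that was just written to
// starts reading at the current Position, which is the end of the data.
public static class PositionDemo
{
    public static string ReadWithoutRewind(string text)
    {
        var stream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length); // Position is now bytes.Length
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd(); // nothing left to read
        }
    }

    public static string ReadAfterRewind(string text)
    {
        var stream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0; // rewind to the start before reading
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Rewinding alone is not enough for the first snippet, though: the GZipStream must also have finished writing its output before there is anything complete to read.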

2 Answers


I ended up using the following code for compression and decompression:

public static String Compress(String decompressed)
{
    byte[] data = Encoding.UTF8.GetBytes(decompressed);
    using (var input = new MemoryStream(data))
    using (var output = new MemoryStream())
    {
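        // leaveOpen: true keeps output open after gzip is disposed
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
        } // disposing gzip writes the final compressed block
        // Base64 turns the binary gzip data into a string-safe form
        return Convert.ToBase64String(output.ToArray());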
    }
}

public static String Decompress(String compressed)
{
    byte[] data = Convert.FromBase64String(compressed);
    using (MemoryStream input = new MemoryStream(data))
    using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return Encoding.UTF8.GetString(output.ToArray());
    }
}
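To exercise the pair end to end, here is a self-contained sketch that wraps the two methods above in a static class (the name GzipText is hypothetical) so a compress/decompress round trip can be run on its own:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// The answer's Compress/Decompress pair in a hypothetically named class.
public static class GzipText
{
    public static string Compress(string decompressed)
    {
        byte[] data = Encoding.UTF8.GetBytes(decompressed);
        using (var input = new MemoryStream(data))
        using (var output = new MemoryStream())
        {
            // leaveOpen: true, so output is still open after gzip is disposed
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                input.CopyTo(gzip);
            }
            // gzip has written its final block; Base64 makes the bytes string-safe
            return Convert.ToBase64String(output.ToArray());
        }
    }

    public static string Decompress(string compressed)
    {
        byte[] data = Convert.FromBase64String(compressed);
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

For a string as short as the sample query, the Base64 result may well be longer than the input, since gzip adds its own header and trailer and Base64 expands every 3 bytes into 4 characters.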

Part of the explanation comes from a related question. Although I fixed the problem by changing the code to what I've included in this answer, these lines from my original code:

foreach (byte b in compressedByteArray)
    compressedStringBuilder.Append((char)b);

are problematic because, as dlev aptly put it:

You are interpreting each byte as its own character, when in fact that is not the case. Instead, you need the line:

string decoded = Encoding.Unicode.GetString(compressedByteArray);

The basic problem is that you are converting to a byte array based on an encoding, but then ignoring that encoding when you retrieve the bytes.

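Decoding with the original encoding only works for bytes that the encoding itself produced, though. Compressed output is arbitrary binary data, which is why the new code converts it with Base64 instead. With that change the problem is solved, and the new code is much more succinct than my original code.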
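A minimal sketch (class and method names are hypothetical) contrasting a text-encoding round trip with a Base64 round trip on raw bytes. An odd-length array is the simplest case that Encoding.Unicode cannot reproduce, because its GetBytes always returns an even number of bytes; Base64 reproduces any byte sequence.

```csharp
using System;
using System.Linq;
using System.Text;

// Sketch: raw bytes do not necessarily survive a trip through a text
// encoding, but they always survive a trip through Base64.
public static class BinaryAsTextDemo
{
    // Decode the bytes as UTF-16 text, encode back, and compare.
    public static bool SurvivesUnicode(byte[] data)
    {
        string text = Encoding.Unicode.GetString(data);
        return Encoding.Unicode.GetBytes(text).SequenceEqual(data);
    }

    // Base64 maps every byte sequence to ASCII text and back losslessly.
    public static bool SurvivesBase64(byte[] data)
    {
        string text = Convert.ToBase64String(data);
        return Convert.FromBase64String(text).SequenceEqual(data);
    }
}
```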

Ricardo Altamirano

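You need to move the reading code outside the inner using statement, so that it runs after the GZipStream has been disposed:

using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress, true))
{
    inStream.CopyTo(compress);
}
// The GZipStream has now written all of its output; rewind before reading.
// Note: gzip output is binary, so reading it back as text is lossy.
outStream.Position = 0;
StreamReader reader = new StreamReader(outStream);
compressedString = reader.ReadToEnd();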

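CopyTo() does not make the GZipStream write everything to the underlying MemoryStream: the final compressed block and the gzip trailer are only written when the GZipStream is closed or disposed.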
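One way to observe this is to measure the MemoryStream before and after the GZipStream is disposed. A sketch (class and method names are hypothetical) that passes leaveOpen: true so the MemoryStream can still be inspected afterwards:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Sketch: how much compressed data has reached the MemoryStream
// before and after the GZipStream is disposed.
public static class FinishOnDisposeDemo
{
    public static void MeasureOutput(string text, out long before, out long after)
    {
        var input = new MemoryStream(Encoding.UTF8.GetBytes(text));
        var output = new MemoryStream();
        // leaveOpen: true so output.Length can still be read after disposal
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
            before = output.Length; // may be 0 or only part of the gzip data
        }
        // Disposal wrote the final block and the 8-byte gzip trailer
        after = output.Length;
    }
}
```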

Update

It seems that GZipStream closes and disposes its underlying stream when it is disposed, unless you pass true for the leaveOpen constructor argument (not the way I would have designed the class). I've updated the sample above and tested it.
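The effect of the leaveOpen argument can be checked directly. A minimal sketch (class and method names are hypothetical) that reports whether the wrapped MemoryStream is still open once the GZipStream has been disposed:

```csharp
using System.IO;
using System.IO.Compression;

// Sketch: disposing a GZipStream also closes the stream it wraps,
// unless leaveOpen is true.
public static class LeaveOpenDemo
{
    public static bool IsOpenAfterDispose(bool leaveOpen)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen))
        {
            gzip.WriteByte(42);
        }
        // MemoryStream.CanRead returns false once the stream has been closed
        return output.CanRead;
    }
}
```

A closed MemoryStream still supports ToArray(), which is documented to work after the stream is closed; that is why the CodeProject-based snippet in the question can call outStream.ToArray() after its inner using block without hitting "Cannot access a closed Stream".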

Slugart
  • Outside the second (inner?) `using` statement? I get an error saying that `outStream` was unreadable if I move the code in question to there. – Ricardo Altamirano Jun 25 '12 at 15:45
  • How about just flushing the stream manually using stream.Flush()? – Charleh Jun 25 '12 at 15:51
  • @Charleh also works but I think it's more readable to do it outside the using statement as it separates the two tasks clearly. – Slugart Jun 25 '12 at 15:52
  • Yeah true, was mostly aiming at the issue with outStream being unreadable - not sure why this would be though since there is nothing disposing the stream until the end of the outer using statement – Charleh Jun 25 '12 at 15:54
  • @Slugart That code gives me an error: `Cannot access a closed Stream` – Ricardo Altamirano Jun 25 '12 at 16:07