So here's a strange one. I have this method to take a Base64-encoded deflated string and return the original data:

public static string Base64Decompress(string base64data)
{
    byte[] b = Convert.FromBase64String(base64data);
    using (var orig = new MemoryStream(b))
    {
        using (var inflate = new MemoryStream())
        {
            using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
            {
                ds.CopyTo(inflate);
                return Encoding.ASCII.GetString(inflate.ToArray());
            }
        }
    }
}

This returns an empty string unless I add a second call to ds.CopyTo(inflate). (WTF?)

   ...
            using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
            {
                ds.CopyTo(inflate);
                ds.CopyTo(inflate);
                return Encoding.ASCII.GetString(inflate.ToArray());
            }
   ...

(Flush/Close/Dispose on ds have no effect.)

Why does the DeflateStream copy 0 bytes on the first call? I've also tried looping with Read(), but it also returns zero on the first call, then works on the second.


Update: here's the method I'm using to compress data.
public static string Base64Compress(string data, Encoding enc)
{
    using (var ms = new MemoryStream())
    {
        using (var ds = new DeflateStream(ms, CompressionMode.Compress))
        {
            byte[] b = enc.GetBytes(data);
            ds.Write(b, 0, b.Length);
            ds.Flush();
            return Convert.ToBase64String(ms.ToArray());
        }
    }
}
josh3736
  • This is very interesting. What happens when you replace the first of the two `ds.CopyTo()` with a `ds.Read(...)`? The first `CopyTo()` triggers reading over the footer of the stream. `Read()` should do the same. Just wondering. – Pieter van Ginkel Nov 11 '10 at 20:39
  • Are you sure it's deflate compressed, and not gzip compressed? And are you sure there's no other stuff in front of the deflate (or gzip?) data? – nos Nov 11 '10 at 20:40
  • @Pieter: a `.Read()` has the same effect -- it returns `0`, but causes the next call to `CopyTo()` to work. – josh3736 Nov 11 '10 at 20:43
  • @nos: Yep. I generated the data with DeflateStream. I also used an external tool to test the data generated by my Compress method and it had no complaints. I'll post the compression method as well. – josh3736 Nov 11 '10 at 20:45
  • I have seen this before when the last block of the compression stream was not written out fully (i.e., incomplete); the first call to `Read()`/`CopyTo()` fails and subsequent calls access the data. I will see if I can dig up some reference material... – Chris Baxter Nov 11 '10 at 20:46
  • The DeflateStream must be closed to write the final block; see updated answer. – Chris Baxter Nov 11 '10 at 21:00
  • @josh3736: I face the same problem. After CopyTo of the input file stream into DeflateCompress, the memory stream size is 0 KB if the input file size is less than 100 KB. – Saroop Trivedi Jun 04 '12 at 12:16

1 Answer

This happens when the compressed bytes are incomplete (i.e., not all blocks are written out).

If I use your Base64Compress with the following Decompress method, I get an InvalidDataException with the message 'Unknown block type. Stream might be corrupted.'

Decompress

public static string Decompress(Byte[] bytes)
{
  using (var uncompressed = new MemoryStream())
  using (var compressed = new MemoryStream(bytes))
  using (var ds = new DeflateStream(compressed, CompressionMode.Decompress))
  {
    ds.CopyTo(uncompressed);
    return Encoding.ASCII.GetString(uncompressed.ToArray());
  }
}

Note that everything works as expected with the following Compress method:

public static Byte[] Compress(Byte[] bytes)
{
  using (var memoryStream = new MemoryStream())
  {
    using (var deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress))
      deflateStream.Write(bytes, 0, bytes.Length);

    return memoryStream.ToArray();
  }
}
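
As a quick check, the two methods above can be put through a round trip. This is only a minimal sketch: the class name `DeflateRoundTrip` and the sample string are made up for illustration, and the method bodies are the ones shown above with braces added around the inner `using`.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateRoundTrip
{
    public static byte[] Compress(byte[] bytes)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress))
            {
                deflateStream.Write(bytes, 0, bytes.Length);
            } // the DeflateStream is disposed here, which writes the final block

            return memoryStream.ToArray();
        }
    }

    public static string Decompress(byte[] bytes)
    {
        using (var uncompressed = new MemoryStream())
        using (var compressed = new MemoryStream(bytes))
        using (var ds = new DeflateStream(compressed, CompressionMode.Decompress))
        {
            ds.CopyTo(uncompressed);
            return Encoding.ASCII.GetString(uncompressed.ToArray());
        }
    }

    public static void Main()
    {
        // Hypothetical sample input, ASCII only to match the decoder above.
        string original = "hello, deflate";
        byte[] packed = Compress(Encoding.ASCII.GetBytes(original));
        string restored = Decompress(packed);
        Console.WriteLine(restored == original);
    }
}
```

A single `CopyTo` is enough here because the compressed bytes are read only after the `DeflateStream` has been disposed.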

Update

Oops, foolish me... you cannot call ToArray on the memory stream until you dispose the DeflateStream. Flush is actually not implemented, and Deflate/GZip compress data in blocks, so the final block is only written on close/dispose.

Re-write compress as:

public static string Base64Compress(string data, Encoding enc)
{
  using (var ms = new MemoryStream())
  {
    using (var ds = new DeflateStream(ms, CompressionMode.Compress))
    {
      byte[] b = enc.GetBytes(data);
      ds.Write(b, 0, b.Length);
    }

    return Convert.ToBase64String(ms.ToArray());
  }
} 
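
To see the effect for yourself, you can compare how many bytes the MemoryStream holds after `Flush()` and after the DeflateStream is disposed. This is a rough sketch, not a definitive test: `FlushVersusDispose` and `MeasureLengths` are hypothetical names, and the exact byte counts depend on the runtime version, so none are quoted here. The only assumption is that the count after dispose is larger, because the final block is written at that point.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlushVersusDispose
{
    // Returns { bytes in the MemoryStream after Flush(), bytes after Dispose() }.
    public static int[] MeasureLengths(string data)
    {
        using (var ms = new MemoryStream())
        {
            int afterFlush;
            using (var ds = new DeflateStream(ms, CompressionMode.Compress))
            {
                byte[] b = Encoding.ASCII.GetBytes(data);
                ds.Write(b, 0, b.Length);
                ds.Flush();
                // Snapshot taken while the DeflateStream is still open,
                // as in the original Base64Compress.
                afterFlush = ms.ToArray().Length;
            }

            // MemoryStream.ToArray still works after the stream has been closed.
            int afterDispose = ms.ToArray().Length;
            return new[] { afterFlush, afterDispose };
        }
    }

    public static void Main()
    {
        int[] lengths = MeasureLengths("some sample text to compress");
        Console.WriteLine("after Flush: " + lengths[0] + ", after Dispose: " + lengths[1]);
    }
}
```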
Chris Baxter
  • Yes, that's the problem. Technically you should use the DeflateStream() overload that takes the leaveOpen argument and pass *true*. Without it, closing/disposing the DeflateStream will also dispose the MemoryStream. That this doesn't cause a problem right now is an accident. – Hans Passant Nov 11 '10 at 21:36
  • @Hans, definitely not a bad idea, although disposing a MemoryStream does not actually clear the buffer; it only prevents any further reads/writes on the MemoryStream. So technically there is a duplicate dispose on the MemoryStream, but the bytes returned by ToArray are still accessible regardless. – Chris Baxter Nov 11 '10 at 21:52
  • Yup. I've got a massive amount of cr*p at SO for pointing out that disposing a MemoryStream is silly. Glad to give some of it back :) – Hans Passant Nov 11 '10 at 22:50
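
The `leaveOpen` overload mentioned in the comments could be used roughly as follows. This is a sketch under assumptions: the class name `LeaveOpenExample` and the `Base64Decompress` helper are illustrative, and only the `DeflateStream(Stream, CompressionMode, bool)` constructor is taken from the discussion above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class LeaveOpenExample
{
    public static string Base64Compress(string data, Encoding enc)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms open when ds is disposed.
            using (var ds = new DeflateStream(ms, CompressionMode.Compress, true))
            {
                byte[] b = enc.GetBytes(data);
                ds.Write(b, 0, b.Length);
            } // final block is written here; ms is not disposed yet

            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Illustrative counterpart so the result can be checked with a round trip.
    public static string Base64Decompress(string base64data, Encoding enc)
    {
        using (var orig = new MemoryStream(Convert.FromBase64String(base64data)))
        using (var inflate = new MemoryStream())
        using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
        {
            ds.CopyTo(inflate);
            return enc.GetString(inflate.ToArray());
        }
    }

    public static void Main()
    {
        string packed = Base64Compress("leave the stream open", Encoding.ASCII);
        Console.WriteLine(Base64Decompress(packed, Encoding.ASCII));
    }
}
```

With this overload the MemoryStream is disposed only once, by its own `using` block.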