187

I am new to .NET. I am compressing and decompressing a string in C#. I have an XML document that I convert to a string, and then I compress and decompress that string. My code compiles without errors, but when I decompress and return the string, I get back only half of the XML.

Below is my code; please tell me where I went wrong.

Code:

class Program
{
    public static string Zip(string value)
    {
        //Transform string into byte[]  
        byte[] byteArray = new byte[value.Length];
        int indexBA = 0;
        foreach (char item in value.ToCharArray())
        {
            byteArray[indexBA++] = (byte)item;
        }

        //Prepare for compress
        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress);

        //Compress
        sw.Write(byteArray, 0, byteArray.Length);
        //Close, DO NOT FLUSH cause bytes will go missing...
        sw.Close();

        //Transform byte[] zip data to string
        byteArray = ms.ToArray();
        System.Text.StringBuilder sB = new System.Text.StringBuilder(byteArray.Length);
        foreach (byte item in byteArray)
        {
            sB.Append((char)item);
        }
        ms.Close();
        sw.Dispose();
        ms.Dispose();
        return sB.ToString();
    }

    public static string UnZip(string value)
    {
        //Transform string into byte[]
        byte[] byteArray = new byte[value.Length];
        int indexBA = 0;
        foreach (char item in value.ToCharArray())
        {
            byteArray[indexBA++] = (byte)item;
        }

        //Prepare for decompress
        System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray);
        System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms,
            System.IO.Compression.CompressionMode.Decompress);

        //Reset variable to collect uncompressed result
        byteArray = new byte[byteArray.Length];

        //Decompress
        int rByte = sr.Read(byteArray, 0, byteArray.Length);

        //Transform byte[] unzip data to string
        System.Text.StringBuilder sB = new System.Text.StringBuilder(rByte);
        //Read the number of bytes GZipStream red and do not a for each bytes in
        //resultByteArray;
        for (int i = 0; i < rByte; i++)
        {
            sB.Append((char)byteArray[i]);
        }
        sr.Close();
        ms.Close();
        sr.Dispose();
        ms.Dispose();
        return sB.ToString();
    }

    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load(@"D:\RSP.xml");
        string val = doc.ToString(SaveOptions.DisableFormatting);
        val = Zip(val);
        val = UnZip(val);
    }
} 

My XML size is 63KB.

shytikov
Mohit Kumar
  • 2
    I suspect the problem will "fix itself" if using [UTF8Encoding](http://msdn.microsoft.com/en-us/library/system.text.utf8encoding.aspx) (or UTF16 or whatnot) and GetBytes/GetString. It will also greatly simplify the code. Also recommend using `using`. –  Sep 08 '11 at 05:36
  • 1
    You can't convert char into byte and the reverse like you do (using a simple cast). You need to use an encoding, and the same encoding for compression/decompression. See xanatos answer below. – Simon Mourier Sep 08 '11 at 06:10
  • @pst no it won't; you would be using `Encoding` the wrong way around. You need base-64 here, as per xanatos' answer – Marc Gravell Sep 08 '11 at 06:18
  • @Marc Gravell True, missed that part of the signature/intent. Definitely not my first choice of signatures. –  Sep 08 '11 at 07:06

8 Answers

324

The code to compress/decompress a string

public static void CopyTo(Stream src, Stream dest) {
    byte[] bytes = new byte[4096];

    int cnt;

    while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
        dest.Write(bytes, 0, cnt);
    }
}

public static byte[] Zip(string str) {
    var bytes = Encoding.UTF8.GetBytes(str);

    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
            //msi.CopyTo(gs);
            CopyTo(msi, gs);
        }

        return mso.ToArray();
    }
}

public static string Unzip(byte[] bytes) {
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream()) {
        using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
            //gs.CopyTo(mso);
            CopyTo(gs, mso);
        }

        return Encoding.UTF8.GetString(mso.ToArray());
    }
}

static void Main(string[] args) {
    byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
    string r2 = Unzip(r1);
}

Remember that Zip returns a byte[], while Unzip returns a string. If you want a string from Zip, you can Base64-encode it (for example with Convert.ToBase64String(r1)). The result of Zip is raw binary: it isn't something you can print to the screen or write directly into an XML document.

The version shown targets .NET 2.0; on .NET 4.0 and later, use the built-in Stream.CopyTo instead of the hand-written CopyTo helper.
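For .NET 4.0 and later, here is a minimal sketch of the same idea using the built-in Stream.CopyTo, plus Base64 helpers for cases where the compressed data has to travel as a string. The names GzipText, ZipToBase64 and UnzipFromBase64 are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class GzipText
{
    public static byte[] Zip(string text)
    {
        using (var input = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        using (var output = new MemoryStream())
        {
            // The GZipStream is disposed before the output is read,
            // so all compressed data and the gzip trailer have been written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                input.CopyTo(gzip);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    public static string Unzip(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // CopyTo keeps reading until the decompressor reports end of stream.
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    // Base64 turns the binary gzip output into text that is safe to store in XML or JSON.
    public static string ZipToBase64(string text)
    {
        return Convert.ToBase64String(Zip(text));
    }

    public static string UnzipFromBase64(string base64)
    {
        return Unzip(Convert.FromBase64String(base64));
    }
}
```

Calling GzipText.UnzipFromBase64(GzipText.ZipToBase64(xml)) should give back the original string, provided it contains only valid UTF-16 (no unpaired surrogates).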

IMPORTANT: The compressed contents cannot be written to the output stream until the GZipStream knows that it has all of the input (i.e., to effectively compress it needs all of the data). You need to make sure that you Dispose() of the GZipStream before inspecting the output stream (e.g., mso.ToArray()). This is done with the using() { } block above. Note that the GZipStream is the innermost block and the contents are accessed outside of it. The same goes for decompressing: Dispose() of the GZipStream before attempting to access the data.
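The effect of disposing first can be observed directly. The following is a minimal sketch, not part of the original answer; GZipDisposeDemo and MeasureOutput are hypothetical names. It passes leaveOpen: true only so that the MemoryStream's Length can still be queried after the GZipStream is disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class GZipDisposeDemo
{
    // Returns { bytes in the output before Dispose, bytes in the output after Dispose }.
    public static int[] MeasureOutput(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        var output = new MemoryStream();
        int beforeDispose;

        // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed.
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            gzip.Write(input, 0, input.Length);
            // The compressor may still be buffering data here,
            // and the gzip trailer has not been written yet.
            beforeDispose = (int)output.Length;
        }

        // Disposing the GZipStream wrote the remaining data and the trailer.
        int afterDispose = (int)output.Length;
        output.Dispose();
        return new[] { beforeDispose, afterDispose };
    }
}
```

The second number is always larger than the first, because at least the gzip trailer is written on Dispose. Reading the output too early therefore yields a truncated stream.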

bambams
xanatos
  • Thank you for the reply. When I use your code, it gives me a compilation error: "CopyTo() doesn't have namespace or assembly reference." After that I searched on Google and found that CopyTo() is part of the .NET 4 Framework. But I am working with the .NET 2.0 and 3.5 frameworks. Please advise. :) – Mohit Kumar Sep 08 '11 at 07:19
  • I just want to emphasize that the GZipStream must be disposed before calling ToArray() on the output stream. I ignored that bit, but it makes a difference! – Wet Noodles Jan 23 '14 at 19:41
  • 1
    is this most effective way of zipping at .net 4.5 ? – Furkan Gözükara Sep 14 '14 at 14:06
  • 1
    Note that this fails (unzipped-string != original) in case of string containing surrogate pairs e.g. `string s = "X\uD800Y"`. I noticed that it works if we change the Encoding to UTF7... but with UTF7 are we sure all characters can be represented ? – digEmAll Jan 15 '15 at 11:17
  • @digEmAll I will say that it doesn't work if there are INVALID surrogate pairs (as in your case). The UTF8 GetBytes conversion silently replaces the invalid surrogate pair with 0xFFFD. – xanatos Jan 15 '15 at 12:13
  • this line "while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0)" throws System.IO.InvalidDataException, saying 0 is magic number with incorrect value. – rluks Apr 11 '15 at 10:10
  • "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream." – rluks Apr 11 '15 at 11:01
  • @Pan.student You receive the error while decompressig, right? How did you produce the compressed string? – xanatos Apr 12 '15 at 05:20
  • Yes, I have .gz archive which contains txt file. – rluks Apr 12 '15 at 08:34
  • 1
    @Pan.student I've checked and it seems to work with a gz file I produced. There is the possibility the file isn't really a gz file. Note that a gz file is not a rar file and is not a zip file and is not a bz2 file. They are all incompatible formats. If you are able to open it from Windows then I suggest you post a question on SO posting the code you are using. – xanatos Apr 13 '15 at 07:43
  • @xanatos can I send you my .gz file, so you can try it out? – rluks Apr 13 '15 at 08:32
  • @Pan.student It works correctly: `Unzip(File.ReadAllBytes("data.txt.gz"))`. You probably are reading the gz file in the wrong way. – xanatos Apr 14 '15 at 09:31
  • I would recommend instead of `Base64` encoding the result from `Zip` to hex encode it instead as `Base64` has an upper size limit of approximately 8k bytes. – Kraang Prime Jan 07 '17 at 11:54
  • byte[] bytes = new byte[4096] Why reading from the stream happens in 4KB chunks? Is it some kind of compromise between amount of iterations and amount of buffer memory used? – Mariusz Mar 28 '17 at 08:55
  • 1
    That code in the answer will result an exception of "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.", as @rluks said. – Eriawan Kusumawardhono Aug 28 '17 at 08:00
  • @EriawanKusumawardhono The problem of rluks was resolved... Probably you too have a non-gz file, or a corrupted file. – xanatos Aug 28 '17 at 10:10
  • @xanatos by default GZipStream need the 4 bytes as magic header. having simple CopyTo is not enough for, because it will be broken if the GZipStream is copied to other non-file stream such as FileStream. – Eriawan Kusumawardhono Aug 29 '17 at 02:17
  • @EriawanKusumawardhono The `GZipStream` correctly prepends the header. All the classes in .NET that implement the `Stream` base class are binary, so they don't "corrupt" the data and the content can be copied without any problem. Note that `StreamReader` and `StreamWriter` aren't "`Stream`" (they are `TextReader` and `TextWriter`). But this is moot because the example uses `MemoryStream`. – xanatos Aug 29 '17 at 05:30
  • A more elegant solution here: https://stackoverflow.com/a/64582157/2284031 – Ben Wilde Oct 28 '20 at 22:23
  • Your solution seems perfect. I added couple of handy functions to convert those bytes to string and vice versa. Otherwise Encoding.UTF8.GetBytes(str); was failing for some reason (I had no time to dig deeper). – Developer Sep 10 '21 at 04:58
138

Based on this snippet, I use the following code and it works fine:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace CompressString
{
    internal static class StringCompressor
    {
        /// <summary>
        /// Compresses the string.
        /// </summary>
        /// <param name="text">The text.</param>
        /// <returns></returns>
        public static string CompressString(string text)
        {
            byte[] buffer = Encoding.UTF8.GetBytes(text);
            var memoryStream = new MemoryStream();
            using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
            {
                gZipStream.Write(buffer, 0, buffer.Length);
            }

            memoryStream.Position = 0;

            var compressedData = new byte[memoryStream.Length];
            memoryStream.Read(compressedData, 0, compressedData.Length);

            var gZipBuffer = new byte[compressedData.Length + 4];
            Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
            Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
            return Convert.ToBase64String(gZipBuffer);
        }

        /// <summary>
        /// Decompresses the string.
        /// </summary>
        /// <param name="compressedText">The compressed text.</param>
        /// <returns></returns>
        public static string DecompressString(string compressedText)
        {
            byte[] gZipBuffer = Convert.FromBase64String(compressedText);
            using (var memoryStream = new MemoryStream())
            {
                int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
                memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

                var buffer = new byte[dataLength];

                memoryStream.Position = 0;
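                using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
                {
                    // A single Read call may return fewer bytes than requested,
                    // so keep reading until the buffer is full or the stream ends.
                    int totalRead = 0;
                    int read;
                    while (totalRead < buffer.Length &&
                           (read = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead)) > 0)
                    {
                        totalRead += read;
                    }
                }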

                return Encoding.UTF8.GetString(buffer);
            }
        }
    }
}
fubo
  • 3
    I just wanted to thank you for posting this code. I dropped it into my project and it worked right out of the box with no problems at all. – BoltBait Jul 27 '15 at 17:47
  • 4
    Yup working out of the box! I also liked the idea of adding length as first four bytes – JustADev Mar 21 '16 at 13:51
  • 2
    This is the best answer. This one should be marked as the answer! – Eriawan Kusumawardhono Aug 28 '17 at 07:59
  • Should I expect this code to give me a significant decrease in string length after compress? It's not, actually a little bigger when compressed. I'm taking a png and converting it to a string using System.Convert.ToBase64String(data), then compressing the result. – Matt Dec 04 '17 at 23:56
  • 1
    @Matt that's like zipping a .zip file - .png is already a compressed content – fubo Dec 05 '17 at 07:15
  • This also moves the gzip header bytes, 31 and 139, so it is harder for someone else to detected if the content is zipped. https://stackoverflow.com/questions/6059302/how-to-check-if-a-file-is-gzip-compressed http://www.zlib.org/rfc-gzip.html#header-trailer – kristian mo Jan 25 '18 at 09:07
  • 4
    The answer that is marked as answer is not stable. This one is the best answer. – NuminousName May 05 '18 at 05:43
  • 1
    Confirmed, this answer is the best for .NET Core 2.2. – Snympi Mar 02 '19 at 12:59
  • 1
    only thing missing is the using on var memoryStream = new MemoryStream(); – Walter Verhoeven Apr 15 '21 at 11:02
  • What is the use of "4" during the array copy? I m trying to use this and it's not working for me. It works only for smaller strings and for larger ones, after decompression, the result is corrupt. – Arun Prakash Nagendran Dec 06 '22 at 04:19
  • 1
    Tried to test this and the string comparison fails. await File.WriteAllTextAsync("testfile.txt", new StringBuilder().Insert(0, "ABCDE", 500000).ToString()); var directString = File.ReadAllText("testfile.txt"); var compressedDirect = CompressString(directString); var decompressedDirect = DecompressString(compressedDirect); Console.WriteLine(String.Compare(directString, decompressedDirect, true)); – Arun Prakash Nagendran Dec 06 '22 at 04:27
62

With the advent of .NET 4.0 (and higher) with the Stream.CopyTo() methods, I thought I would post an updated approach.

I also think the below version is useful as a clear example of a self-contained class for compressing regular strings to Base64 encoded strings, and vice versa:

public static class StringCompression
{
    /// <summary>
    /// Compresses a string and returns a deflate compressed, Base64 encoded string.
    /// </summary>
    /// <param name="uncompressedString">String to compress</param>
    public static string Compress(string uncompressedString)
    {
        byte[] compressedBytes;

        using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
        {
            using (var compressedStream = new MemoryStream())
            { 
                // setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
                // this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
                // although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
                using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
                {
                    uncompressedStream.CopyTo(compressorStream);
                }

                // call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
                compressedBytes = compressedStream.ToArray();
            }
        }

        return Convert.ToBase64String(compressedBytes);
    }

    /// <summary>
    /// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
    /// </summary>
    /// <param name="compressedString">String to decompress.</param>
    public static string Decompress(string compressedString)
    {
        byte[] decompressedBytes;

        var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));

        using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
        {
            using (var decompressedStream = new MemoryStream())
            {
                decompressorStream.CopyTo(decompressedStream);

                decompressedBytes = decompressedStream.ToArray();
            }
        }

        return Encoding.UTF8.GetString(decompressedBytes);
    }
}

Here’s another approach using the extension methods technique to extend the String class to add string compression and decompression. You can drop the class below into an existing project and then use thusly:

var uncompressedString = "Hello World!";
var compressedString = uncompressedString.Compress();

and

var decompressedString = compressedString.Decompress();

To wit:

public static class Extensions
{
    /// <summary>
    /// Compresses a string and returns a deflate compressed, Base64 encoded string.
    /// </summary>
    /// <param name="uncompressedString">String to compress</param>
    public static string Compress(this string uncompressedString)
    {
        byte[] compressedBytes;

        using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
        {
            using (var compressedStream = new MemoryStream())
            { 
                // setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
                // this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
                // although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
                using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
                {
                    uncompressedStream.CopyTo(compressorStream);
                }

                // call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
                compressedBytes = compressedStream.ToArray();
            }
        }

        return Convert.ToBase64String(compressedBytes);
    }

    /// <summary>
    /// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
    /// </summary>
    /// <param name="compressedString">String to decompress.</param>
    public static string Decompress(this string compressedString)
    {
        byte[] decompressedBytes;

        var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));

        using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
        {
            using (var decompressedStream = new MemoryStream())
            {
                decompressorStream.CopyTo(decompressedStream);

                decompressedBytes = decompressedStream.ToArray();
            }
        }

        return Encoding.UTF8.GetString(decompressedBytes);
    }
}
Whyser
Jace
  • 2
    Jace: I think you're missing `using` statements for the MemoryStream instances. And to the F# developers out there: refrain from using the keyword `use` for the compressorStream/decompressorStream instance, because they need to be disposed manually before `ToArray()` gets called – knocte Aug 15 '18 at 17:13
  • 1
    Will it be better to use GZipStream as it adds some extra validation? [GZipStream or DeflateStream class?](//stackoverflow.com/q/2599080) – Michael Freidgeim Oct 01 '18 at 07:29
  • 2
    @Michael Freidgeim I wouldn't think so for compressing and decompressing memory streams. For files, or unreliable transports it makes sense. I will say that in my particular use-case high speed is very desirable so any overhead I can avoid is all the better. – Jace Nov 22 '18 at 03:58
  • Solid. Took my 20MB string of JSON down to 4.5MB. – James Esh Apr 11 '19 at 15:30
  • 1
    Works great, but you should dispose the memorystream after usage, or put every stream in using as suggested by @knocte – Sebastian Apr 24 '19 at 11:06
  • @Sebastian Enclosing streams dispose underlying streams, so it's not necessary to explicitly dispose each stream. – Jace Apr 24 '19 at 15:42
  • But MSDN states that you must dispose all streams: https://learn.microsoft.com/en-us/dotnet/api/system.io.stream.close?view=netframework-4.8 Please read the Remarks; I think you're misinformed. Maybe you have a better source than MSDN? – Sebastian Apr 25 '19 at 07:17
  • 1
    @Sebastian read the docs: https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.deflatestream.-ctor?view=netframework-4.8#System_IO_Compression_DeflateStream__ctor_System_IO_Stream_System_IO_Compression_CompressionLevel_System_Boolean_ "By default, DeflateStream owns the underlying stream, so closing the stream also closes the underlying stream." – Jace Apr 25 '19 at 16:14
  • @Sebastian I did find one object being leaked in the Compress methods, the uncompressedStream object never got disposed, so I have corrected the code to fix that. The leak happened because I set the leaveOpen parameter to true in the enclosing stream constructor. In the Decompress method, the compressedStream object is auto disposed when the enclosing stream gets disposed. – Jace Apr 25 '19 at 17:22
  • Okay, great then. – Sebastian May 06 '19 at 05:58
  • I tested this solution vs Ben's solution using a 52 MB file. This solution compressed it to 2.9 MB and took 361 ms to compress & decompress. Ben's solution compressed it to 1.66 MB and took 788 ms to compress & decompress. Both used about the same amount of memory. – James in Indy Dec 02 '22 at 20:06
24

I like @fubo's answer the best but I think this is much more elegant.

This method is more compatible because it doesn't manually store the length up front.

Also I've exposed extensions to support compression for string to string, byte[] to byte[], and Stream to Stream.

public static class ZipExtensions
{
    public static string CompressToBase64(this string data)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(data).Compress());
    }

    public static string DecompressFromBase64(this string data)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(data).Decompress());
    }
    
    public static byte[] Compress(this byte[] data)
    {
        using (var sourceStream = new MemoryStream(data))
        using (var destinationStream = new MemoryStream())
        {
            sourceStream.CompressTo(destinationStream);
            return destinationStream.ToArray();
        }
    }

    public static byte[] Decompress(this byte[] data)
    {
        using (var sourceStream = new MemoryStream(data))
        using (var destinationStream = new MemoryStream())
        {
            sourceStream.DecompressTo(destinationStream);
            return destinationStream.ToArray();
        }
    }
    
    public static void CompressTo(this Stream stream, Stream outputStream)
    {
        using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
        {
            stream.CopyTo(gZipStream);
            gZipStream.Flush();
        }
    }

    public static void DecompressTo(this Stream stream, Stream outputStream)
    {
        using (var gZipStream = new GZipStream(stream, CompressionMode.Decompress))
        {
            gZipStream.CopyTo(outputStream);
        }
    }
}
Ben Wilde
  • I tested this solution vs Jace's solution using a 52 MB file. Jace's solution compressed it to 2.9 MB and took 361 ms to compress & decompress. This solution compressed it to 1.66 MB but took 788 ms to compress & decompress. Both used about the same amount of memory. – James in Indy Dec 02 '22 at 20:06
13

This is an updated version for .NET 4.5 and newer using async/await and IEnumerables:

public static class CompressionExtensions
{
    public static async Task<IEnumerable<byte>> Zip(this object obj)
    {
        byte[] bytes = obj.Serialize();

        using (MemoryStream msi = new MemoryStream(bytes))
        using (MemoryStream mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
                await msi.CopyToAsync(gs);

            return mso.ToArray().AsEnumerable();
        }
    }

    public static async Task<object> Unzip(this byte[] bytes)
    {
        using (MemoryStream msi = new MemoryStream(bytes))
        using (MemoryStream mso = new MemoryStream())
        {
            using (var gs = new GZipStream(msi, CompressionMode.Decompress))
            {
                // Sync example:
                //gs.CopyTo(mso);

                // Async way (take care of using async keyword on the method definition)
                await gs.CopyToAsync(mso);
            }

            // Await the task so the deserialized object is returned, not the Task wrapping it.
            return await mso.ToArray().Deserialize();
        }
    }
}

public static class SerializerExtensions
{
    public static byte[] Serialize<T>(this T objectToWrite)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            binaryFormatter.Serialize(stream, objectToWrite);

            // ToArray returns only the bytes written; GetBuffer would include unused capacity.
            return stream.ToArray();
        }
    }

    public static async Task<T> _Deserialize<T>(this byte[] arr)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            await stream.WriteAsync(arr, 0, arr.Length);
            stream.Position = 0;

            return (T)binaryFormatter.Deserialize(stream);
        }
    }

    public static async Task<object> Deserialize(this byte[] arr)
    {
        object obj = await arr._Deserialize<object>();
        return obj;
    }
}

With this you can serialize anything that BinaryFormatter supports, not only strings.

Edit:

If you need to take care of encoding, you could just use Convert.ToBase64String(byte[]).

Take a look at this answer if you need an example!

z3nth10n
  • You must reset the Stream position before DeSerializing, edited your sample. Also, your XML comments are unrelated. – Magnus Johansson Apr 18 '18 at 14:37
  • Worth noting this works but only for UTF8-based things. If you add in, say, Swedish characters like åäö to the string value you're serialize/deserializing it will fail a round-trip test :/ – bc3tech Aug 21 '19 at 12:32
  • In this case you could use `Convert.ToBase64String(byte[])`. Please, see this answer (https://stackoverflow.com/a/23908465/3286975). Hope it helps! – z3nth10n Aug 22 '19 at 02:31
7

For those who still get the error "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream." and whose string was compressed using PHP, you'll need to do something like:

public static string decodeDecompress(string originalReceivedSrc) {
    byte[] bytes = Convert.FromBase64String(originalReceivedSrc);

    using (var mem = new MemoryStream()) {
        //the trick is here
        mem.Write(new byte[] { 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00 }, 0, 8);
        mem.Write(bytes, 0, bytes.Length);

        mem.Position = 0;

        using (var gzip = new GZipStream(mem, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip)) {
            return reader.ReadToEnd();
        }
    }
}
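
An alternative sketch, under two assumptions that the answer itself does not confirm: the PHP side used gzcompress, which emits zlib-format data rather than gzip, and the C# side runs on .NET 6 or later, where System.IO.Compression.ZLibStream is available. In that case the payload can be read directly, without prepending a fake gzip header. PhpZlib and DecodeDecompress are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper; assumes Base64-encoded zlib data (e.g. from PHP's gzcompress) and .NET 6+.
public static class PhpZlib
{
    public static string DecodeDecompress(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);

        using (var input = new MemoryStream(bytes))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(zlib, Encoding.UTF8))
        {
            // ReadToEnd reads until the decompressor reports end of stream.
            return reader.ReadToEnd();
        }
    }
}
```

If the data was produced with gzencode instead, it is already in gzip format and a plain GZipStream should read it.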
Choletski
  • I get this exception: Exception thrown: 'System.IO.InvalidDataException' in System.dll Additional information: The CRC in GZip footer does not match the CRC calculated from the decompressed data. – Dainius Kreivys Sep 02 '16 at 09:02
  • I am sure someone will face the same issue. To have that magic header in the compressed string, you need to use proper php function: "gzencode" instead of "gzcompress". There is also another compression algorithm in PHP: "gzinflate" but I personally never used it. P.S. Your code has an issue: you wrote a header and then overwritten it with actual bytes by giving offset 0 to second Write() method, so as a result you have the same bytes in a stream. – Developer Sep 10 '21 at 05:28
5

For .NET 6, cross-platform compression/decompression of a string in C# using the SharpZipLib library. Tested on Ubuntu (18.0.x) and Windows.

#region helper

private byte[] Zip(string text)
{
    if (text == null)
        return null;

    byte[] ret;
    using (var outputMemory = new MemoryStream())
    {
        using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
        {
            using (var sw = new StreamWriter(gz, Encoding.UTF8))
            {
                sw.Write(text);
            }
        }
        ret = outputMemory.ToArray();
    }
    return ret;
}

private string Unzip(byte[] bytes)
{
    string ret = null;
    using (var inputMemory = new MemoryStream(bytes))
    {
        using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
        {
            using (var sr = new StreamReader(gz, Encoding.UTF8))
            {
                ret = sr.ReadToEnd();
            }
        }
    }
    return ret;
}
#endregion
sina_Islam
4

We can reduce code complexity by using StreamReader and StreamWriter rather than manually converting strings to byte arrays. Three streams is all you need:

    public static byte[] Zip(string uncompressed)
    {
        byte[] ret;
        using (var outputMemory = new MemoryStream())
        {
            using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
            {
                using (var sw = new StreamWriter(gz, Encoding.UTF8))
                {
                    sw.Write(uncompressed);
                }
            }
            ret = outputMemory.ToArray();
        }
        return ret;
    }

    public static string Unzip(byte[] compressed)
    {
        string ret = null;
        using (var inputMemory = new MemoryStream(compressed))
        {
            using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
            {
                using (var sr = new StreamReader(gz, Encoding.UTF8))
                {
                    ret = sr.ReadToEnd();
                }
            }
        }
        return ret;
    }
jgfооt
  • I tried that but it was causing the issues in some cases. I even tried Convert.UTF8 but it also had issues in some cases. The only 100% working solution was to simply for loop and manually build the string as well as convert string to bytes. – Developer Sep 10 '21 at 05:04