I have a Windows program written in C# that works with log files. Some of these log files arrive gzipped (for instance, test.log.gz). I use SharpZipLib to decompress them, and it works well:

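using System.IO;
using ICSharpCode.SharpZipLib.Core;   // StreamUtils
using ICSharpCode.SharpZipLib.GZip;   // GZipInputStream

public static void unZip(string gzipFilePath, string targetDir)
{
    // Scratch buffer used while copying decompressed data to disk.
    byte[] dataBuffer = new byte[4096];

    using (System.IO.Stream fs = new FileStream(gzipFilePath, FileMode.Open, FileAccess.Read))
    {
        using (GZipInputStream gzipStream = new GZipInputStream(fs))
        {
            // Derive the output name by dropping ".gz": test.log.gz -> test.log
            string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFilePath));

            using (FileStream fsOut = File.Create(fnOut))
            {
                StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
            }
        }
    }
}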

From my research, a gzip file typically contains a single file, so it is always named like test.htm.gz. I therefore create a file named test.htm and write the uncompressed data into it, which happens in this part of the code:

using (GZipInputStream gzipStream = new GZipInputStream(fs))
{
    string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFilePath));

    using (FileStream fsOut = File.Create(fnOut))
    {
        StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
    }
}

This is all well and good, but I've now been given a log file, again named something like test.log.gz, that appears to have directories zipped into it.

When I use the 7-Zip GUI to decompress the file, the log file I need ends up five directories deep. After decompressing with 7-Zip, the output is:

folder1 -> folder2 -> folder3 -> folder4 -> folder5 -> test.log

Note that the file is named test.log.gz. When I decompress it with the 7-Zip GUI, it creates a folder structure instead of a single test.log.

Navigating through the folders that 7-Zip created, I find test.log buried five folders deep. From what I understand, that's not how gzip is supposed to work.

Using the SharpZipLib method shown above gives me only a small subset of the file's data in test.log.

I haven't been able to find any code or reported issues dealing with gzipped files that contain folders, and from what I can tell, you're not supposed to do that. The folders should be put in a .tar archive, which is then gzipped.

Does anyone have any idea what I could do with this .gz file?

Apex Coder

2 Answers


First, maybe try another library. Here are a few:

http://dotnetzip.codeplex.com/

http://www.icsharpcode.net/OpenSource/SharpZipLib/

There is also gzip support built into .NET (GZipStream in System.IO.Compression); see

Unzipping a .gz file using C#
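
As a rough illustration of the built-in route, here is a minimal sketch using GZipStream from System.IO.Compression. The class and method names (BuiltInGzip, Decompress) are made up for this example, and it assumes .NET 4 or later for Stream.CopyTo. Like the SharpZipLib version in the question, it ignores any name stored inside the gzip header and writes to whatever output path it is given.

```csharp
using System.IO;
using System.IO.Compression;

static class BuiltInGzip
{
    // Hypothetical helper: decompress a .gz file to outputPath using the framework's GZipStream.
    public static void Decompress(string gzipFilePath, string outputPath)
    {
        using (FileStream input = File.OpenRead(gzipFilePath))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(outputPath))
        {
            // Reads decompressed bytes from gzip and writes them to output until the stream ends.
            gzip.CopyTo(output);
        }
    }
}
```

It could be called as BuiltInGzip.Decompress(@"C:\logs\test.log.gz", @"C:\logs\test.log"), where both paths are placeholders.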

Micah Armantrout
  • In my question I state I'm already using SharpZipLib to successfully unpack normal .gz files. I could try the one from codeplex but what I really need is a suggestion on dealing with .gz files that somehow contain a directory structure instead of a single file. – Apex Coder Mar 14 '12 at 16:25
  • Maybe this will help? http://www.codeproject.com/Articles/23010/Compress-Folders-with-C-and-the-SharpZipLib – Micah Armantrout Mar 14 '12 at 16:28
  • Sorry, it's just showing me how to compress files with SharpZipLib. I think I'll create a couple images of the file structure and place them in my question to help shed light on what I'm dealing with. – Apex Coder Mar 14 '12 at 16:35
  • Added some images for clarity. – Apex Coder Mar 14 '12 at 16:46

There is still just one file in there, so there isn't any violation of the gzip format. gzip permits an entire path name to be stored with the file, so that path may simply be ghostcache/ic_split_files/CBN/00-christmas/test.log and 7-Zip is faithfully recreating that path. You should be able to see this in the gzip header, starting about ten bytes in.
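
To check this without a hex editor, the stored name can be read straight from the header. The following is a sketch based on the gzip layout in RFC 1952: a 10-byte fixed header, an optional extra field when the FEXTRA flag (0x04) is set, then a zero-terminated name when the FNAME flag (0x08) is set. The class and method names (GzipHeader, ReadOriginalName) are invented for illustration and are not part of SharpZipLib.

```csharp
using System;
using System.IO;
using System.Text;

static class GzipHeader
{
    // Hypothetical helper: returns the name stored in the gzip header, or null if there is none.
    public static string ReadOriginalName(string gzipFilePath)
    {
        using (var fs = new FileStream(gzipFilePath, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
        {
            // Fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
            byte[] fixedHeader = reader.ReadBytes(10);
            if (fixedHeader.Length < 10 || fixedHeader[0] != 0x1f || fixedHeader[1] != 0x8b)
                throw new InvalidDataException("Not a gzip file.");

            byte flags = fixedHeader[3];

            // FEXTRA: skip the extra field, which is prefixed by a 2-byte little-endian length.
            if ((flags & 0x04) != 0)
            {
                ushort extraLength = reader.ReadUInt16();
                reader.ReadBytes(extraLength);
            }

            // FNAME not set: the header carries no name at all.
            if ((flags & 0x08) == 0)
                return null;

            // FNAME: zero-terminated bytes, treated here as ISO 8859-1 (one char per byte).
            var name = new StringBuilder();
            int b;
            while ((b = fs.ReadByte()) > 0)
                name.Append((char)b);
            return name.ToString();
        }
    }
}
```

If the returned string contains slashes, that would be consistent with 7-Zip recreating a directory path from a single stored name.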

The fact that you are getting back only a subset of the log may or may not be related to the pathname in the gzip file.

Please provide a hex dump of the first 64 bytes of the .gz file that worked and of the .gz file that didn't.
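
If no hex editor is at hand, a dump like that can be produced from C#. This is only a sketch; the HexDump class and FirstBytes method are hypothetical names, and the output format (a four-digit hex offset followed by up to 16 bytes per line) is an arbitrary choice.

```csharp
using System;
using System.IO;
using System.Text;

static class HexDump
{
    // Hypothetical helper: formats the first `count` bytes of a file as hex, 16 bytes per line.
    public static string FirstBytes(string path, int count = 64)
    {
        byte[] buffer;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(fs))
        {
            // ReadBytes returns fewer bytes if the file is shorter than `count`.
            buffer = reader.ReadBytes(count);
        }

        var sb = new StringBuilder();
        for (int offset = 0; offset < buffer.Length; offset += 16)
        {
            int lineLength = Math.Min(16, buffer.Length - offset);
            sb.Append(offset.ToString("x4")).Append(": ");
            sb.AppendLine(BitConverter.ToString(buffer, offset, lineLength).Replace("-", " "));
        }
        return sb.ToString();
    }
}
```

Calling Console.WriteLine(HexDump.FirstBytes(path)) for each of the two files would give output that can be pasted into the question.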

Mark Adler