15

I'm looking for a way to add embedded resources to my solution. These resources will be folders with a lot of files in them. On user demand they need to be decompressed.

I'm looking for a way to store such folders in the executable without involving third-party libraries (looks rather stupid, but this is the task).

I have found that I can GZip and UnGZip them using standard libraries. But GZip handles a single file only. In such cases TAR should come onto the scene. But I haven't found a TAR implementation among the standard classes.

Is it possible to decompress TAR with bare C#?

shytikov

6 Answers

14

While looking for a quick answer to the same question, I came across this thread and was not entirely satisfied with the current answers, as they all point to third-party dependencies on much larger libraries, just to achieve simple extraction of a tar.gz file to disk.

While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it just takes a bunch of files, prepends a header of roughly 500 bytes (padded to a full 512-byte block) to each one describing the file, and writes them all to a single archive on a 512-byte alignment. There is no compression; that is typically handled by compressing the resulting file into a gz archive, which .NET conveniently supports out of the box, and that takes care of the hard part.

Having looked at the spec for the tar format, there are only really 2 values (especially on Windows) we need to pick out from the header in order to extract the file from a stream. The first is the name, and the second is size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.
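As a rough illustration of picking out those two fields, here is a minimal sketch that parses one 512-byte header block already held in memory. The `TarHeaderSketch` class and its method names are made up for this example; the offsets (name in the first 100 bytes, size as 12 bytes of octal text at offset 124) follow the classic tar header layout.

```csharp
using System;
using System.Text;

public static class TarHeaderSketch
{
    public const int BlockSize = 512;

    // Returns the entry name and the size of its data in bytes.
    public static (string Name, long Size) ParseHeader(byte[] header)
    {
        if (header == null || header.Length < BlockSize)
            throw new ArgumentException("A tar header occupies one 512-byte block.", nameof(header));

        // Name: bytes 0..99, ASCII, terminated or padded with NUL bytes.
        int end = Array.IndexOf(header, (byte)0, 0, 100);
        if (end < 0) end = 100;
        string name = Encoding.ASCII.GetString(header, 0, end);

        // Size: bytes 124..135, octal digits padded with NUL and/or spaces.
        string sizeText = Encoding.ASCII.GetString(header, 124, 12).Trim('\0', ' ');
        long size = sizeText.Length == 0 ? 0 : Convert.ToInt64(sizeText, 8);

        return (name, size);
    }

    // Bytes the entry's data occupies in the archive once padded to a 512-byte boundary.
    public static long PaddedSize(long size)
    {
        return (size + BlockSize - 1) / BlockSize * BlockSize;
    }
}
```

With the name and size in hand, the data starts right after the header block, and the next header starts `PaddedSize(size)` bytes after that.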

I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename, and decompressing the gz file first using built-in functions.

The primary method is this:

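public static void ExtractTar(Stream stream, string outputDir)
{
    var buffer = new byte[100];
    while (true)
    {
        // Name: first 100 bytes of the header. A short read or an empty name marks the end.
        if (ReadFully(stream, buffer, 100) < 100)
            break;
        var name = Encoding.ASCII.GetString(buffer).Trim('\0');
        if (String.IsNullOrWhiteSpace(name))
            break;

        // Skip mode, uid and gid (24 bytes), then read the size: 12 bytes of octal text.
        stream.Seek(24, SeekOrigin.Current);
        ReadFully(stream, buffer, 12);
        // The field may be padded with NULs or spaces (directory entries in particular).
        var sizeText = Encoding.ASCII.GetString(buffer, 0, 12).Trim('\0', ' ');
        var size = sizeText.Length == 0 ? 0L : Convert.ToInt64(sizeText, 8);

        // Skip the rest of the 512-byte header.
        stream.Seek(376L, SeekOrigin.Current);

        var output = Path.Combine(outputDir, name);
        if (name.EndsWith("/"))
        {
            // Directory entry: there is no data to write, just create the folder.
            Directory.CreateDirectory(output);
        }
        else
        {
            Directory.CreateDirectory(Path.GetDirectoryName(output));
            // FileMode.Create truncates an existing file instead of leaving stale bytes.
            using (var str = File.Open(output, FileMode.Create, FileAccess.Write))
            {
                var buf = new byte[size];
                ReadFully(stream, buf, buf.Length);
                str.Write(buf, 0, buf.Length);
            }
        }

        // Entry data is padded to the next 512-byte boundary.
        var offset = (512 - (stream.Position % 512)) % 512;
        stream.Seek(offset, SeekOrigin.Current);
    }
}

// Stream.Read may return fewer bytes than requested, so loop until done or end of stream.
private static int ReadFully(Stream stream, byte[] buffer, int count)
{
    int total = 0, read;
    while (total < count && (read = stream.Read(buffer, total, count - total)) > 0)
        total += read;
    return total;
}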

And here are a few helper functions for opening from a file, and for decompressing a tar.gz file/stream before extracting it.

public static void ExtractTarGz(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTarGz(stream, outputDir);
}

public static void ExtractTarGz(Stream stream, string outputDir)
{
    // A GZipStream is not seekable, so copy it first to a MemoryStream
    using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
    using (var memStr = new MemoryStream())
    {
        // CopyTo reads until the stream is exhausted. A single Read may return
        // fewer bytes than requested before the end, so it must not be used
        // as the end-of-stream test.
        gzip.CopyTo(memStr);

        memStr.Seek(0, SeekOrigin.Begin);
        ExtractTar(memStr, outputDir);
    }
}

public static void ExtractTar(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTar(stream, outputDir);
}

Here is a gist of the full file with some comments.

ForeverZer0
  • 2
    FYI. Your `Path.Join` call is only valid in .NET Core 2.1. To make it more universal use `Path.Combine`. – Doug S Aug 24 '18 at 01:06
  • @DougS Good call, didn't even notice I did that. Have been messing around with Ruby lately, had "join" on the mind, lol. Corrected. – ForeverZer0 Aug 24 '18 at 01:09
  • And unfortunately this code error'ed out on the [sample file](http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz) I tested it with. The error occurred at `var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim(), 8);` with `System.FormatException: Additional non-parsable characters are at the end of the string. at System.ParseNumbers.StringToLong(String s, Int32 radix, Int32 flags, Int32* currPos)`. – Doug S Aug 24 '18 at 02:07
  • I will do some debugging as soon as I can. I did admittedly throw this together rather quickly and only tested with a few different samples. Seems the stream is positioned incorrectly, probably wrong offset computed from the previous iteration. I was getting some inconsistent behavior with files that naturally fell on a 512 alignment without seeking, might be related. Will update soon as I get figured it out. – ForeverZer0 Aug 24 '18 at 02:15
  • 1
    I also got that same error (on the same file too) @DougS. It appears it has trailing white space when trying to determine the `size` for a directory. After that, it also complained about creating 0 sized files for directory entries. Have forked @ForeverZer0's gist with rudimentary [fixes](https://gist.github.com/davetransom/553aeb3c4388c3eb448c0afe564cd2e3) for my use-case. – Dave Transom Jan 31 '19 at 22:45
  • @ForeverZer0: Any update on the code error mentioned by Doug S? – CuriousCase Oct 29 '19 at 10:21
10

Tar-cs will do the job, but it is quite slow. I would recommend using SharpCompress which is significantly quicker. It also supports other compression types and it has been updated recently.

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers; // older SharpCompress releases used SharpCompress.Reader

private static String directoryPath = @"C:\Temp";

public static void unTAR(String tarFilePath)
{
    using (Stream stream = File.OpenRead(tarFilePath))
    {
        var reader = ReaderFactory.Open(stream);
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                ExtractionOptions opt = new ExtractionOptions {
                    ExtractFullPath = true,
                    Overwrite = true
                };
                reader.WriteEntryToDirectory(directoryPath, opt);
            }
        }
    }
}
Steven
  • 2
    thanks for the answer! By way of an update in 2020, the ExtractOptions are now done via instantiation. For example `new ExtractionOptions() { ExtractFullPath = true, Overwrite = true}` in the constructor of `WriteEntryToDirectory`. See [this link](https://github.com/adamhathcock/sharpcompress/blob/master/USAGE.md#extract-all-files-from-a-rar-file-to-a-directory-using-rararchive) – joshmcode Mar 18 '20 at 13:04
4

See tar-cs

using (FileStream unarchFile = File.OpenRead(tarfile))
{
    TarReader reader = new TarReader(unarchFile);
    reader.ReadToEnd("out_dir");
}
IanNorton
2

.NET 7 added several classes for working with TAR files, in the System.Formats.Tar namespace:

Extract to a directory:

await TarFile.ExtractToDirectoryAsync(tarFilePath, outputDir, overwriteFiles: false);

Enumerate a TAR file and manually extract its entries:

await using var tarStream = new FileStream(tarFilePath, new FileStreamOptions { Mode = FileMode.Open, Access = FileAccess.Read, Options = FileOptions.Asynchronous });
await using var tarReader = new TarReader(tarStream);
TarEntry? entry;
while ((entry = await tarReader.GetNextEntryAsync()) != null)
{
  // Skip links and global extended attribute entries.
  if (entry.EntryType is TarEntryType.SymbolicLink or TarEntryType.HardLink or TarEntryType.GlobalExtendedAttributes)
  {
     continue;
  }

  Console.WriteLine($"Extracting {entry.Name}");
  // The overwrite argument is required; false fails if the target file already exists.
  await entry.ExtractToFileAsync(Path.Join(outputDir, entry.Name), overwrite: false);
}
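Since the question is about tar.gz content, here is a minimal sketch of chaining the built-in `GZipStream` with `TarFile.ExtractToDirectory` on .NET 7 or later. The `TarGzSketch` class and `ExtractTarGz` method are names invented for this example, and overwriting existing files is an assumption about the desired behaviour, not something the original answer specifies.

```csharp
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;

public static class TarGzSketch
{
    // Decompresses a .tar.gz file and unpacks its entries into outputDir.
    public static void ExtractTarGz(string tarGzPath, string outputDir)
    {
        // The destination directory has to exist before extraction.
        Directory.CreateDirectory(outputDir);

        using var file = File.OpenRead(tarGzPath);
        using var gzip = new GZipStream(file, CompressionMode.Decompress);

        // The tar reader consumes the stream sequentially, so the
        // non-seekable GZipStream can be passed in directly.
        TarFile.ExtractToDirectory(gzip, outputDir, overwriteFiles: true);
    }
}
```

The same approach should work for an embedded resource: any readable `Stream` can take the place of the `FileStream` here.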
ckuri
2

Since you are not allowed to use outside libraries, you are not restricted to a specific tar format either. In fact, the data does not even need to be all in the same file.

You can write your own tar-like utility in C# that walks a directory tree and produces two files: a "header" file that consists of a serialized dictionary mapping relative file paths to offset/length pairs, and a big file containing the contents of the individual files concatenated into one giant blob. This is not a trivial task, but it's not overly complicated either.
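A minimal sketch of that idea follows, under a few assumptions: the index is written as a flat sequence of (relative path, offset, length) records with `BinaryWriter` rather than as a serialized dictionary, empty directories are not preserved, and the index and blob files are placed outside the source directory. The `BlobArchiveSketch` class and its method names are hypothetical.

```csharp
using System;
using System.IO;

public static class BlobArchiveSketch
{
    // Concatenates every file under sourceDir into blobPath and records
    // (relative path, offset, length) for each one in indexPath.
    public static void Pack(string sourceDir, string indexPath, string blobPath)
    {
        using (var blob = File.Create(blobPath))
        using (var index = new BinaryWriter(File.Create(indexPath)))
        {
            foreach (var file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
            {
                string relative = file.Substring(sourceDir.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
                long offset = blob.Position;
                using (var input = File.OpenRead(file))
                    input.CopyTo(blob);

                index.Write(relative);
                index.Write(offset);
                index.Write(blob.Position - offset);
            }
        }
    }

    // Recreates the files described by indexPath under outputDir.
    public static void Unpack(string indexPath, string blobPath, string outputDir)
    {
        using (var blob = File.OpenRead(blobPath))
        using (var index = new BinaryReader(File.OpenRead(indexPath)))
        {
            while (index.BaseStream.Position < index.BaseStream.Length)
            {
                string relative = index.ReadString();
                long offset = index.ReadInt64();
                long length = index.ReadInt64();

                string target = Path.Combine(outputDir, relative);
                Directory.CreateDirectory(Path.GetDirectoryName(target));

                blob.Seek(offset, SeekOrigin.Begin);
                using (var output = File.Create(target))
                    CopyBytes(blob, output, length);
            }
        }
    }

    // Copies exactly count bytes from source to destination.
    private static void CopyBytes(Stream source, Stream destination, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read <= 0) throw new EndOfStreamException();
            destination.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

The blob could additionally be wrapped in a `GZipStream` before embedding it, at the cost of losing direct seeking to an offset.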

Sergey Kalinichenko
0

There are two ways to compress/decompress in .NET. First, you can use the GZipStream and DeflateStream classes. Both compress a single stream of data. GZipStream writes the .gz format, so a file compressed with it can be opened with any popular compression application such as WinZip, WinRAR or 7-Zip, but a file compressed with DeflateStream cannot. These two classes have been available since .NET 2.

The other way is the Package class. It works much like GZipStream and DeflateStream; the only difference is that you can compress multiple files, which can then be opened with WinZip, WinRAR or 7-Zip. That's all .NET has. But it's not even a generic .zip file, it is what Microsoft uses for their *x-extension Office files. If you decompress any docx file with the Package class you can see everything stored in it. So don't use the .NET libraries for compressing or even decompressing, because you can't make a generic compressed file or even decompress a generic zip file with them. You have to consider a third-party library such as http://www.icsharpcode.net/OpenSource/SharpZipLib/

or implement everything from the ground up.
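For reference, a minimal sketch of a GZipStream round trip on an in-memory buffer, using only the standard library. The `GZipSketch` class and its method names are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    // Compresses a byte array into the .gz format.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream has to be closed before reading the result,
            // otherwise the gzip trailer is not written yet.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
                gzip.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }

    // Restores the original bytes from .gz data.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```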

user1120193