
I'm trying to programmatically decompress a gzip file into memory, mimicking the command `gzip -d file.gz`, using the libarchive project. The file actually comes from an HTTP response to a request sent with the header `Accept-Encoding: gzip, deflate`.

Here is my attempt to read the file. I expect it not to work, since the gzipped file has no entries (it's compressed as a single stream) and archive_read_next_header tries to read the next file out of the archive.

Is there an alternative to this function that extracts all of the data from the compressed file?

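#include <archive.h>
#include <archive_entry.h>

struct archive *archive = archive_read_new();
archive_read_support_format_raw(archive);
archive_read_support_filter_all(archive); // supersedes the deprecated archive_read_support_compression_all()

archive_read_open_memory(archive, file_data, file_size); // file_data/file_size: the HTTP response body
struct archive_entry *entry;
la_ssize_t total, size;
char *buf;
int status = archive_read_next_header(archive, &entry);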

Could someone post a minimal code example that resolves this issue? Also, is there a way to find out whether a gzip file has entries or not?

Irad K
  • gzip is not an _archive_ format. You should look for a pure gzip or compression library; `libarchive` expects (as its name implies) an archive. – tkausl Jan 21 '19 at 08:29
  • @tkausl, thanks for clearing this up. Perhaps you could elaborate on the difference between gzip and other compression formats that support archiving? Furthermore, can you recommend a minimal, ad-hoc and bug-free library that supports gzip decompression? (I've examined boost, but I wish to avoid deploying boost's entire logic.) – Irad K Jan 21 '19 at 08:42
  • To decompress a file à la `gzip -d file.gz`, use [libz](https://www.zlib.net/) to do what you want. – selbie Jan 21 '19 at 08:42
  • `Perhaps you elaborate about the difference between gzip and another compression algorithm that support archiving?` Which one, for example? If you're thinking of `tar.gz`, it's quite literally a tar archive, which doesn't know anything about compression, compressed with gzip, which doesn't know anything about archives. – tkausl Jan 21 '19 at 08:53
  • @tkausl, so if I understand you correctly, the difference between `tar.gz` and `gzip` is that the first is an archive and the second is not. – Irad K Jan 21 '19 at 08:58
  • Yes, .tar.gz is a gzipped .tar archive; however, you can gzip anything, so a .gz file isn't necessarily also an archive. – tkausl Jan 21 '19 at 09:00
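Following the point that any data can be gzipped, the question of whether a gzip file "has entries" can be answered heuristically by checking magic numbers. A minimal sketch, assuming the RFC 1952 gzip magic bytes `0x1f 0x8b` and the ustar magic `"ustar"` at offset 257 of a tar header block; `looks_like_gzip` and `looks_like_tar` are hypothetical helper names. Apply the first to the compressed bytes and the second to the decompressed bytes:

```cpp
#include <cstddef>
#include <cstring>

// Every gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952).
bool looks_like_gzip(const unsigned char* p, std::size_t n) {
    return n >= 2 && p[0] == 0x1f && p[1] == 0x8b;
}

// POSIX ustar and GNU tar headers carry "ustar" at offset 257 of the first
// 512-byte block. Old V7-style tar files have no magic, so `false` is a
// strong hint that the data is not a tar archive, not a proof.
bool looks_like_tar(const unsigned char* p, std::size_t n) {
    return n >= 512 && std::memcmp(p + 257, "ustar", 5) == 0;
}
```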

2 Answers


One possible alternative is to use the boost::iostreams library, which comes with a built-in gzip filter and allows exactly what you want: streaming decompression of a gzip file in memory. Here is the reference to the gzip filter, and a snippet from the same:

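#include <fstream>
#include <iostream>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
std::ifstream file("hello.gz", std::ios_base::in | std::ios_base::binary);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::gzip_decompressor()); // decompress whatever flows through
in.push(file);                                  // the source: the compressed file
boost::iostreams::copy(in, std::cout);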

Edit: Actually a much better snippet is available here https://stackoverflow.com/a/16693807/3656081

tangy
  • Hey, and thank you for your answer. I just wonder if you could also point me to an implementation based on zlib. For licensing and deployment reasons, I prefer to avoid using boost. – Irad K Jan 22 '19 at 09:22

There are two ways to do this using zlib:

  1. Using the built-in gzFile API: Coliru Link. Read more on this here.
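#include <cassert>
#include <cstdio>
#include <iostream>
#include <unistd.h>  // dup()
#include <zlib.h>

int inf(FILE* fp) {
    // zlib.h recommends dup() here so gzclose() and fclose() don't close the same descriptor
    gzFile gzf = ::gzdopen(::dup(fileno(fp)), "rb");
    if (gzf == nullptr) return -1;
    assert(::gztell(gzf) == 0);
    std::cout << "pos: " << ::gztell(gzf) << std::endl;
    ::gzseek(gzf, 18L, SEEK_SET);  // skip the first 18 uncompressed bytes
    char buf[768] = {0};
    // read at most sizeof(buf) - 1 bytes so buf stays NUL-terminated
    int n = ::gzread(gzf, buf, sizeof(buf) - 1); // use a custom size as needed
    if (n < 0) { ::gzclose(gzf); return -1; }
    std::cout << buf << std::endl; // print file contents from offset 18 onward
    ::gzclose(gzf);
    return 0;
}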
  2. The native inflate API: Coliru Link. More on this in the manual link above and here. My code is almost entirely a duplicate of the provided link and pretty long, so I won't repost it.
tangy