
I'm using the Boost gzip example code. I am trying to compress the simple string test and expect the compressed string H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA, as shown by an online compressor.

static std::string compress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::stringstream compressed;
    std::stringstream origin(data);

    bio::filtering_streambuf<bio::input> out;
    out.push(bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
    out.push(origin);
    
    bio::copy(out, compressed);
    return compressed.str();
}

int main(int argc, char* argv[]){
    std::cout << compress("test") << std::endl;
    // prints out garbage

    return 0;
}

However, when I print the result of the conversion I get garbage values like +I-. ~

I know that the conversion is valid because decompressing the result returns the correct string. However, I need the string to be human readable, i.e. H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA.

How can I modify the code to output human readable text?

Thanks

Motivation

The garbage format is not compatible with the JSON library through which I will send the compressed text.

Tom
  • Looks like that website shows you the compressed data encoded to Base64. So encode it as well, e.g. https://stackoverflow.com/questions/7053538/how-do-i-encode-a-string-to-base64-using-only-boost – Dan Mašek Jan 07 '22 at 18:23
  • @DanMašek hehe - timing – sehe Jan 07 '22 at 18:24
  • It's not "garbage". It's binary data. The whole point of compression is to compress, so the output uses all 256 possible byte values to permit the output to be as small as possible. You can encode the data into a smaller number of byte values to make it readable, e.g. 64 values in Base64, which is what you are looking at with `H4s`... That _expands_ the data, cancelling some of the compression. See the answers [here](https://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64) for alternatives to your JSON embedding problem. – Mark Adler Jan 07 '22 at 18:39

1 Answer


The example site completely fails to mention that it also base64-encodes the result:

base64 -d <<< 'H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA' | gunzip -

Prints:

test

In short, you need to base64-encode the compressed data as well:


#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <iostream>
#include <sstream>

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>

std::string decode64(std::string const& val)
{
    using namespace boost::archive::iterators;
    return {
        transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>{
            std::begin(val)},
        {std::end(val)},
    };
}

std::string encode64(std::string const& val)
{
    using namespace boost::archive::iterators;
    std::string r{
        base64_from_binary<transform_width<std::string::const_iterator, 6, 8>>{
            std::begin(val)},
        {std::end(val)},
    };
    return r.append((3 - val.size() % 3) % 3, '=');
}

static std::string compress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::istringstream origin(data);

    bio::filtering_istreambuf in;
    in.push(
        bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
    in.push(origin);

    std::ostringstream compressed;
    bio::copy(in, compressed);
    return compressed.str();
}

static std::string decompress(const std::string& data)
{
    namespace bio = boost::iostreams;
    std::istringstream compressed(data);

    bio::filtering_istreambuf in;
    in.push(bio::gzip_decompressor());
    in.push(compressed);

    std::ostringstream origin;
    bio::copy(in, origin);
    return origin.str();
}

int main() { 
    auto msg = encode64(compress("test"));
    std::cout << msg << std::endl;
    std::cout << decompress(decode64(msg)) << std::endl;
}

Prints

H4sIAAAAAAAC/ytJLS4BAAx+f9gEAAAA
test
sehe
  • In decode64 and encode64, what are the literals 6 and 8? – Mohammad Kanan Jul 22 '22 at 01:57
  • @MohammadKanan According to the [docs](https://www.boost.org/doc/libs/1_64_0/libs/serialization/doc/dataflow.html) they [_"retrieve 6 bit integers from a sequence of 8 bit bytes"_](https://www.boost.org/doc/libs/1_64_0/libs/serialization/doc/dataflow.html#:~:text=retrieve%206%20bit%20integers%20from%20a%20sequence%20of%208%20bit%20bytes) – sehe Jul 22 '22 at 02:41
  • I asked because if I increase the number of input characters, I get SIGABRT in decompress(), at `boost::iostreams::copy` – Mohammad Kanan Jul 22 '22 at 10:55
  • @MohammadKanan Wow those questions don't even look similar :) In seriousness, base64 has stringent length requirements (did you remember to pad to multiples of 3 bytes?) and the "data flow iterators" are very un-user-friendly (see [e.g.](https://stackoverflow.com/questions/71932617/c-boost-base64-decoder-fails-when-newlines-are-present/71933410#71933410)). I would vastly prefer [using Boost Beast's implementation](https://stackoverflow.com/a/48176443/85371) – sehe Jul 22 '22 at 12:23
  • No, I didn't go far into the details of `decode64`/`encode64`. I am OK with your code example as is, because I have another goal: saving the compressed string to a file, which I never get to work when I try to gunzip the file. Sorry for jumping to different things without being clear. I thought I should focus on things related to the SO question – Mohammad Kanan Jul 22 '22 at 12:51
  • @MohammadKanan The best thing you can do is ask your own question, it seems. That way it will be on-topic and could help others – sehe Jul 22 '22 at 13:25