
I wrote some code like this:

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <vector>

std::vector<char> unzip(std::vector<char> const& compressed)
{
   std::vector<char> decompressed;

   boost::iostreams::filtering_ostream os;

   os.push(boost::iostreams::gzip_decompressor());
   os.push(boost::iostreams::back_inserter(decompressed));

   boost::iostreams::write(os, &compressed[0], compressed.size());
   os.reset();
   return decompressed;
}

If `compressed` is a zip bomb, what will happen? I think memory will be exhausted and the process will crash.

So how can I avoid this? How can I check the raw data size before decompressing?

alpha
  • I would run this in a `std::thread`, check the size of the shared `decompressed` vector and terminate the thread if a limit is exceeded. See https://stackoverflow.com/questions/12207684/how-do-i-terminate-a-thread-in-c11 – schorsch_76 May 30 '18 at 05:46
  • Write a custom back_inserter with a terminate flag that throws an exception out to a surrounding try/catch handler. – schorsch_76 May 30 '18 at 05:54
  • @schorsch_76 I think you are right. – alpha May 30 '18 at 07:49
  • 1
    No. The thread idea is horrible. It creates a lot of complexity and inefficiency for no reason. (If you were going to do _that_, you could just launch a separate process with suitable resource limits. This at once has the benefit or process space isolation, so it defends against a plethora of other vulnerabilities that might exist in the decompression codec) – sehe May 30 '18 at 09:38
  • 3
    @schorsch_76 you cannot really terminate a thread. – n. m. could be an AI May 30 '18 at 09:39
  • @sehe I mean the custom back_inserter, not the thread. Of course, the thread idea is terrible. – alpha May 31 '18 at 03:07
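
For illustration, here is a minimal sketch of the separate-process idea from the comments (POSIX-only, and purely hypothetical: the decompression step itself is elided). The child's address space is capped with setrlimit, so a zip bomb can only kill the child:

#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Run untrusted decompression in a child process with a hard memory cap.
// A zip bomb then aborts the child instead of exhausting the parent.
bool unzip_in_child() {
    pid_t pid = fork();
    if (pid == -1)
        return false;                      // fork failed
    if (pid == 0) {                        // child
        rlimit cap{256 << 20, 256 << 20};  // 256 MiB address-space limit
        setrlimit(RLIMIT_AS, &cap);
        // ... decompress here, writing the result to a pipe or temp file ...
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);              // parent: reap child, check outcome
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}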

2 Answers


You would do it just like always: pay attention while unzipping.

You can either use a buffer with fixed/limited capacity (e.g. boost::iostreams::array_sink) or you can wrap your copy operation with a guard for the maximum size.
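
For example, a bounded-buffer variant could look like this minimal sketch (unzip_bounded is just an illustration; reading into a fixed buffer is the input-side analogue of an array_sink behind a filtering_ostream):

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/array.hpp>
#include <vector>

// Decompress into a fixed-capacity buffer: output beyond buffer.size() is
// simply never produced, so a bomb cannot exhaust memory.
size_t unzip_bounded(std::vector<char> const& compressed, std::vector<char>& buffer)
{
   boost::iostreams::filtering_istream is;

   is.push(boost::iostreams::gzip_decompressor());
   is.push(boost::iostreams::array_source(compressed.data(), compressed.size()));

   is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
   return static_cast<size_t>(is.gcount()); // bytes actually decompressed
}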

Also, in your example the input is an in-memory buffer, so it makes more sense to use a device than a stream for the input. Here's a simple take on the guarded copy:

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/array.hpp>
#include <cstddef>
#include <vector>

std::vector<char> unzip(size_t limit, std::vector<char> const& compressed) {
   std::vector<char> decompressed;

   boost::iostreams::filtering_istream is;

   is.push(boost::iostreams::gzip_decompressor());
   is.push(boost::iostreams::array_source(compressed.data(), compressed.size()));

   // read in fixed-size chunks, stopping once the limit has been reached;
   // the result may overshoot `limit` by at most sizeof(buf) - 1 bytes
   while (is && (decompressed.size() < limit)) {
       char buf[512];
       is.read(buf, sizeof(buf));
       decompressed.insert(decompressed.end(), buf, buf + is.gcount());
   }
   return decompressed;
}

Testing it with a simple mini-bomb of 60 bytes that expands into 20 kilobytes of NUL characters:

#include <iostream>

int main() {
    std::vector<char> const bomb = { 
          char(0x1f), char(0x8b), char(0x08), char(0x08), char(0xd1), char(0x6d), char(0x0e), char(0x5b), char(0x00), char(0x03), char(0x62), char(0x6f),
          char(0x6d), char(0x62), char(0x00), char(0xed), char(0xc1), char(0x31), char(0x01), char(0x00), char(0x00), char(0x00), char(0xc2), char(0xa0),
          char(0xf5), char(0x4f), char(0x6d), char(0x0a), char(0x3f), char(0xa0), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00),
          char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00), char(0x00),
          char(0x00), char(0x80), char(0xb7), char(0x01), char(0x60), char(0x83), char(0xbc), char(0xe6), char(0x00), char(0x50), char(0x00), char(0x00)
        };

    auto max10k  = unzip(10*1024, bomb);
    auto max100k = unzip(100*1024, bomb);

    std::cout << "max10k:  " << max10k.size()  << " bytes\n";
    std::cout << "max100k: " << max100k.size() << " bytes\n";
}

This prints (Live On Coliru):

max10k:  10240 bytes
max100k: 20480 bytes

Throwing

Of course you can opt to throw if the limit is exceeded:

#include <stdexcept> // std::runtime_error, in addition to the includes above

std::vector<char> unzip(size_t limit, std::vector<char> const& compressed) {
   std::vector<char> decompressed;

   boost::iostreams::filtering_istream is;

   is.push(boost::iostreams::gzip_decompressor());
   is.push(boost::iostreams::array_source(compressed.data(), compressed.size()));

   while (is) {
       char buf[512];
       is.read(buf, sizeof(buf)); // can't detect EOF before attempting read on some streams

       if (decompressed.size() + is.gcount() >= limit)
           throw std::runtime_error("unzip limit exceeded");

       decompressed.insert(decompressed.end(), buf, buf + is.gcount());
   }
   return decompressed;
}
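
A call site then simply guards against the exception, along these lines:

try {
    auto data = unzip(10 * 1024, bomb);
    // use data ...
} catch (std::exception const& e) {
    std::cerr << "unzip failed: " << e.what() << "\n";
}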
sehe

schorsch_76 said I could write a custom back_inserter, so I wrote one, and it works:

#include <boost/iostreams/categories.hpp> // boost::iostreams::sink_tag
#include <ios>       // std::streamsize
#include <limits>    // std::numeric_limits
#include <stdexcept> // std::runtime_error

namespace boost {
namespace iostreams {
template<typename Container>
class limit_back_insert_device {
public:
    typedef typename Container::value_type  char_type;
    typedef sink_tag                        category;
    limit_back_insert_device(Container& cnt, size_t max_size)
        : container(&cnt)
        , max_size(max_size) {
        check(0);
    }
    std::streamsize write(const char_type* s, std::streamsize n) {
        check(n);
        container->insert(container->end(), s, s + n);
        return n;
    }
private:
    void check(size_t n) {
        if (std::numeric_limits<size_t>::max() - n < container->size()) {
            throw std::runtime_error("size_t overflow");
        }

        if ((container->size() + n) > max_size) {
            throw std::runtime_error("container->size() > max_size");
        }
    }
protected:
    Container * container;
    size_t const max_size;
};

template<typename Container>
limit_back_insert_device<Container> limit_back_inserter(Container& cnt,
    size_t max_size) {
    return limit_back_insert_device<Container>(cnt, max_size);
}
} // namespace iostreams
} // namespace boost
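
For reference, here is a minimal sketch of how the device slots into the unzip function from the question (the wiring is my assumption, mirroring the original; only the sink changes, and the exception from write propagates to the caller):

std::vector<char> unzip(std::vector<char> const& compressed, size_t max_size)
{
   std::vector<char> decompressed;

   boost::iostreams::filtering_ostream os;

   os.push(boost::iostreams::gzip_decompressor());
   os.push(boost::iostreams::limit_back_inserter(decompressed, max_size));

   // limit_back_insert_device::write throws std::runtime_error as soon as
   // the decompressed output would exceed max_size
   boost::iostreams::write(os, compressed.data(), compressed.size());
   os.reset();
   return decompressed;
}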
alpha