
I'm currently learning C++ (Coming from Java) and I'm trying to understand how to use IO streams properly in C++.

Let's say I have an Image class which contains the pixels of an image and I overloaded the extraction operator to read the image from a stream:

istream& operator>>(istream& stream, Image& image)
{
    // Read the image data from the stream into the image
    return stream;
}

So now I'm able to read an image like this:

Image image;
ifstream file("somepic.img");
file >> image;
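For concreteness, here is a minimal sketch of what such an operator could look like. The `Image` layout (a fixed number of raw pixel bytes chosen at construction) and the helper `load_from_string` are assumptions made for illustration only; the real class may look quite different.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Image type: a fixed-size block of raw pixel bytes.
class Image {
public:
    explicit Image(std::size_t byte_count) : pixels_(byte_count) {}
    const std::vector<char>& pixels() const { return pixels_; }

    // Works with any std::istream, whatever stream buffer sits behind it.
    friend std::istream& operator>>(std::istream& stream, Image& image) {
        // Unformatted read; on a short read the stream sets failbit/eofbit.
        stream.read(image.pixels_.data(),
                    static_cast<std::streamsize>(image.pixels_.size()));
        return stream;
    }

private:
    std::vector<char> pixels_;
};

// Illustration helper: reads an image from an in-memory string instead of a file.
inline bool load_from_string(const std::string& bytes, Image& image) {
    std::istringstream in(bytes);
    return static_cast<bool>(in >> image);
}
```

Because the operator only depends on `std::istream`, the same code reads from a file, a string, or any custom stream buffer.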

But now I want to use the same extraction operator to read the image data from a custom stream. Let's say I have a file which contains the image in compressed form. So instead of using ifstream I might want to implement my own input stream. At least that's how I would do it in Java. In Java I would write a custom class extending the InputStream class and implementing the int read() method. So that's pretty easy. And usage would look like this:

InputStream stream = new CompressedInputStream(new FileInputStream("somepic.imgz"));
image.read(stream);

So using the same pattern maybe I want to do this in C++:

Image image;
ifstream file("somepic.imgz");
compressed_stream stream(file);
stream >> image;

But maybe that's the wrong way, don't know. Extending the istream class looks pretty complicated and after some searching I found some hints about extending streambuf instead. But this example looks terribly complicated for such a simple task.

So what's the best way to implement custom input/output streams (or streambufs?) in C++?

Solution

Some people suggested not using iostreams at all and to use iterators, boost or a custom IO interface instead. These may be valid alternatives but my question was about iostreams. The accepted answer resulted in the example code below. For easier reading there is no header/code separation and the whole std namespace is imported (I know that this is a bad thing in real code).

This example is about reading and writing vertical-XOR-encoded images. The format is simple: each byte represents two pixels (4 bits per pixel), and each line is XORed with the previous line. This kind of encoding prepares the image for compression (it usually results in a lot of 0-bytes, which are easier to compress).
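Independent of any stream machinery, the encoding itself can be sketched on an in-memory buffer. The function names `vxor_encode` and `vxor_decode` are hypothetical and not part of the solution code; `row_bytes` is assumed to be the number of bytes per image line.

```cpp
#include <cstddef>
#include <vector>

// Encodes in place: every row becomes (row XOR original previous row).
// The first row is left unchanged.
inline void vxor_encode(std::vector<unsigned char>& data, std::size_t row_bytes) {
    if (row_bytes == 0) return;
    // Walk from the last row upwards so the previous row is still unencoded.
    for (std::size_t row = data.size() / row_bytes; row-- > 1; ) {
        for (std::size_t i = 0; i < row_bytes; ++i)
            data[row * row_bytes + i] ^= data[(row - 1) * row_bytes + i];
    }
}

// Decodes in place: the inverse of vxor_encode.
inline void vxor_decode(std::vector<unsigned char>& data, std::size_t row_bytes) {
    if (row_bytes == 0) return;
    // Walk from the top so the previous row is already decoded.
    for (std::size_t row = 1; row < data.size() / row_bytes; ++row) {
        for (std::size_t i = 0; i < row_bytes; ++i)
            data[row * row_bytes + i] ^= data[(row - 1) * row_bytes + i];
    }
}
```

Identical neighbouring rows turn into rows of zeros, which is what makes the result easier to compress.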

#include <cstring>
#include <fstream>

using namespace std;

/*** vxor_streambuf class ******************************************/

class vxor_streambuf: public streambuf
{
public:
    vxor_streambuf(streambuf *buffer, const int width) :
        buffer(buffer),
        size(width / 2)
    {
        previous_line = new char[size];
        memset(previous_line, 0, size);
        current_line = new char[size];
        setg(0, 0, 0);
        setp(current_line, current_line + size);
    }

    virtual ~vxor_streambuf()
    {
        sync();
        delete[] previous_line;
        delete[] current_line;
    }

    virtual streambuf::int_type underflow()
    {
        // Read line from original buffer
        streamsize read = buffer->sgetn(current_line, size);
        if (!read) return traits_type::eof();

        // Do vertical XOR decoding (only for the bytes actually read)
        for (int i = 0; i < read; i += 1)
        {
            current_line[i] ^= previous_line[i];
            previous_line[i] = current_line[i];
        }

        setg(current_line, current_line, current_line + read);
        return traits_type::to_int_type(*gptr());
    }

    virtual streambuf::int_type overflow(streambuf::int_type value)
    {
        int write = pptr() - pbase();
        if (write)
        {
            // Do vertical XOR encoding (only for the bytes actually buffered)
            for (int i = 0; i < write; i += 1)
            {
                char tmp = current_line[i];
                current_line[i] ^= previous_line[i];
                previous_line[i] = tmp;
            }

            // Write line to original buffer
            streamsize written = buffer->sputn(current_line, write);
            if (written != write) return traits_type::eof();
        }

        setp(current_line, current_line + size);
        if (!traits_type::eq_int_type(value, traits_type::eof())) sputc(traits_type::to_char_type(value));
        return traits_type::not_eof(value);
    }

    virtual int sync()
    {
        streambuf::int_type result = this->overflow(traits_type::eof());
        buffer->pubsync();
        return traits_type::eq_int_type(result, traits_type::eof()) ? -1 : 0;
    }

private:
    streambuf *buffer;
    int size;
    char *previous_line;
    char *current_line;
};


/*** vxor_istream class ********************************************/

class vxor_istream: public istream
{
public:
    vxor_istream(istream &stream, const int width) :
        istream(new vxor_streambuf(stream.rdbuf(), width)) {}

    virtual ~vxor_istream()
    {
        delete rdbuf();
    }
};


/*** vxor_ostream class ********************************************/

class vxor_ostream: public ostream
{
public:
    vxor_ostream(ostream &stream, const int width) :
        ostream(new vxor_streambuf(stream.rdbuf(), width)) {}

    virtual ~vxor_ostream()
    {
        delete rdbuf();
    }
};


/*** Test main method **********************************************/

int main()
{
    // Read data
    ifstream infile("test.img");
    vxor_istream in(infile, 288);
    char data[144 * 128];
    in.read(data, 144 * 128);
    infile.close();

    // Write data
    ofstream outfile("test2.img");
    vxor_ostream out(outfile, 288);
    out.write(data, 144 * 128);
    out.flush();
    outfile.close();

    return 0;
}
kayahr
  • I highly recommend avoiding iostreams. See http://stackoverflow.com/questions/2753060/who-architected-designed-cs-iostreams-and-would-it-still-be-considered-wel , http://accu.org/index.php/journals/1539 and http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Streams to learn some of the reasons why. – vitaut Dec 30 '12 at 04:10
  • @vitaut: If I understood the Google style guide correctly then they recommend using the old C-style I/O stuff? But I don't see how I can abstract I/O away from my classes then. My Image class just wants to read data and it doesn't want to care about the data source or if the data source is compressed or encrypted or whatever. With old C-style I/O I can pass a file handle to it but that's it. Doesn't sound like a good alternative. – kayahr Dec 30 '12 at 11:55
  • As suggested by DeadMG, you can work with iterators instead. Or you can create a simple interface (abstract class) that defines a few operations that you need, like read() that you've mentioned. Then you can have several implementations of your interface, e.g. one using C-style I/O, or mmap or whatever, even iostreams. – vitaut Dec 30 '12 at 14:18
  • Question: Would you pass in a standard stream like std::cout in as the streambuf argument of the constructor? – ijustlovemath Mar 06 '18 at 17:27
  • I think main() given in the Solution in the question has a minor but crucial bug. The ifstream and ofstream should be opened in binary mode: `ifstream infile("test.img", ios::binary);` for reading and `ofstream outfile("test2.img", ios::binary);` for writing. Without this I found that reading the file ended prematurely on Windows. (I'd have added this as a comment but I don't yet have 50 reputation.) – Nigel Sharp Oct 16 '19 at 11:21

6 Answers

77

The proper way to create a new stream in C++ is to derive from std::streambuf and to override the underflow() operation for reading and the overflow() and sync() operations for writing. For your purpose you'd create a filtering stream buffer which takes another stream buffer (and possibly a stream from which the stream buffer can be extracted using rdbuf()) as argument and implements its own operations in terms of this stream buffer.

The basic outline of a stream buffer would be something like this:

class compressbuf
    : public std::streambuf {
    std::streambuf* sbuf_;
    char*           buffer_;
    // context for the compression
public:
    compressbuf(std::streambuf* sbuf)
        : sbuf_(sbuf), buffer_(new char[1024]) {
        // initialize compression context
    }
    ~compressbuf() { delete[] this->buffer_; }
    int_type underflow() {
        if (this->gptr() == this->egptr()) {
            // decompress data into buffer_, obtaining its own input from
            // this->sbuf_; if necessary resize buffer
            std::streamsize size = 0; // placeholder: set by the decompression step
            // the next statement assumes "size" characters were produced (if
            // no more characters are available, size == 0).
            this->setg(this->buffer_, this->buffer_, this->buffer_ + size);
        }
        return this->gptr() == this->egptr()
             ? std::char_traits<char>::eof()
             : std::char_traits<char>::to_int_type(*this->gptr());
    }
};

How underflow() looks exactly depends on the compression library being used. Most libraries I have used keep an internal buffer which needs to be filled and which retains the bytes which are not yet consumed. Typically, it is fairly easy to hook the decompression into underflow().
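To make the outline concrete without pulling in a compression library, the following sketch uses a toy run-length decoder as a stand-in for real decompression: the wrapped stream delivers (count, byte) pairs and the filter expands them. The class name `rle_decodebuf`, the format, and the helper `rle_decode` are all assumptions for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for a decompressor: expands (count, byte) pairs.
class rle_decodebuf : public std::streambuf {
public:
    explicit rle_decodebuf(std::streambuf* source) : source_(source) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())  // get area not exhausted yet
            return traits_type::to_int_type(*gptr());
        for (;;) {
            // Pull one (count, byte) pair from the wrapped stream buffer.
            const int_type count = source_->sbumpc();
            const int_type value = source_->sbumpc();
            if (traits_type::eq_int_type(count, traits_type::eof()) ||
                traits_type::eq_int_type(value, traits_type::eof()))
                return traits_type::eof();
            const std::size_t n =
                static_cast<unsigned char>(traits_type::to_char_type(count));
            if (n == 0) continue;  // empty run: fetch the next pair
            // Expand the run and expose it as the new get area.
            buffer_.assign(n, traits_type::to_char_type(value));
            setg(buffer_.data(), buffer_.data(), buffer_.data() + n);
            return traits_type::to_int_type(*gptr());
        }
    }

private:
    std::streambuf* source_;    // not owned
    std::vector<char> buffer_;  // decoded data currently being served
};

// Illustration helper: decodes a whole string through a plain std::istream.
inline std::string rle_decode(const std::string& encoded) {
    std::istringstream file(encoded);
    rle_decodebuf sbuf(file.rdbuf());
    std::istream in(&sbuf);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A real decompressor would typically read larger blocks with `sgetn()` and keep unconsumed input in its own context, but the contract of `underflow()` stays the same.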

Once the stream buffer is created, you can just initialize an std::istream object with the stream buffer:

std::ifstream fin("some.file");
compressbuf   sbuf(fin.rdbuf());
std::istream  in(&sbuf);

If you are going to use the stream buffer frequently, you might want to encapsulate the object construction into a class, e.g., icompressstream. Doing so is a bit tricky because the base class std::ios is a virtual base and is the actual location where the stream buffer is stored. To construct the stream buffer before passing a pointer to a std::ios thus requires jumping through a few hoops: It requires the use of a virtual base class. Here is how this could look roughly:

struct compressstream_base {
    compressbuf sbuf_;
    compressstream_base(std::streambuf* sbuf): sbuf_(sbuf) {}
};
class icompressstream
    : virtual compressstream_base
    , public std::istream {
public:
    icompressstream(std::streambuf* sbuf)
        : compressstream_base(sbuf)
        , std::ios(&this->sbuf_)
        , std::istream(&this->sbuf_) {
    }
};

(I just typed this code without a simple way to test that it is reasonably correct; please expect typos but the overall approach should work as described)
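The same construction-order idea can be sketched for the output side, which also shows `overflow()` and `sync()`. This is only a sketch under assumptions: upper-casing stands in for compression, the names `upperbuf`, `upperstream_base`, `oupperstream` and `shout` are hypothetical, and the explicit `std::ios` initializer is omitted here so that `std::ostream`'s constructor alone initializes the stream.

```cpp
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Output filter: buffers characters, upper-cases them, forwards them to sink_.
class upperbuf : public std::streambuf {
public:
    explicit upperbuf(std::streambuf* sink) : sink_(sink) {
        setp(buffer_, buffer_ + sizeof buffer_);
    }

protected:
    int_type overflow(int_type ch) override {
        if (!flush_buffer()) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);  // room exists after flushing
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }
    int sync() override {
        if (!flush_buffer()) return -1;
        return sink_->pubsync();  // also flush the wrapped stream buffer
    }

private:
    bool flush_buffer() {
        const std::streamsize n = pptr() - pbase();
        for (char* p = pbase(); p != pptr(); ++p)
            *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
        const bool ok = sink_->sputn(pbase(), n) == n;
        setp(buffer_, buffer_ + sizeof buffer_);
        return ok;
    }
    std::streambuf* sink_;  // not owned
    char buffer_[16];       // deliberately small so overflow() is exercised
};

// Listed first, so it is constructed before and destroyed after std::ostream.
struct upperstream_base {
    upperbuf sbuf_;
    explicit upperstream_base(std::streambuf* sink) : sbuf_(sink) {}
};

class oupperstream : private virtual upperstream_base, public std::ostream {
public:
    explicit oupperstream(std::ostream& sink)
        : upperstream_base(sink.rdbuf()), std::ostream(&sbuf_) {}
    // Flush explicitly; the buffer member is still alive at this point.
    ~oupperstream() override { flush(); }
};

// Illustration helper: writes text through the filter into a string.
inline std::string shout(const std::string& text) {
    std::ostringstream sink;
    {
        oupperstream out(sink);
        out << text;
    }  // destructor flushes the characters still sitting in the buffer
    return sink.str();
}
```

The wrapped stream must outlive the wrapper, since the final flush writes into its stream buffer.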

David G
Dietmar Kühl
  • Can you give a small example how the usage of such a custom stream buffer would look like? I wonder how to use this stream buffer in the end because the image class needs an `istream` to read from. – kayahr Dec 29 '12 at 22:30
  • `istream` doesn't do any physical input; it delegates this to an `streambuf`, using the strategy pattern. The constructor of `istream` takes a `streambuf*` as argument. In the classical `istream` (`ifstream` and `istringstream`), this argument is supplied by the derived class, but there's nothing to stop you from instantiating a `istream` directly, with a pointer to a `streambuf` which you provide, or from deriving from `istream` so that the derived class' constructor can provide a `streambuf` of the desired type. – James Kanze Dec 29 '12 at 22:36
  • 3
    If you're thinking in terms of Java, `InputStream` is closer to `std::streambuf` than it is to `std::istream`. `std::istream` is more Java's `Format`, but with an interface which makes it much simpler to use. – James Kanze Dec 29 '12 at 22:37
  • @DietmarKühl: Your examples are very helpful, thanks! I've implemented a custom streambuf implementation following your instructions and it works. But I don't understand the workaround with the virtual base class you suggested for extending istream. I just extended `std::istream` and created my custom stream buffer in the constructor and passed it to the `init` method of `istream` and it works like a charm. Is there anything bad about this approach? – kayahr Dec 30 '12 at 15:05
  • @vitaut You don't have to derive a different `Format` class for each new type, then instantiate each one when outputting different types. You just use `<<`, and everything works. And of course, no one in their right mind ever writes `std::setiosflags(whatever)`. You define custom manipulators for logical markup. You define in one place what e.g. an interest rate should look like, and then you write `std::cout << interestRate << value;`. (As far as I know, only C++ and Java offer this possibility, but the Java version is far more awkward to use.) – James Kanze Dec 30 '12 at 15:21
  • @JamesKanze: Defining a different `Format` class is not an inherent feature of a formatting library, you can easily do without that even facilitating the legacy iostream's `operator<<`. I agree that you can address some of the formatting issues with manipulators, but not the inherent design flaws such as mixing formatting with I/O. Also I don't see how you can address the problems with i18n without using a proper formatting library. – vitaut Dec 30 '12 at 16:09
  • 1
    @kayahr: I like to initialize things just once. Using a null stream first and then calling `init()` is somewhat awkward. The problematic operation actually is destruction: Although it doesn't matter for input streams, it is crucially important for `std::ostream`: the stream buffer is flushed by the destructor of `std::ostream`. If the stream buffer isn't the first virtual base it will already be destructed at that time. – Dietmar Kühl Dec 30 '12 at 16:22
  • @vitaut The whole point of iostream is that it _doesn't_ mix formatting with IO. The IO is performed using `streambuf`; the formatting using the `<<` operator on `ostream`. Two different classes, with completely different concerns. (With regards to quality I18n, neither Java nor C++ have a solution. You basically have to write a separate DLL/class for each locale. Human language is still several orders of magnitude more complex than programming languages.) – James Kanze Dec 30 '12 at 16:22
  • @JamesKanze: The only real shortcoming with respect to i18n I'm aware of is the missing support for positional arguments: For some messages it is necessary to reorder values to create a natural message. Otherwise I obviously disagree with the assessments of those complaining about IOstreams and all alternatives I have seen fail to provide at least one of the dimensions of customization. – Dietmar Kühl Dec 30 '12 at 16:29
  • @DietmarKühl The question is how far you want to go with regards to internationalization. If you're just outputting error messages, you can usually get away with things like: `"Cannot open: " << filename`, and at least in the languages I know, you can use similar constructs, e.g. `"Datei kann nicht geöffnet werden: " << filename`. The results may not be idiomatic, but they are acceptable. – James Kanze Dec 30 '12 at 16:33
  • @JamesKanze: That's in theory, but if you have a closer look at `streambuf`, you'll see that it a mess of input, output, buffering and even locale (!) with a horrible API, so it's more of an implementation detail of iostreams rather than a separate entity. – vitaut Dec 30 '12 at 16:34
  • @DietmarKühl For idiomatic use, on the other hand, positional arguments aren't sufficient: adjectives have to agree with the noun for number and case in German, for example, and it's highly unlikely that an English speaking programmer would have provided for this. Just handling plural and singular are non-trivial, if you think that Slovenian (and some other languages) have dual. In Arabic, verb forms also have gender; in Russia, numbers which end in a one (like twenty-one, or one hundred and one, but not eleven) take singular, and so on. Word order is just the tip of the iceberg. – James Kanze Dec 30 '12 at 16:36
  • @vitaut The naming conventions in `streambuf` could definitely be better, and I'd also like to see the code translation separated from the rest (but because C++ supports bidirectional streams with seeking, it's not practical). On the other hand, `streambuf` is definitely a separate class hierarchy from iostream. `iostream` uses it, but you derive separately from it, to customize the sink and source, where as you add a `<<` and a `>>` to format a new class. The functionality is exactly the same as that in Java, but it is an order of magnitude easier for client code to use. – James Kanze Dec 30 '12 at 16:39
  • @JamesKanze: It's true that positional arguments are not sufficient, but they are absolutely necessary for any serious i18n. Not only iostreams don't have these, but messages are often split which makes translation with different word order nearly impossible. Even printf without positional arguments is better in this respect. – vitaut Dec 30 '12 at 16:45
  • @DietmarKühl: Sorry, I still don't get it. Yes, I have to call the flush method on my custom stream before closing the underlying file stream. But this is also the case when I simply use an instance of ostream with my custom streambuf as constructor arg. I tried your hack with the virtual base class and also tried it by simply extending ostream without any hacks and both approaches work the same: I have to call flush (or destruct my custom stream) before closing the file stream. So I still don't see why this virtual base class hack is needed. – kayahr Dec 30 '12 at 22:50
  • 1
    @kayahr: `std::ostream::~ostream()` calls `this->flush()` which in turn calls `this->rdbuf()->pubsync()` accessing the stream buffer. Obviously, at this point the stream buffer shall not be destroyed. If your stream buffer is a simple base class, a `virtual` base class following `std::ostream`, or a member of the stream, it will be destroyed at the point in time when it is accessed. It may seem to work because your stream buffer probably doesn't really change its representation in the destructor but it is strictly undefined behavior. – Dietmar Kühl Dec 30 '12 at 22:58
  • @kayahr: Also, you shouldn't need to flush the underlying stream explicitly: When `pubsync()` is called on your stream buffer it is delegated to the `virtual` function `sync()` which you can override to flush your own buffer and then call `this->sbuf_->pubsync()` on the underlying stream buffer. Since `std::ostream::~ostream()` calls `this->flush()` which calls `this->rdbuf()->pubsync()` (if the stream is in `good()` state) the stream buffers should automatically be flushed. – Dietmar Kühl Dec 30 '12 at 23:01
  • @DietmarKühl: Hm... Maybe this depends on the implementation? Using GNU stdc++ the flush() method of the stream isn't called when destroying the stream (standard std::ostream, no fancy custom stream class). So I'm calling sync() in the destructor of my string buffer myself to make sure it is flushed on deconstruction. – kayahr Dec 30 '12 at 23:41
  • Even if there are opinions that I shouldn't use iostreams at all I'm accepting this answer because it answered my explicit question (Which was about the standard C++ iostreams). Thanks for the detailed code examples, I learned a lot from them. – kayahr Dec 30 '12 at 23:49
  • @kayahr: I just checked the C++ 2011 standard and it, indeed, doesn't call `flush()` from the destructor `std::ostream::~ostream()`. In fact, it has a remark stating that the destructor doesn't access `rdbuf()` (27.7.3.2 [ostream.cons] paragraph 4). I think this is a change compared to C++ 2003 but currently I can't easily search the issues list. Another bit of information I learned. With respect to not using IOStreams: I think these people are wrong... ;-) – Dietmar Kühl Dec 31 '12 at 00:04
  • @vitaut Positional parameters don't change anything if you're trying to generate idiomatic text in different languages. You need different logic for each language, which means a different DLL with different code for each language. – James Kanze Dec 31 '12 at 00:48
  • @JamesKanze: For most practical purposes GNU gettext and a proper formatting library (with full messages, not broken up like in iostreams) works pretty well. See http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html for example. Releasing a different DLL for every language is just not realistic. – vitaut Dec 31 '12 at 03:29
  • @vitaut Only for people who don't understand multiple languages, or how languages work. I'm familiar with `gettext`, but I've never been able to make it handle things like gender (necessary in French and German, but with different rules), dual (necessary for Arabic), and any number of other grammatical features. Realistic or not, you cannot produce idiomatic local text other than by using a different DLL for each locality. – James Kanze Dec 31 '12 at 09:19
  • @JamesKanze: I am telling you about a real tool that is successfully used in lots of projects and you are telling me about some theoretical issues. BTW English is not my native language, and I've been using programs translated with gettext in 2 languages which have much more complicated gender rules than English and the quality of translation was very good. No gender issues that you are talking about. The reason is that most substituted arguments are either numerals or some quoted text like filenames and such where gender is not relevant. Anyway, I don't see the point in further discussion. – vitaut Dec 31 '12 at 16:53
  • @vitaut You're telling me about a real tool that I've used and found to be deficient. I'm telling you about my real experience, trying to create idiomatic output in French and German. – James Kanze Dec 31 '12 at 17:47
7

Boost (which you should have already if you're serious about C++) has a whole library dedicated to extending and customizing IO streams: boost.iostreams

In particular, it already has decompressing streams for a few popular formats (bzip2, gzip, and zlib).

As you saw, extending streambuf may be an involved job, but the library makes it fairly easy to write your own filtering streambuf if you need one.

Cubbi
  • 6
    Yes, boost is definitely a topic I also have to learn. But I guess it is a good idea to first learn how to use (or hate) the standard C++ classes before I can really appreciate the benefits of the boost library. – kayahr Dec 30 '12 at 11:47
1

Don't, unless you want to die a terrible death of hideous design. IOstreams are the worst component of the Standard library - even worse than locales. The iterator model is much more useful, and you can convert from stream to iterator with istream_iterator.
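As a rough illustration of the stream-to-iterator conversion mentioned here, the sketch below feeds a stream into standard algorithms and containers. The function names `count_bytes` and `read_ints` are made up for this example; `std::istreambuf_iterator` walks raw characters, while `std::istream_iterator<T>` applies formatted extraction.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Counts the bytes equal to `needle`, reading raw characters from the stream.
inline std::ptrdiff_t count_bytes(std::istream& in, char needle) {
    return static_cast<std::ptrdiff_t>(
        std::count(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(), needle));
}

// Collects whitespace-separated integers via formatted extraction.
inline std::vector<int> read_ints(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Both functions accept any `std::istream`, so they work unchanged on top of a custom stream buffer.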

Puppy
  • 8
    Actually, `iostream` is probably the best designed part of the standard library, although it suffers from a poor choice of names in some cases. Unlike the IO in most other languages, it manages to keep the major concepts (formatting/parsing vs. sinking/sourcing characters) well separated, and allows almost unlimited customization of each. (Java has the same separation, but they managed to make it much more complicated to provide formatting for user defined types, and much more difficult to use.) And as far as I know, C++ is the only language whose IO supports logical markup. – James Kanze Dec 29 '12 at 22:43
  • 4
    The I/O library should not even go anywhere near formatting or parsing, or sinking and sourcing. The correct way to approach sinking/sourcing is by using the existing algorithm abstraction- iterators. And I, personally, am currently working on an I/O proposal- although it would sure be easier with ranges. – Puppy Dec 29 '12 at 22:46
  • 2
    Manipulators. Stream state. Performance. Well designed? I'm sure the truth is in the middle. (Also, I'm afraid there are many languages that can do just about the same, or better. You could just start with the languages with syntactic macro facilities, I guess.) – sehe Dec 29 '12 at 22:47
  • 2
    @sehe Yeah, manipulators are silly - why is octal number output a _stream property_? Because of that, every non-standard IO operation has to revert them to default, and then set it to previous value. I won't even comment on exception guarantees and multithreading capabilities of such solution. – milleniumbug Dec 29 '12 at 23:15
  • +James Kanze: Have a look at the answers here: http://stackoverflow.com/questions/2753060/who-architected-designed-cs-iostreams-and-would-it-still-be-considered-wel . Hope they will convince you that IOStreams is poorly designed by modern standards. – vitaut Dec 30 '12 at 03:50
0

It is probably possible to do this, but I feel that it's not the "right" usage of this feature in C++. The iostream >> and << operators are meant for fairly simple operations, such as writing the "name, street, town, postal code" of a class Person, not for parsing and loading images. That's much better done using stream::read(), for example through a constructor such as Image(astream);, and you may implement a stream for compression, as described by Dietmar.

Mats Petersson
  • 1
    Actually the `read()` method of the stream is used inside the extraction operator to read the image data. But in the end it doesn't really matter if the stream is passed to the constructor, an operator or a method. The point is how to create such a custom stream. – kayahr Dec 29 '12 at 22:34
  • Well, the real difference in my proposal is that you are not adding another `operator >>` for no real use [yes, it looks neat, but when you have 3-4-5 different image formats to support, it will get VERY messy]. You could even hide the stream inside the image class. – Mats Petersson Dec 29 '12 at 22:37
  • 1
    `operator>>` is probably not the answer for an image _file_. In fact, the iostream idiom is probably not appropriate for large, structured binary data; you need something else. On the other hand, if you have new, user defined types, which parse text data, `operator>>` works very well. In my experience, in most applications, almost all `>>` will be to user defined types. – James Kanze Dec 29 '12 at 22:46
0

I agree with @DeadMG and wouldn't recommend using iostreams. Apart from poor design the performance is often worse than that of plain old C-style I/O. I wouldn't stick to a particular I/O library though, instead, I'd create an interface (abstract class) that has all required operations, for example:

class Input {
 public:
  virtual void read(char *buffer, size_t size) = 0;
  // ...
};

Then you can implement this interface for C I/O, iostreams, mmap or whatever.
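A possible shape for two such implementations is sketched below. It deviates from the snippet above in one assumed detail: `read` returns the number of bytes actually read, so callers can detect a short read. The class names `StreamInput` and `FileInput` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>

class Input {
public:
    virtual ~Input() {}
    // Returns the number of bytes actually read (may be less than size).
    virtual std::size_t read(char* buffer, std::size_t size) = 0;
};

// Implementation on top of any std::istream.
class StreamInput : public Input {
public:
    explicit StreamInput(std::istream& in) : in_(in) {}
    std::size_t read(char* buffer, std::size_t size) override {
        in_.read(buffer, static_cast<std::streamsize>(size));
        return static_cast<std::size_t>(in_.gcount());
    }

private:
    std::istream& in_;
};

// Implementation on top of C-style I/O.
class FileInput : public Input {
public:
    explicit FileInput(std::FILE* file) : file_(file) {}
    std::size_t read(char* buffer, std::size_t size) override {
        return std::fread(buffer, 1, size, file_);
    }

private:
    std::FILE* file_;  // not owned; the caller opens and closes it
};
```

Code such as the image loader would then take an `Input&` and never see which backend is in use.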

vitaut
0

After reading several STL references and example solutions for certain use cases, I'm still missing a didactic answer, which is quite simple:

  1. The method std::streambuf::underflow() is called when the std::istream instance has consumed the preceding contents of the read buffer.
  2. This method is to be overridden in a subclass of std::streambuf to delegate the read-buffer request to the specific underlying source stream.
  3. The postcondition of underflow() is either that eof is returned, when the source stream has ended, or that a subsequent buffer window of valid input data is defined by setg(buffer-begin, buffer-begin, buffer-end) and its first character is returned.
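The three points above can be sketched with a stream buffer whose "source" is just a list of in-memory chunks. The class name `chunkbuf` and the `refills()` counter are hypothetical and exist only to make the calls to `underflow()` observable.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Serves its input chunk by chunk; each chunk becomes one buffer window.
class chunkbuf : public std::streambuf {
public:
    explicit chunkbuf(std::vector<std::string> chunks)
        : chunks_(std::move(chunks)) {}

    // Number of times underflow() installed a new buffer window.
    std::size_t refills() const { return refills_; }

protected:
    // (1) Called by the stream once the current window has been consumed.
    int_type underflow() override {
        // (2) Delegate to the underlying source: here, the next non-empty chunk.
        while (next_ < chunks_.size() && chunks_[next_].empty()) ++next_;
        // (3a) Source exhausted: report eof.
        if (next_ == chunks_.size()) return traits_type::eof();
        // (3b) Define the new window with setg() and return its first character.
        std::string& chunk = chunks_[next_++];
        ++refills_;
        setg(&chunk[0], &chunk[0], &chunk[0] + chunk.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::vector<std::string> chunks_;
    std::size_t next_ = 0;
    std::size_t refills_ = 0;
};
```

Reading through a `std::istream` constructed from a pointer to such a buffer yields the chunks as one continuous character sequence.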
Sam Ginrich