
This answer points out that C++ is not well suited to iterating over a binary file, but that is exactly what I need right now. In short, I need to operate on files in a "binary" way. Yes, all files are binary, even the .txt ones, but I'm writing something that operates on image files, so I need to read well-structured files where the data is arranged in a specific way.

I would like to read the entire file into a data structure such as std::vector<T>, so I can close the file almost immediately and work with the content in memory without caring about disk I/O anymore.

Right now, the best way to perform a complete iteration over a file using the standard library is something along the lines of:

std::ifstream ifs(filename, std::ios::binary);
for (std::istreambuf_iterator<char, std::char_traits<char> > it(ifs.rdbuf());
     it != std::istreambuf_iterator<char, std::char_traits<char> >(); ++it) {
  // do something with *it;
}
ifs.close();

or use std::copy, but even with std::copy you are always going through istreambuf iterators (so, if I understand the C++ documentation correctly, the code above basically reads 1 byte per call).
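For reference, the std::copy variant mentioned above can be sketched like this (the function name is mine, purely for illustration):

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Sketch: read a whole file through istreambuf_iterators with std::copy.
std::vector<char> read_with_copy(const std::string& filename) {
    std::ifstream ifs(filename, std::ios::binary);
    std::vector<char> content;
    // istreambuf_iterator walks the stream buffer character by character
    std::copy(std::istreambuf_iterator<char>(ifs),
              std::istreambuf_iterator<char>(),
              std::back_inserter(content));
    return content;
}
```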

So the question is: how do I write a custom iterator? What should I inherit from?

I assume this is also important when writing a file to disk, and that I could use the same iterator class for writing; if I'm wrong, please feel free to correct me.

user2485710
  • Is the *size* of the inbound data precluding you from just [`ifs.read`](http://en.cppreference.com/w/cpp/io/basic_istream/read)-ing the data straight up into a `std::vector` and iterating over that? – WhozCraig Nov 21 '13 at 17:58
  • @WhozCraig for now I don't think the files are too big to be kept in memory (if that is what you are referring to). I'm fine with `read` or any other way; even the constructor of the `vector` class accepts iterators, so I'm fine on that side. The "problem" is the iterators themselves: I would like to write one to try to browse the data differently. EDIT: I would like to avoid any C-ish way, I'll stick with the iterators. – user2485710 Nov 21 '13 at 18:07
  • 1
    *you are basically reading 1 byte at each call* -- from `ifstream`'s in-memory buffer, not from the file itself. The actual read(2) calls are still for every 4k or 16k or whatever is the default buffer for you. – Cubbi Nov 21 '13 at 18:11
  • @Cubbi yes, I wasn't going to bring up the buffered/unbuffered behaviour because I want to keep the focus on the iterators, but you are right. Anyway, I'm also not interested in that because it is platform-specific, and I'm trying to adopt a solution that is as cross-platform as possible without introducing extra stuff. That's why I would like to re-write an iterator; it looks like the perfect mix of abstraction over the file and portability. – user2485710 Nov 21 '13 at 18:16
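The read()-into-a-vector approach suggested in the comments can be sketched as follows (the function name is mine; error handling is omitted, and the file is assumed to exist and fit in memory):

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch: measure the file size, then pull it in with a single bulk read().
std::vector<char> slurp(const std::string& filename) {
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate);
    std::vector<char> content(static_cast<std::size_t>(ifs.tellg()));
    ifs.seekg(0);                                   // rewind after measuring
    ifs.read(content.data(), static_cast<std::streamsize>(content.size()));
    return content;
}
```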

2 Answers


It is possible to optimize std::copy() for std::istreambuf_iterator<char>, but hardly any implementation does. Just deriving from something won't do the trick either, because that isn't how iterators work.

The most effective built-in approach is probably to simply dump the file into a std::ostringstream and then get a std::string from there:

std::ostringstream out;
out << file.rdbuf();
std::string content = out.str();

If you want to avoid travelling through a std::string, you could write a stream buffer that dumps the content directly into a memory area or a std::vector<unsigned char>, and use the same output operation as above.
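A minimal sketch of such a stream buffer (the class and function names are mine, not part of the answer): it appends everything written to it to a std::vector<unsigned char>, so that `out << file.rdbuf()` lands directly in the vector:

```cpp
#include <fstream>
#include <ostream>
#include <streambuf>
#include <vector>

// Unbuffered sketch: every character written ends up in the vector.
class vector_buf : public std::streambuf {
public:
    std::vector<unsigned char> data;
protected:
    int_type overflow(int_type c) override {
        if (c != traits_type::eof())
            data.push_back(static_cast<unsigned char>(c));
        return c;
    }
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        data.insert(data.end(), s, s + n);  // bulk path used by operator<<
        return n;
    }
};

std::vector<unsigned char> load(const char* filename) {
    std::ifstream file(filename, std::ios::binary);
    vector_buf buf;
    std::ostream out(&buf);
    out << file.rdbuf();   // same output operation as above, minus the string
    return buf.data;
}
```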

The std::istreambuf_iterator<char>s could, in principle, have a backdoor into the stream buffer and bypass the character-wise operations. Without that backdoor you won't be able to speed anything up using these iterators. You could create an iterator on top of stream buffers that uses the stream buffer's sgetn() to fill a similar buffer. In that case you'd pretty much need a version of std::copy() that deals with segments (i.e., each fill of the buffer) efficiently. Short of either, I'd just read the file into a buffer using a stream buffer and iterate over that.
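A segment-wise read using sgetn(), as described above, might look roughly like this (the buffer size is chosen arbitrarily):

```cpp
#include <fstream>
#include <vector>

// Sketch: read the file one "segment" at a time via the stream buffer.
std::vector<char> read_in_segments(const char* filename) {
    std::ifstream file(filename, std::ios::binary);
    std::vector<char> content;
    char buffer[4096];                        // one segment per sgetn() call
    std::streamsize n;
    while ((n = file.rdbuf()->sgetn(buffer, sizeof buffer)) > 0)
        content.insert(content.end(), buffer, buffer + n);
    return content;
}
```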

Dietmar Kühl
  • so you are basically suggesting to stick with my first implementation? What are the possible errors? What happens if the file is corrupted? – user2485710 Nov 22 '13 at 11:23

My suggestion is not to use a custom stream, stream-buffer or stream-iterator.

#include <fstream>

struct Data {
    short a;
    short b;
    int   c;
};

std::istream& operator >> (std::istream& stream, Data& data) {
    static_assert(sizeof(Data) == 2*sizeof(short) + sizeof(int), "Invalid Alignment");
    if(stream.read(reinterpret_cast<char*>(&data), sizeof(Data))) {
        // Consider endian
    }
    else {
        // Error
    }
    return stream;
}

int main(int argc, char* argv[])
{
    if(argc < 2)
        return 1;
    // The stream has to actually be opened, in binary mode.
    std::ifstream stream(argv[1], std::ios::binary);
    Data data;
    while(stream >> data) {
        // Process
    }
    if(stream.fail() && !stream.eof()) {
        // Error (reaching EOF is good)
    }
    return 0;
}

You could dare to make a stream buffer iterator that reads elements bigger than the underlying char_type, but two questions arise:

  • What if the data has an invalid format?
  • What if the data is incomplete at EOF?

The state of the stream is not maintained by the buffer or iterator.