Personally, I think these are reasonable questions and I remember very well that I struggled with them myself. So here we go:
Where is my mistake here ?
I wouldn't call it a mistake, but you probably want to make sure you don't have to back off from what you have read. That is, I would implement three versions of the input functions. Depending on how complex the decoding of a specific type is, I might not even share the code because it might be just a small piece anyway. If it is more than a line or two, I probably would share the code. That is, in your example I would have an extractor for FooBar
which essentially reads the Foo
or the Bar
members and initializes objects correspondingly. Alternatively, I would read the leading part and then call a shared implementation extracting the common data.
Let's do this exercise because there are a few things which may be a complication. From your description of the format it isn't clear to me if the "string" and what follows the string are delimited e.g. by a whitespace (space, tab, etc.). If not, you can't just read a std::string
: the default behavior for them is to read until the next whitespace. There are ways to tweak the stream into considering characters as whitespace (using std::ctype<char>
) but I'll just assume that there is space. In this case, the extractor for Foo
could look like this (note, all code is entirely untested):
std::istream& read_data(std::istream& is, Foo& foo, std::string& s) {
    Foo tmp(s);
    if (is >> get_char<'('> >> tmp.m_x >> get_char<','> >> tmp.m_y >> get_char<')'>)
        std::swap(tmp, foo);
    return is;
}
std::istream& operator>>(std::istream& is, Foo& foo)
{
    std::string s;
    return read_data(is >> s, foo, s);
}
The idea is that read_data()
reads the part of a Foo
which is different from Bar
when reading a FooBar
. A similar approach would be used for Bar
but I omit this. The more interesting bit is the use of this funny get_char()
function template. This is something called a manipulator and is just a function taking a stream reference as argument and returning a stream reference. Since we have different characters we want to read and compare against, I made it a template but you can have one function per character as well. I'm just too lazy to type it out:
template <char Expect>
std::istream& get_char(std::istream& in) {
    char c;
    // fail if the extracted character is not the expected one
    if (in >> c && c != Expect) {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}
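To see the manipulator in action, here is a minimal self-contained sketch. The helper parse_point() is a made-up name for illustration and not part of your code; it just reads "(x, y)" from a string and reports whether that worked:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Manipulator sketch: extract one non-whitespace character and
// set failbit unless it is the expected one.
template <char Expect>
std::istream& get_char(std::istream& in) {
    char c;
    if (in >> c && c != Expect) {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

// Hypothetical helper: parse "(x, y)" and report whether it succeeded.
bool parse_point(const std::string& text, int& x, int& y) {
    std::istringstream in(text);
    in >> get_char<'('> >> x >> get_char<','> >> y >> get_char<')'>;
    return !in.fail();
}
```

Once one step fails, all later extractions in the chain do nothing, which is why a single check at the end is enough.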
What looks a bit weird about my code is that there are few checks if things worked. That is because the stream would just set std::ios_base::failbit
when reading a member failed and I don't really have to bother myself. The only case where there is actually special logic added is in get_char()
to deal with expecting a specific character. Similarly there is no skipping of whitespace characters (i.e. use of std::ws
) going on: all the input functions are formatted input
functions and these skip whitespace by default (you can turn this off by using e.g. in >> std::noskipws
) but then lots of things won't work.
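A small sketch of the difference, using string streams and made-up helper names (read_two_skipping, read_two_raw, and read_int_raw are just for illustration):

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Read two characters with the default setting (skipws is on).
std::string read_two_skipping(const std::string& text) {
    std::istringstream in(text);
    char a = '?', b = '?';
    in >> a >> b;                   // whitespace before each character is skipped
    return std::string{a, b};
}

// Read two characters with whitespace skipping turned off.
std::string read_two_raw(const std::string& text) {
    std::istringstream in(text);
    char a = '?', b = '?';
    in >> std::noskipws >> a >> b;  // whitespace is extracted like any other character
    return std::string{a, b};
}

// Read an int with skipping turned off; optionally skip explicitly with std::ws.
bool read_int_raw(const std::string& text, bool use_ws) {
    std::istringstream in(text);
    int n = 0;
    in >> std::noskipws;
    if (use_ws)
        in >> std::ws;              // manual replacement for the automatic skipping
    in >> n;
    return !in.fail();
}
```

With skipping turned off, reading a number from " 42" fails unless std::ws is applied first, which is the kind of thing I mean by "lots of things won't work".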
With a similar implementation for reading a Bar
, reading a FooBar
would look something like this:
std::istream& operator>> (std::istream& in, FooBar& foobar) {
    std::string s;
    if (in >> s) {
        // skip whitespace, then look at the next character without extracting it
        switch ((in >> std::ws).peek()) {
        case '(': { Foo foo; if (read_data(in, foo, s)) foobar = foo; break; }
        case '[': { Bar bar; if (read_data(in, bar, s)) foobar = bar; break; }
        default: in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}
This code uses an unformatted input function, peek()
which just looks at the next character. It either returns the next character or it returns std::char_traits<char>::eof()
if it fails. So, if there is either an opening parenthesis or an opening bracket we have read_data()
take over. Otherwise we always fail. Solved the immediate problem. On to distributing information...
Should one write his calls to operator>> to leave the initial data still available after a failure ?
The general answer is: no. If you failed to read, something went wrong and you give up. This might mean that you need to work harder to avoid failing, though. If you really need to back off from the position you were at to parse your data, you might want to read data first into a std::string
using std::getline()
and then analyze this string. Use of std::getline()
assumes that there is a distinct character to stop at. The default is newline (hence the name) but you can use other characters as well:
std::getline(in, str, '!');
This would stop at the next exclamation mark and store all characters up to it in str
. It would also extract the termination character but it wouldn't store it. This makes it interesting sometimes when you read the last line of a file which may not have a newline: std::getline()
succeeds if it can read at least one character. If you need to know whether the last line in a file is terminated by a newline, you can test if the stream reached the end of the file while reading it:
if (std::getline(in, str) && in.eof()) { std::cout << "file not ending in newline\n"; }
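As a rough sketch of both points, here are two small helpers operating on string streams. The names split() and last_line_unterminated() are made up for this example:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split the input at a delimiter; the delimiter is extracted but not stored.
std::vector<std::string> split(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> parts;
    for (std::string part; std::getline(in, part, delim); )
        parts.push_back(part);
    return parts;
}

// Returns true if the final line was read without a terminating newline.
bool last_line_unterminated(std::istream& in) {
    std::string line;
    bool unterminated = false;
    while (std::getline(in, line))
        unterminated = in.eof();  // eofbit is set only if EOF was hit before the delimiter
    return unterminated;
}
```

Once a line is in a std::string, you can hand it to a std::istringstream and parse it as often as you like; a failure there doesn't consume anything from the original stream beyond that line.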
If so, how can I do that efficiently ?
Streams are by their very nature single pass: you receive each character just once and if you skip over one you consume it. Thus, you typically want to structure your data in a way such that you don't have to backtrack. That said, this isn't always possible and most streams actually have a buffer under the hood to which characters can be returned. Since streams can be implemented by a user there is no guarantee that characters can be returned. Even for the standard streams there isn't really a guarantee.
If you want to return a character, you have to put back exactly the character you extracted:
char c;
if (in >> c && c != 'a')
    in.putback(c);
if (in >> c && c != 'b')
    in.unget();
The latter function has slightly better performance because it doesn't have to check that the character is indeed the one which was extracted. It also has fewer chances to fail. Theoretically, you can put back as many characters as you want but most streams won't support more than a few in all cases: if there is a buffer, the standard library takes care of "ungetting" all characters until the start of the buffer is reached. If another character is returned, it calls the virtual function std::streambuf::pbackfail()
which may or may not make more buffer space available. In the stream buffers I have implemented it will typically just fail, i.e. I typically don't override this function.
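A typical use is an "accept this character if it is next" helper. The following is only a sketch with a made-up name, accept(). Note that the formatted read skips leading whitespace, and that skipped whitespace is not put back:

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: consume `expected` if it is the next non-whitespace
// character, otherwise return the extracted character to the stream.
bool accept(std::istream& in, char expected) {
    char c;
    if (!(in >> c))
        return false;   // nothing could be read; the state flags are set
    if (c == expected)
        return true;
    in.unget();         // in.putback(c) would work here as well
    return false;
}
```

Returning a single character right after extracting it is the case which is most likely to work; anything beyond that depends on the stream buffer.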
If not, is there a way to "store" (and restore) the complete status of an input stream: state and data ?
If you mean to entirely restore the state you were at, including the characters, the answer is: sure there is. ...but no easy way. For example, you could implement a filtering stream buffer and put back characters as described above to restore the sequence to be read (or support seeking or explicitly setting a mark in the stream). For some streams you can use seeking but not all streams support this. For example, std::cin
typically doesn't support seeking.
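For streams which do support seeking, e.g. string streams and usually file streams, a sketch of "try to read, rewind on failure" could look like this. The function try_read_int() is a made-up name for illustration:

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: try to read an int; on failure rewind to the start position.
bool try_read_int(std::istream& in, int& value) {
    const std::istream::pos_type mark = in.tellg();  // pos_type(-1) if this fails
    if (in >> value)
        return true;
    in.clear();         // seekg() does nothing while failbit is set
    if (mark != std::istream::pos_type(-1))
        in.seekg(mark);
    return false;
}
```

For the input "-x" the failed attempt has already consumed the '-' by the time it gives up; after the seek the '-' can be read again. With a stream which can't seek, the function above just leaves the stream where the failure happened.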
Restoring the characters is only half the story, though. The other stuff you want to restore are the state flags and any formatting data. In fact, if the stream went into a failed or even bad state you need to clear the state flags before the stream will do most operations (although I think the formatting stuff can be reset anyway):
std::istream fmt(nullptr); // doesn't have a default constructor: create an invalid stream
fmt.copyfmt(in); // save the current format settings
// use in
in.copyfmt(fmt); // restore the original format settings
The function copyfmt()
copies all fields associated with the stream which are related to formatting. These are:
- the locale
- the fmtflags
- the information storage iword() and pword()
- the stream's events
- the exceptions
The stream's state flags (rdstate()) and its stream buffer are not copied.
If you don't know about most of them, don't worry: most stuff you probably won't care about. Well, until you need it but by then you have hopefully acquired some documentation and read about it (or asked and got a good response).
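If you need this more than once, the save/restore pair can be wrapped into a small RAII class. The following is a sketch, not a polished implementation: format_guard and read_hex_then_dec() are made-up names, and I'm assuming the stream uses the default exceptions() mask because copyfmt() may throw otherwise, which would be a problem in a destructor:

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical RAII helper: remember the formatting on construction,
// restore it on destruction. Assumes the default exceptions() mask.
class format_guard {
public:
    explicit format_guard(std::ios& stream) : stream_(stream), saved_(nullptr) {
        saved_.copyfmt(stream_);
    }
    ~format_guard() { stream_.copyfmt(saved_); }
    format_guard(const format_guard&) = delete;
    format_guard& operator=(const format_guard&) = delete;
private:
    std::ios& stream_;
    std::ios saved_;  // no stream buffer attached; only stores the settings
};

// Read the first number as hexadecimal, the second with the original settings.
std::pair<int, int> read_hex_then_dec(const std::string& text) {
    std::istringstream in(text);
    int a = 0, b = 0;
    {
        format_guard guard(in);
        in >> std::hex >> a;  // temporarily switch to hexadecimal input
    }
    in >> b;                  // decimal again: the guard restored the flags
    return {a, b};
}
```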
What differences are there between failbit and badbit ? When should we use one or the other ?
Finally a short and simple one:
failbit
is set when formatting errors are detected, e.g. a number is expected but the character 'T' is found.
badbit
is set when something goes wrong in the stream's infrastructure. For example, when the stream buffer isn't set (as in the stream fmt
above) the stream has std::ios_base::badbit
set. The other reason is if an exception is thrown (and caught by way of the exceptions()
mask; by default all exceptions are caught).
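Both cases are easy to reproduce. In this sketch describe() and the two state_... functions are made-up names; they only report which flags are set after a failed formatted read and for a stream without a stream buffer:

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: list the error flags which are set on a stream.
std::string describe(const std::ios& s) {
    std::string result;
    auto add = [&result](const char* name) {
        if (!result.empty()) result += '|';
        result += name;
    };
    const std::ios_base::iostate state = s.rdstate();
    if ((state & std::ios_base::badbit) != std::ios_base::goodbit)  add("bad");
    if ((state & std::ios_base::failbit) != std::ios_base::goodbit) add("fail");
    if ((state & std::ios_base::eofbit) != std::ios_base::goodbit)  add("eof");
    return result.empty() ? "good" : result;
}

// Formatting error: the characters don't match what an int looks like.
std::string state_after_reading_int(const std::string& text) {
    std::istringstream in(text);
    int n = 0;
    in >> n;
    return describe(in);
}

// Infrastructure error: there is no stream buffer to read from.
std::string state_without_buffer() {
    std::istream in(nullptr);
    return describe(in);
}
```

After a failbit you can typically call clear() and carry on, e.g. by skipping the offending input; after a badbit there usually isn't much you can do with the stream.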
Is there any online reference (or a book) that explains deeply how to deal with iostreams ? not just the basic stuff: the complete error handling.
Ah, yes, glad you asked. You probably want to get Nicolai Josuttis's "The C++ Standard Library". I know that this book describes all the details because I contributed to writing it. If you really want to know everything about IOStreams and locales you want Angelika Langer & Klaus Kreft's "IOStreams and Locales". In case you wonder where I got the information from originally: this was Steve Teale's "IOStreams". I don't know if this book is still in print, and it lacks a lot of the stuff which was introduced during standardization. Since I implemented my own version of IOStreams (and locales) I know about the extensions as well, though.