I figured that for a random passer-by the Boost Spirit answer might seem complex/overkill.
I wanted to try again, since it's 2021, whether with C++17, we can do a reasonable job with just the standard library.
Turns out it's a lot more work. The Qi implementation takes 86 lines of code, but the standard library implementation takes 136 lines. Also, it took me a lot longer (several hours) to debug/write. In particular it was hard to get '='
, '['
, ']'
as token boundaries with std::istream&
. I used the ctype
facet approach from this answer: How do I iterate over cin line by line in C++?
I did leave in the DebugPeeker
(20 lines) so you can perhaps understand it yourself.
Short Expo
The top level parse function looks sane and shows what I wanted to achieve: natural std::istream
extraction:
static Ast::File std_parse_game(std::string_view input) {
std::istringstream iss{std::string(input)};
using namespace Helpers;
if (Ast::File parsed; iss >> parsed)
return parsed;
throw std::runtime_error("Unable to parse game");
}
All the rest lives in namespace Helpers
:
static inline std::istream& operator>>(std::istream& is, Ast::File& v) {
for (section s; is >> s;) {
if (s.name == "parameters")
is >> v.parameters;
else if (s.name == "initialization")
is >> v.initializations.emplace_back();
else if (s.name == "color")
is >> v.colors.emplace_back();
else
is.setstate(std::ios::failbit);
}
if (is.eof())
is.clear();
return is;
}
So far, this is paying out well. The different section types are similar:
static inline std::istream& operator>>(std::istream& is, Ast::Parameters& v) {
return is
>> entry{"numColors", v.numColors}
>> entry{"boardSize", v.boardSize}
>> entry{"numSnails", v.numSnails};
}
static inline std::istream& operator>>(std::istream& is, Ast::Initialization& v) {
return is
>> entry{"id", v.id}
>> entry{"row", v.row}
>> entry{"col", v.col}
>> entry{"orientation", v.orientation};
}
static inline std::istream& operator>>(std::istream& is, Ast::Color& v) {
return is
>> entry{"id", v.id}
>> entry{"nextColor", v.nextColor}
>> entry{"deltaOrientation", v.deltaOrientation};
}
Now, if all was plain sailing like this, I would not be recommending Spirit. Now we get into conditional parsings.
The entry{"name", value}
formulations use a "manipulator type":
template <typename T> struct entry {
entry(std::string name, T& into) : _name(name), _into(into) {}
std::string _name;
T& _into;
friend std::istream& operator>>(std::istream& is, entry e) {
return is >> expect{e._name} >> expect{'='} >> e._into;
}
};
Similarly, sections are using expect
and token
:
struct section {
std::string name;
friend std::istream& operator>>(std::istream& is, section& s) {
if (is >> expect('['))
return is >> token{s.name} >> expect{']'};
return is;
}
};
The conditional is important to be able to detect EOF without putting the stream into a hard failed mode (is.bad()
!= is.fail()
).
expect
is built on top of token
:
template <typename T> struct expect {
expect(T expected) : _expected(expected) {}
T _expected;
friend std::istream& operator>>(std::istream& is, expect const& e) {
if (T actual; is >> token{actual})
if (actual != e._expected)
is.setstate(std::ios::failbit);
return is;
}
};
You will notice that there's much less error information. We just make the
stream fail()
in case an expected token is not found.
Here's where the real complexity comes. I didn't want to parse character by
character. But reading std::string
using operator>>
would only stop on
whitespace, meaning that a section name would "eat" the bracket: parameters]
instead of parameters
, and keys might eat the =
character if there was no
separating space.
In the answer linked above we learn how to build our own character
classification locale facet:
// make sure =,[,] break tokens
struct mytoken_ctype : std::ctype<char> {
static auto const* get_table() {
static std::vector rc(table_size, std::ctype_base::mask());
rc[' '] = rc['\f'] = rc['\v'] = rc['\t'] = rc['\r'] = rc['\n'] =
std::ctype_base::space;
// crucial for us:
rc['='] = rc['['] = rc[']'] = std::ctype_base::space;
return rc.data();
}
mytoken_ctype() : std::ctype<char>(get_table()) {}
};
And then we need to use it but only if we parse a std::string
token. That
way, if we expect('=')
it will not skip over '='
because our facet calls it whitespace...
template <typename T> struct token {
token(T& into) : _into(into) {}
T& _into;
friend std::istream& operator>>(std::istream& is, token const& t) {
std::locale loc = is.getloc();
if constexpr (std::is_same_v<std::decay_t<T>, std::string>) {
loc = is.imbue(std::locale(std::locale(), new mytoken_ctype()));
}
try { is >> t._into; is.imbue(loc); }
catch (...) { is.imbue(loc); throw; }
return is;
}
};
I tried to keep it pretty condensed. Had I used proper formatting, we would
have had more lines of code still :)
DEMO AND TEST
I used the same Ast types, so it made sense to test both implementations and
compare the results for equality.
NOTES:
- On Compiler Explorer so we can enjoy
libfmt
for easy output
- For comparison I used one C++20 feature to get compiler generated
operator==
Live On Compiler Explorer
#include <boost/spirit/home/qi.hpp>
#include <boost/fusion/include/io.hpp>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace qi = boost::spirit::qi;
namespace Ast {
using Id = unsigned;
using Size = uint16_t; // avoiding char types for easy debug/output
using Coord = Size;
using ColorNumber = Size;
using Orientation = Size;
using Delta = signed;
struct Parameters {
Size numColors{}, boardSize{}, numSnails{};
bool operator==(Parameters const&) const = default;
};
struct Initialization {
Id id;
Coord row;
Coord col;
Orientation orientation;
bool operator==(Initialization const&) const = default;
};
struct Color {
Id id;
ColorNumber nextColor;
Delta deltaOrientation;
bool operator==(Color const&) const = default;
};
struct File {
Parameters parameters;
std::vector<Initialization> initializations;
std::vector<Color> colors;
bool operator==(File const&) const = default;
};
using boost::fusion::operator<<;
template <typename T>
static inline std::ostream& operator<<(std::ostream& os, std::vector<T> const& v) {
return os << fmt::format("vector<{}>{}",
boost::core::demangle(typeid(T).name()), v);
}
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Parameters, numColors, boardSize, numSnails)
BOOST_FUSION_ADAPT_STRUCT(Ast::Initialization, id, row, col, orientation)
BOOST_FUSION_ADAPT_STRUCT(Ast::Color, id, nextColor, deltaOrientation)
BOOST_FUSION_ADAPT_STRUCT(Ast::File, parameters, initializations, colors)
template <typename It>
struct GameParser : qi::grammar<It, Ast::File()> {
GameParser() : GameParser::base_type(start) {
using namespace qi;
start = skip(blank)[file];
auto section = [](const std::string& name) {
return copy('[' >> lexeme[lit(name)] >> ']' >> (+eol | eoi));
};
auto required = [](const std::string& name, auto value) {
return copy(lexeme[eps > lit(name)] > '=' > value >
(+eol | eoi));
};
file = parameters >
*initialization >
*color >
eoi; // must reach end of input
parameters = section("parameters") >
required("numColors", _size) >
required("boardSize", _size) >
required("numSnails", _size);
initialization = section("initialization") >
required("id", _id) >
required("row", _coord) >
required("col", _coord) >
required("orientation", _orientation);
color = section("color") >
required("id", _id) >
required("nextColor", _colorNumber) >
required("deltaOrientation", _delta);
BOOST_SPIRIT_DEBUG_NODES((file)(parameters)(initialization)(color))
}
private:
using Skipper = qi::blank_type;
qi::rule<It, Ast::File()> start;
qi::rule<It, Ast::File(), Skipper> file;
// sections
qi::rule<It, Ast::Parameters(), Skipper> parameters;
qi::rule<It, Ast::Initialization(), Skipper> initialization;
qi::rule<It, Ast::Color(), Skipper> color;
// value types
qi::uint_parser<Ast::Id> _id;
qi::uint_parser<Ast::Size> _size;
qi::uint_parser<Ast::Coord> _coord;
qi::uint_parser<Ast::ColorNumber> _colorNumber;
qi::uint_parser<Ast::Orientation> _orientation;
qi::int_parser<Ast::Delta> _delta;
};
static Ast::File qi_parse_game(std::string_view input) {
using SVI = std::string_view::const_iterator;
static const GameParser<SVI> parser{};
try {
Ast::File parsed;
if (qi::parse(input.begin(), input.end(), parser, parsed)) {
return parsed;
}
throw std::runtime_error("Unable to parse game");
} catch (qi::expectation_failure<SVI> const& ef) {
std::ostringstream oss;
auto where = ef.first - input.begin();
auto sol = 1 + input.find_last_of("\r\n", where);
auto lineno = 1 + std::count(input.begin(), input.begin() + sol, '\n');
auto col = 1 + where - sol;
auto llen = input.substr(sol).find_first_of("\r\n");
oss << "input.txt:" << lineno << ":" << col << " Expected: " << ef.what_ << "\n"
<< " note: " << input.substr(sol, llen) << "\n"
<< " note:" << std::setw(col) << "" << "^--- here";
throw std::runtime_error(oss.str());
}
}
namespace Helpers {
struct DebugPeeker {
DebugPeeker(std::istream& is, int line) : is(is), line(line) { dopeek(); }
~DebugPeeker() { dopeek(); }
private:
std::istream& is;
int line;
void dopeek() const {
std::char_traits<char> t;
auto ch = is.peek();
std::cerr << "DEBUG " << line << " Peek: ";
if (std::isgraph(ch))
std::cerr << "'" << t.to_char_type(ch) << "'";
else
std::cerr << "<" << ch << ">";
std::cerr << " " << std::boolalpha << is.good() << "\n";
}
};
#define DEBUG_PEEK(is) // Peeker _peek##__LINE__(is, __LINE__);
// make sure =,[,] break tokens
struct mytoken_ctype : std::ctype<char> {
static auto const* get_table() {
static std::vector rc(table_size, std::ctype_base::mask());
rc[' '] = rc['\f'] = rc['\v'] = rc['\t'] = rc['\r'] = rc['\n'] =
std::ctype_base::space;
// crucial for us:
rc['='] = rc['['] = rc[']'] = std::ctype_base::space;
return rc.data();
}
mytoken_ctype() : std::ctype<char>(get_table()) {}
};
template <typename T> struct token {
token(T& into) : _into(into) {}
T& _into;
friend std::istream& operator>>(std::istream& is, token const& t) {
DEBUG_PEEK(is);
std::locale loc = is.getloc();
if constexpr (std::is_same_v<std::decay_t<T>, std::string>) {
loc = is.imbue(std::locale(std::locale(), new mytoken_ctype()));
}
try { is >> t._into; is.imbue(loc); }
catch (...) { is.imbue(loc); throw; }
return is;
}
};
template <typename T> struct expect {
expect(T expected) : _expected(expected) {}
T _expected;
friend std::istream& operator>>(std::istream& is, expect const& e) {
DEBUG_PEEK(is);
if (T actual; is >> token{actual})
if (actual != e._expected)
is.setstate(std::ios::failbit);
return is;
}
};
template <typename T> struct entry {
entry(std::string name, T& into) : _name(name), _into(into) {}
std::string _name;
T& _into;
friend std::istream& operator>>(std::istream& is, entry e) {
DEBUG_PEEK(is);
return is >> expect{e._name} >> expect{'='} >> e._into;
}
};
struct section {
std::string name;
friend std::istream& operator>>(std::istream& is, section& s) {
DEBUG_PEEK(is);
if (is >> expect('['))
return is >> token{s.name} >> expect{']'};
return is;
}
};
static inline std::istream& operator>>(std::istream& is, Ast::Parameters& v) {
DEBUG_PEEK(is);
return is
>> entry{"numColors", v.numColors}
>> entry{"boardSize", v.boardSize}
>> entry{"numSnails", v.numSnails};
}
static inline std::istream& operator>>(std::istream& is, Ast::Initialization& v) {
DEBUG_PEEK(is);
return is
>> entry{"id", v.id}
>> entry{"row", v.row}
>> entry{"col", v.col}
>> entry{"orientation", v.orientation};
}
static inline std::istream& operator>>(std::istream& is, Ast::Color& v) {
DEBUG_PEEK(is);
return is
>> entry{"id", v.id}
>> entry{"nextColor", v.nextColor}
>> entry{"deltaOrientation", v.deltaOrientation};
}
static inline std::istream& operator>>(std::istream& is, Ast::File& v) {
DEBUG_PEEK(is);
for (section s; is >> s;) {
if (s.name == "parameters")
is >> v.parameters;
else if (s.name == "initialization")
is >> v.initializations.emplace_back();
else if (s.name == "color")
is >> v.colors.emplace_back();
else
is.setstate(std::ios::failbit);
}
if (is.eof())
is.clear();
return is;
}
}
static Ast::File std_parse_game(std::string_view input) {
std::istringstream iss{std::string(input)};
using namespace Helpers;
if (Ast::File parsed; iss >> parsed)
return parsed;
throw std::runtime_error("Unable to parse game");
}
std::string read_file(const std::string& name) {
std::ifstream ifs(name);
return std::string(std::istreambuf_iterator<char>(ifs), {});
}
int main() {
std::string const game_save = read_file("input.txt");
Ast::File g1, g2;
try {
std::cout << "Qi: " << (g1 = qi_parse_game(game_save)) << "\n";
} catch (std::exception const& e) { std::cerr << e.what() << "\n"; }
try {
std::cout << "std: " << (g2 = std_parse_game(game_save)) << "\n";
} catch (std::exception const& e) { std::cerr << e.what() << "\n"; }
std::cout << "Equal: " << std::boolalpha << (g1 == g2) << "\n";
}
To my great relief, the parsers agree on the data:
Qi: ((4 11 2)
vector<Ast::Initialization>{(0 3 4 0), (1 5 0 1)}
vector<Ast::Color>{(0 1 2), (1 2 1), (2 3 -2), (3 0 -1)})
std: ((4 11 2)
vector<Ast::Initialization>{(0 3 4 0), (1 5 0 1)}
vector<Ast::Color>{(0 1 2), (1 2 1), (2 3 -2), (3 0 -1)})
Equal: true
Summary/Conclusion
Although this answer is "standard" and "portable" it has some drawbacks.
For example it is certainly not easier to get right, it has virtually no debug options or error reporting, it doesn't validate the input format as much. E.g. it will still read this unholy mess and accept it:
[parameters] numColors=999 boardSize=999 numSnails=999
[color] id=0 nextColor=1 deltaOrientation=+2 [color] id=1 nextColor=2
deltaOrientation=+1 [
initialization] id=1 row=5 col=0 orientation=1
[color] id=2 nextColor=3 deltaOrientation=-2
[parameters] numColors=4 boardSize=11 numSnails=2
[color] id=3 nextColor=0 deltaOrientation=-1
[initialization] id=0 row=3 col=4 orientation=0
If your input format is not stable and computer-written, I would highly recommend against the standard-library approach because it will lead to hard-to-diagnose problems and just horrible UX (don't make your users want to throw their computer out of the window because of unhelpful error messages like "Unable to parse game data").
Otherwise, you might. For one thing, it'll be faster to compile.