
I am writing a parser for a type of input file. The input file looks something like:

[CalculationBlock]
CalculationTitle="Test Parser Input System" , MatchingRadius=25.0, StepSize=0.01,ProblemType=RelSchroedingerEqn
MaxPartialWaveJ=800, SMatConv=10E-8
PartialWaveConv= 10E-8, SmallValueLimit = 10E-8
PotentialRadType=HeavyIon
[end]

Essentially, it is divided into blocks that start with [BlockName], end with [end], and contain a set of named parameters in between. The named parameters can be separated by ',' or '\n' characters.

Using the incomplete input file above, I wanted to write a parser that would serve as a jumping-off point for a more complete input file. I did so, but the parser has a weakness that I am not sure how to address: it depends on the order of the parameters. For example, if a user were to put the parameter PartialWaveConv= 10E-8 before SMatConv=10E-8, parsing would fail.

I briefly contemplated enumerating every possible order of the parameters in a block, but discarded the idea since there are n! permutations of n parameter-value pairs. So my question is: is there any way to make the parser independent of parameter order?

The toy parser I wrote is below. I apologize if it is amateurish; this is my first foray into Boost, let alone Boost.Spirit.

#include<string>
#include<iostream>
#include<cstdlib>
#include<fstream>
#include<boost/config/warning_disable.hpp>
#include<boost/spirit/include/qi.hpp>
#include<boost/spirit/include/phoenix_core.hpp>
#include<boost/spirit/include/phoenix_operator.hpp>
#include<boost/spirit/include/phoenix_object.hpp>
#include<boost/fusion/include/adapt_struct.hpp>
#include<boost/fusion/include/io.hpp>
#include<boost/spirit/include/support_istream_iterator.hpp>

namespace blocks
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

struct CalcBlock
{
    std::string calculationTitle;
    float matchingRad;
    float stepSize;
    std::string problemType;
    int maxPartialWaveJ;
    float sMatrixConvergenceValue;
    float partialWaveConvergenceValue;
    float smallValueLimit;
    std::string potentialRadType;
};

}

//tell fusion about the block structure
BOOST_FUSION_ADAPT_STRUCT(blocks::CalcBlock,
                        (std::string, calculationTitle)
                        (float, matchingRad)
                        (float, stepSize)
                        (std::string, problemType)
                        (int, maxPartialWaveJ)
                        (float, sMatrixConvergenceValue)
                        (float, partialWaveConvergenceValue)
                        (float, smallValueLimit)
                        (std::string, potentialRadType)
)

namespace blocks
{

template <typename Iterator>
struct CalcBlockParser : qi::grammar<Iterator, CalcBlock(), boost::spirit::ascii::blank_type>
{
    CalcBlockParser() : CalcBlockParser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::float_;
        using qi::lexeme;
        using ascii::char_;

        quotedString %= lexeme['"' >> +(char_ - '"' - '\n') >> '"'];
        plainString %= lexeme[ +(char_ - ' ' - ',' - '\n') ];

        start %=
            lit("[CalculationBlock]") >> '\n'
            >> lit("CalculationTitle") >> '=' >> quotedString >> (lit(',') | lit('\n'))
            >> lit("MatchingRadius") >> '=' >> float_ >> (lit(',') | lit('\n'))
            >> lit("StepSize") >> '=' >> float_ >> (lit(',') | lit('\n'))
            >> lit("ProblemType") >> '=' >> plainString >> (lit(',') | lit('\n'))
            >> lit("MaxPartialWaveJ") >> '=' >> int_ >> (lit(',') | lit('\n'))
            >> lit("SMatConv") >> '=' >> float_ >> (lit(',') | lit('\n'))
            >> lit("PartialWaveConv") >> '=' >> float_ >> (lit(',') | lit('\n'))
            >> lit("SmallValueLimit") >> '=' >> float_ >> (lit(',') | lit('\n'))
            >> lit("PotentialRadType") >> '=' >> plainString
            >> lit("\n[end]\n");
    }

    qi::rule<Iterator, std::string(), boost::spirit::ascii::blank_type> quotedString;
    qi::rule<Iterator, std::string(), boost::spirit::ascii::blank_type> plainString;
    qi::rule<Iterator, CalcBlock(), boost::spirit::ascii::blank_type> start;
};

}

using std::cout;
using std::endl;
namespace spirit = boost::spirit;
int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        cout << "\nUsage:\n\t./echos InputFileName\n" << endl;
        return EXIT_FAILURE;
    }

    std::string inputFileName(argv[1]);
    cout << "Reading input from the file: " << inputFileName << endl;
    std::ifstream input(inputFileName);
    input.unsetf(std::ios::skipws);

    spirit::istream_iterator start(input);
    spirit::istream_iterator stop;

    typedef blocks::CalcBlockParser<spirit::istream_iterator> CalcBlockParser;

    CalcBlockParser cbParser;

    blocks::CalcBlock cb;

    bool success = phrase_parse(start, stop, cbParser, boost::spirit::ascii::blank, cb);

    if (success && start == stop)
    {
        std::cout << boost::fusion::tuple_open('[');
        std::cout << boost::fusion::tuple_close(']');
        std::cout << boost::fusion::tuple_delimiter(", ");

        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "got: " << boost::fusion::as_vector(cb) << std::endl;
        std::cout << "\n-------------------------\n";
    }
    else
    {
        std::cout << boost::fusion::tuple_open('[');
        std::cout << boost::fusion::tuple_close(']');
        std::cout << boost::fusion::tuple_delimiter(", ");

        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "got: " << boost::fusion::as_vector(cb) << std::endl;
        std::cout << "\n-------------------------\n";
    }

    return EXIT_SUCCESS;
}
James Matta
  • Have you considered boost::program_options? – ravenspoint Nov 05 '15 at 22:10
  • I hadn't, and I will have to take a look at it. boost::program_options might be simpler in the long run, but for now I want to figure this out, both because it gave me the problem and I want to know the solution, and because learning Spirit would be handy if I ever have to handle something more intractable. Thank you for the suggestion, though. – James Matta Nov 05 '15 at 22:17
  • @ravenspoint I fail to see how that would apply here. I can almost see how Boost Property Tree would be an option. But I think both imply changing the input format – sehe Nov 06 '15 at 15:50

2 Answers


Just for fun/completeness I reviewed the grammar and came up with the following test.

I have made a few improvement suggestions left and right (as the OP witnessed on the live stream), and the resulting code, test and output are here:

Live On Coliru

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream>

namespace blocks {
    struct CalcBlock {
        std::string calculationTitle;
        float       matchingRad;
        float       stepSize;
        std::string problemType;
        int         maxPartialWaveJ;
        float       sMatrixConvergenceValue;
        float       partialWaveConvergenceValue;    
        float       smallValueLimit;
        std::string potentialRadType;
    };
}

BOOST_FUSION_ADAPT_STRUCT(blocks::CalcBlock, // Boost 1.58+ style adapt-struct
        calculationTitle, matchingRad, stepSize, problemType, maxPartialWaveJ,
        sMatrixConvergenceValue, partialWaveConvergenceValue, smallValueLimit,
        potentialRadType)

namespace blocks {

    namespace qi = boost::spirit::qi;

    template <typename Iterator>
    struct CalcBlockParser : qi::grammar<Iterator, CalcBlock()> {

        CalcBlockParser() : CalcBlockParser::base_type(start) {

            using namespace qi;
            auto eol_ = copy((',' >> *eol) | +eol); // http://stackoverflow.com/a/26411266/85371 (!)

            quotedString = '"' >> +~char_("\"\n") >> '"';
            plainString  =  +~char_(" ,\n");

            start        = skip(blank) [cbRule];

            cbRule       = lexeme["[CalculationBlock]"] >> eol 
              >> (
                      (lexeme["CalculationTitle"] >> '=' >> quotedString >> eol_)
                    ^ (lexeme["MatchingRadius"]   >> '=' >> float_       >> eol_)
                    ^ (lexeme["StepSize"]         >> '=' >> float_       >> eol_)
                    ^ (lexeme["ProblemType"]      >> '=' >> plainString  >> eol_)
                    ^ (lexeme["MaxPartialWaveJ"]  >> '=' >> int_         >> eol_)
                    ^ (lexeme["SMatConv"]         >> '=' >> float_       >> eol_)
                    ^ (lexeme["PartialWaveConv"]  >> '=' >> float_       >> eol_)
                    ^ (lexeme["SmallValueLimit"]  >> '=' >> float_       >> eol_)
                    ^ (lexeme["PotentialRadType"] >> '=' >> plainString  >> eol_)
                 )
             >> lexeme["[end]"]
             >> *eol 
             >> eoi;
        }

      private:
        qi::rule<Iterator, CalcBlock()> start;
        qi::rule<Iterator, CalcBlock(), qi::blank_type> cbRule;
        // lexemes:
        qi::rule<Iterator, std::string()> quotedString, plainString;
    };
}

using   boost::fusion::as_vector;
typedef boost::spirit::istream_iterator It;

int main(int argc, char **argv) {
    if (argc != 2) {
        std::cout << "Usage:\n\t" << argv[0] << " InputFileName" << std::endl;
        return 1;
    }

    std::string inputFileName(argv[1]);
    std::cout << "Reading input from the file: " << inputFileName << std::endl;
    std::ifstream input(inputFileName);
    input.unsetf(std::ios::skipws);

    It start(input), stop;

    blocks::CalcBlock cb;
    blocks::CalcBlockParser<It> cbParser;

    bool success = parse(start, stop, cbParser, cb);

    {
        using namespace boost::fusion;
        std::cout << tuple_open('[') << tuple_close(']') << tuple_delimiter(", ");
    }

    std::cout << "-------------------------\n";
    std::cout << "Parsing " << (success?"succeeded":"failed") << "\n";
    std::cout << "got: "    << as_vector(cb)                  << "\n";
    std::cout << "-------------------------\n";
}

Input:

[CalculationBlock]
CalculationTitle="Test Parser Input System"


SMatConv=10E-8,


PartialWaveConv= 10E-8, MaxPartialWaveJ=800, SmallValueLimit = 10E-8

PotentialRadType=HeavyIon , MatchingRadius=25.0, StepSize=0.01,ProblemType=RelSchroedingerEqn

[end]

Output:

Reading input from the file: input.txt
-------------------------
Parsing succeeded
got: [Test Parser Input System, 25, 0.01, RelSchroedingerEqn, 800, 1e-07, 1e-07, 1e-07, HeavyIon]
-------------------------
sehe
  • Well. Thank you (the other answer gets my upvote). Here's a bonus **[Spirit X3 parser](http://coliru.stacked-crooked.com/a/c84aae191247c937)**. In case you want the whole war story: [stream part 1](https://www.livecoding.tv/video/stateful-semantic-actions-in-spirit-x3-part1/) and [part 2](https://www.livecoding.tv/video/stateful-semantic-actions-in-spirit-x3-part2/) ([experiment](http://chat.stackoverflow.com/transcript/10?m=24182469#24182469)) – sehe Nov 06 '15 at 02:19
  • @JamesMatta Since you removed your question, here's the X3 version extended with error reporting: **[Live On Coliru](http://coliru.stacked-crooked.com/a/c2db66e432ea9b72)**. Note the instances of `eps` introduced to [work around this bug](http://boost.2283326.n4.nabble.com/Single-element-attributes-in-X3-quot-still-quot-broken-td4681549.html). (See the struggle: [part #1](http://tinyurl.com/o2ne8nr), [part #2](http://tinyurl.com/omfvov6), [part #3](http://tinyurl.com/o449nbe)) – sehe Nov 13 '15 at 01:54

You must use the permutation operator ^:

start %=
        lit("[CalculationBlock]") >> '\n' >>
        (
        (lit("CalculationTitle") >> '=' >> quotedString >> (lit(',') | lit('\n')))
        ^ (lit("MatchingRadius") >> '=' >> float_ >> (lit(',') | lit('\n')))
        ^ (lit("StepSize") >> '=' >> float_ >> (lit(',') | lit('\n')))
        ^ (lit("ProblemType") >> '=' >> plainString >> (lit(',') | lit('\n')))
        ^ (lit("MaxPartialWaveJ") >> '=' >> int_ >> (lit(',') | lit('\n')))
        ^ (lit("SMatConv") >> '=' >> float_ >> (lit(',') | lit('\n')))
        ^ (lit("PartialWaveConv") >> '=' >> float_ >> (lit(',') | lit('\n')))
        ^ (lit("SmallValueLimit") >> '=' >> float_ >> (lit(',') | lit('\n')))
        ^ (lit("PotentialRadType") >> '=' >> plainString >> (lit(',') | lit('\n')))
        )
        >> lit("[end]") >> '\n'; // each element above already consumed its trailing ',' or '\n'
Jepessen
  • I really wish I had spotted that operator before I posted the question. That said, would doing this confuse the ordering of information into the structure that I am accessing via fusion? – James Matta Nov 05 '15 at 22:19
  • A brief test shows that it does interfere with the ordering of information in the structure. In fact, parsing fails if I change the order. Is there a way around that? – James Matta Nov 05 '15 at 22:22
  • @JamesMatta your brief test is wrong. Also, it's just documented: http://www.boost.org/doc/libs/1_59_0/libs/spirit/doc/html/spirit/qi/reference/operator/permutation.html#spirit.qi.reference.operator.permutation.attributes – sehe Nov 05 '15 at 22:24
  • @sehe Even with the documentation, Spirit is hard for a neophyte: it essentially defines a language within a language using operator overloading and template metaprogramming, the latter of which is nontrivial. Anyway, I found that documentation after Jepessen's answer appeared; it does not explain well the interaction between ^ and the Fusion tuple system. I will have to test and learn more. – James Matta Nov 05 '15 at 22:30
  • @JamesMatta It doesn't explain the interaction, because there is none. Your expectation should be exactly the same as when they wrote exactly the same [here](http://www.boost.org/doc/libs/1_59_0/libs/spirit/doc/html/spirit/qi/reference/operator/sequence.html#spirit.qi.reference.operator.sequence.attributes). Do you realize the source of the (understandable) confusion? People are too smart, and they seek meaning where there is none. – sehe Nov 06 '15 at 01:17