
I wrote a Boost.Spirit X3 parser to parse a structured text file. Here is the demo code:

#include <boost/spirit/home/x3.hpp>
#include <cstdio>
#include <cstring>
#include <vector>

namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;

namespace client {

    struct type_t {
        int id;
        std::vector<int> fads;
        std::vector<int> fbds;
        std::vector<float> fvalues;
        float target;

        // id and target are overwritten by every successful parse,
        // so only the vectors need to be reset between lines
        void clear() {
            fads.clear();
            fbds.clear();
            fvalues.clear();
        }
    };

    template <typename Iterator>
    bool parse_numbers(Iterator first, Iterator last, type_t& example)
    {
        using x3::int_;
        using x3::double_;
        using x3::phrase_parse;
        using x3::_attr;
        using ascii::space;

        // semantic actions: copy each parsed value into `example`
        auto fn_id     = [&](auto& ctx) { example.id = _attr(ctx); };
        auto fn_fad    = [&](auto& ctx) { example.fads.push_back(_attr(ctx)); };
        auto fn_fbd    = [&](auto& ctx) { example.fbds.push_back(_attr(ctx)); };
        auto fn_value  = [&](auto& ctx) { example.fvalues.push_back(_attr(ctx)); };
        auto fn_target = [&](auto& ctx) { example.target = _attr(ctx); };

        bool r = phrase_parse(first, last,
            //  Begin grammar
            (
                int_[fn_id] >>
                double_[fn_target] >>
                +(int_[fn_fad] >> ':' >> int_[fn_fbd] >> ':' >> double_[fn_value])
            ),
            //  End grammar
            space);

        if (first != last) // fail if we did not get a full match
            return false;
        return r;
    }
}

int main() {
    char buf[10240];
    client::type_t example;
    std::FILE* fp = std::fopen("text", "r");
    if (!fp)
        return 1;
    while (std::fgets(buf, sizeof buf, fp))  // read one line into the buffer
    {
        std::size_t n = std::strlen(buf);
        example.clear();
        if (client::parse_numbers(buf, buf + n, example))
        {
            // do nothing here, only parse the buf and fill into the example
        }
    }
    std::fclose(fp);
}

Am I doing this the right way, and how could it be improved? I'd like to know whether any optimization is possible before I switch back to my `strsep`-based implementation, which is much faster than this X3 version.

avocado
  • For such a simple parse, Spirit is a bit overkill. For something more complicated, like (say) the C Preprocessor syntax, Spirit is great. – Eljay May 26 '18 at 15:14
  • You've got one buffer in automatic storage vs. multiple in dynamic storage – not exactly a fair comparison. A memory pool/arena allocator would change things a _lot_ here. – ildjarn May 26 '18 at 18:30
  • @ildjarn, I don't quite understand what you mean. Would you please give me an example? – avocado May 27 '18 at 01:51
  • I could, if you post your `strsep` code for reference, so I have a basis of comparison before posting an answer. :-] – ildjarn May 27 '18 at 11:19
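One way to read the arena suggestion in the comments is C++17's `std::pmr`: give every vector in the record a polymorphic allocator that draws from one buffer instead of the global heap. The sketch below is an illustration under that assumption, not the commenter's code; `arena_example` is a hypothetical variant of the question's `type_t`.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical variant of the question's type_t: every vector takes its
// storage from a caller-supplied memory resource instead of the global heap.
struct arena_example {
    int id = 0;
    float target = 0.0f;
    std::pmr::vector<int> fads;
    std::pmr::vector<int> fbds;
    std::pmr::vector<float> fvalues;

    explicit arena_example(std::pmr::memory_resource* mr)
        : fads(mr), fbds(mr), fvalues(mr) {}

    // clear() keeps the capacity, so once the vectors have grown to the
    // longest line seen so far, later lines need no new allocations.
    void clear() {
        fads.clear();
        fbds.clear();
        fvalues.clear();
    }
};
```

Typical use: put a buffer in automatic storage and wrap it in a `std::pmr::monotonic_buffer_resource`, e.g. `alignas(std::max_align_t) std::byte storage[64 * 1024]; std::pmr::monotonic_buffer_resource arena{storage, sizeof storage}; arena_example example{&arena};`. The monotonic resource hands out memory by bumping a pointer, never frees individual blocks, and falls back to its upstream resource (by default `new`/`delete`) once the buffer is exhausted.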

1 Answer


Why do you use semantic actions for this? Worth reading is sehe's article Boost Spirit: “Semantic actions are evil”? and the other notes linked from it. Parsing into an AST structure, as shown by the X3 examples (e.g. Employee - Parsing into structs), is IMO much more natural. If needed, you can then evaluate the data later on, e.g. with the visitor pattern.

One solution is shown here:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>

namespace ast {
    struct triple {
        int fad;        // int, to match the int_ parsers in the grammar
        int fbd;
        double value;
    };

    struct data {
        int id;
        double target;
        std::vector<ast::triple> triple;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::triple, fad, fbd, value)
BOOST_FUSION_ADAPT_STRUCT(ast::data,   id, target, triple)

namespace x3 = boost::spirit::x3;

namespace parser {

    using x3::int_; using x3::double_;

    // give each rule its own ID tag type
    auto const triple = x3::rule<class triple_class, ast::triple>{ "triple" } =
        int_ >> ':' >> int_ >> ':' >> double_;
    auto const data = x3::rule<class data_class, ast::data>{ "data" } =
        int_ >> double_ >> +triple;
}

int main()
{
    std::stringstream buffer;
    std::ifstream file{ R"(C:\data.txt)" };

    if(!file.is_open()) {
        std::cerr << "cannot open input file\n";
        return EXIT_FAILURE;
    }
    buffer << file.rdbuf();
    file.close();

    // buffer.str() returns a new copy on every call: keep ONE copy alive so
    // that both iterators point into the same string
    std::string const input = buffer.str();
    auto iter = input.begin();
    auto const end = input.end();
    ast::data data;

    bool parse_ok = x3::phrase_parse(iter, end, parser::data, x3::space, data);

    // main() must return 0 (EXIT_SUCCESS) on success, not true
    return (parse_ok && iter == end) ? EXIT_SUCCESS : EXIT_FAILURE;
}

It compiles (see Wandbox) but is untested, since I don't have your input data (you can of course generate some yourself inside main()); in any case, you are only interested in benchmarking.
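For generating such input data, a sketch like the following could be used. It assumes the line format the question's grammar expects (`<id> <target> <fad>:<fbd>:<value> ...`); `make_test_input` and its parameters are hypothetical, and the value ranges are arbitrary.

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: builds `lines` records, one per line, in the format
// "<id> <target> <fad>:<fbd>:<value> ..." with `triples_per_line` triples each.
// The grammar uses +(...), so triples_per_line should be at least 1.
std::string make_test_input(std::size_t lines, std::size_t triples_per_line,
                            unsigned seed = 42)
{
    std::mt19937 gen{seed};  // fixed seed: same file for the same standard library
    std::uniform_int_distribution<int> small_int{0, 100000};
    std::uniform_real_distribution<double> real{-1.0, 1.0};

    std::ostringstream out;
    for (std::size_t i = 0; i < lines; ++i) {
        const double target = real(gen);
        out << i << ' ' << target;                    // id and target
        for (std::size_t t = 0; t < triples_per_line; ++t) {
            const int fad = small_int(gen);
            const int fbd = small_int(gen);
            const double value = real(gen);
            out << ' ' << fad << ':' << fbd << ':' << value;
        }
        out << '\n';
    }
    return out.str();
}
```

Writing e.g. `make_test_input(100000, 50)` to the file with a `std::ofstream` gives both the X3 and the `strsep` version the same input to chew on.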

Also note the use of std::stringstream to read the file's rdbuf(). There are several ways to skin this cat; see How to read in a file in C++, where the rdbuf() approach comes out as fast.
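Two of those approaches, side by side, as a sketch (the function names are hypothetical): the `rdbuf()` variant used in the answer, and a variant that asks for the file size first and reads directly into a pre-sized string. Both open the file in binary mode, so `tellg()` matches the number of bytes actually read.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Variant 1: stream the whole file buffer into a string stream (rdbuf() approach).
std::string read_file_rdbuf(const std::string& path)
{
    std::ifstream file{path, std::ios::binary};
    if (!file)
        throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

// Variant 2: open at the end to get the size, then read straight into a
// string of that size, skipping the intermediate stream buffer.
std::string read_file_sized(const std::string& path)
{
    std::ifstream file{path, std::ios::binary | std::ios::ate};
    if (!file)
        throw std::runtime_error("cannot open " + path);
    const auto size = static_cast<std::streamsize>(file.tellg());
    std::string content(static_cast<std::size_t>(size), '\0');
    file.seekg(0);
    file.read(&content[0], size);
    return content;
}
```

Whichever variant is used, keep the returned string in a named variable and take both parser iterators from it.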

Further, how did you benchmark? Did you measure only the time spent in x3::phrase_parse() (respectively in the strsep part), or the whole binary, including the time to load the file? The two measurements must be comparable! Also consider OS filesystem caching etc.
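A minimal timing helper along these lines could keep the comparison fair. It is a sketch, and `median_ms` is a hypothetical name: it runs the work once untimed as a warm-up (so a cold filesystem cache or first-touch page faults do not dominate), then reports the median of several timed runs measured with `std::chrono::steady_clock`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once as an untimed warm-up, then `runs` more times, and returns
// the median wall-clock time of the timed runs in milliseconds.
template <typename F>
double median_ms(F&& work, int runs = 5)
{
    using clock = std::chrono::steady_clock;
    if (runs < 1)
        runs = 1;
    work();  // warm-up, not measured
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

To compare only the parsers, load the file into a string first and time just the parsing, e.g. `median_ms([&] { result = parse_all(input); })`, where `parse_all` is a placeholder for either the X3 or the `strsep` version. Keep the result observable (store or print it); otherwise the optimizer may remove the work being measured.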

BTW, it would be interesting to see the results and the test environment (data file size, strsep implementation etc.).

Addendum:

If you know roughly how much data to expect, you can pre-allocate memory for the vector using data.triple.reserve(10240); (or give the struct a constructor that takes the expected size). This prevents reallocation during parsing (don't forget to wrap this in a try/catch block to handle std::bad_alloc etc.). A default-constructed std::vector starts with capacity 0 and grows geometrically as elements are appended, so without reserve() it reallocates several times while filling a long record.
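A sketch of that reserve-plus-try/catch idea, using a stand-in `triple` struct that mirrors the answer's `ast::triple`; the helper `reserve_triples` is hypothetical. Besides `std::bad_alloc`, `reserve()` throws `std::length_error` when the request exceeds `max_size()`, so both are caught.

```cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <stdexcept>
#include <vector>

// Stand-in for the answer's ast::triple (fad and fbd as int, matching int_).
struct triple {
    int fad;
    int fbd;
    double value;
};

// Hypothetical helper: reserves room for `expected` triples up front, so
// push_back() does not reallocate until that many elements are stored.
// Failures are reported on stderr and turned into a false return value.
bool reserve_triples(std::vector<triple>& v, std::size_t expected)
{
    try {
        v.reserve(expected);
        return true;
    } catch (const std::length_error&) {  // expected > v.max_size()
        std::cerr << "reserve: " << expected << " exceeds max_size()\n";
    } catch (const std::bad_alloc&) {     // not enough memory
        std::cerr << "reserve: out of memory for " << expected << " triples\n";
    }
    return false;
}
```

As long as the number of stored elements stays within the reserved capacity, the vector's buffer is not reallocated, so pointers into it stay valid and no copies are made.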

Olx