1

I'm new to Spirit and to Boost in general. I'm trying to parse a section of a VRML file that looks like this:

point
            [
            #coordinates written in meters.
            -3.425386e-001 -1.681608e-001 0.000000e+000,
            -3.425386e-001 -1.642545e-001 0.000000e+000,
            -3.425386e-001 -1.603483e-001 0.000000e+000,

The comment that starts with # is optional.

I've written a grammar, that works fine, but the parsing process is taking to long. I would like to optimize it to run much faster. My code looks like this:

struct Point
{
    double a;
    double b;
    double c;

    Point() : a(0.0), b(0.0), c(0.0){}
};

BOOST_FUSION_ADAPT_STRUCT
(
    Point,
    (double, a) 
    (double, b)
    (double, c)
)

namespace qi   = boost::spirit::qi;
namespace repo = boost::spirit::repository;

template <typename Iterator>
struct PointParser : 
public qi::grammar<Iterator, std::vector<Point>(), qi::space_type>
{
   PointParser() : PointParser::base_type(start, "PointGrammar")
   {
      singlePoint = qi::double_>>qi::double_>>qi::double_>>*qi::lit(",");
      comment = qi::lit("#")>>*(qi::char_("a-zA-Z.") - qi::eol);
      prefix = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]];
      start %= prefix>>qi::repeat[singlePoint];     

      //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start));
   }

   qi::rule<Iterator, Point(), qi::space_type>              singlePoint;
   qi::rule<Iterator, qi::space_type>                       comment;
   qi::rule<Iterator, qi::space_type>                       prefix;
   qi::rule<Iterator, std::vector<Point>(), qi::space_type> start;
};

The section that I intend to parse, is located at the middle of the input text, so I need to skip the portion of the text in order to get to it. I implemented it using repo::seek. Is this the best way?

I run the parser in the following way:

std::vector<Point> points;
typedef PointParser<std::string::const_iterator> pointParser;   
pointParser g2; 

auto start = ch::high_resolution_clock::now();  
bool r = phrase_parse(Data.begin(), Data.end(), g2, qi::space, points); 
auto end = ch::high_resolution_clock::now();

auto duration = ch::duration_cast<boost::chrono::milliseconds>(end - start).count();

To parse about 80k entries in the input text, takes about 2.5 seconds, which is pretty slow for my needs. My question is there a way to write a parsing rules in more optimized way to make it (much) faster? How can I improve this implementation in general?

I'm new to Spirit, so some explanation will be much appreciated.

Slava C
  • 51
  • 4
  • If this same code gets between 36 ms and 2.5 s runtime on similar systems (ballpark) and similar input, my bet is on the latter not having optimizations enabled. – rubenvb Oct 06 '15 at 12:50
  • Not sure about this. I'm using precompiled version of boost 1.59 libs with VS2013. Can you suggest any optimization flags? – Slava C Oct 06 '15 at 13:36
  • Boost(.Spirit) is very template-heavy, the precompiled libraries aren't going to slow you down... Why do you think *your* code compiles so slowly? Spirit is compiled when you use it, not up front. You need to make sure you are timing release builds of your code. – rubenvb Oct 06 '15 at 13:49
  • Wow! What a difference. In **debug** in took 2462 msec, but in **release** it took 17 msec! Many thanks. – Slava C Oct 06 '15 at 14:07

1 Answers1

2

I've hooked your grammar into a Nonius benchmark and generated uniformly random input data of ~85k lines (download: http://stackoverflow-sehe.s3.amazonaws.com/input.txt, 7.4 MB).

  • are you measuring time in a release build?
  • are you using slow file input?

When reading the file up-front I consistently get a time of ~36ms to parse the whole bunch.

clock resolution: mean is 17.616 ns (40960002 iterations)

benchmarking sample
collecting 100 samples, 1 iterations each, in estimated 3.82932 s
mean: 36.0971 ms, lb 35.9127 ms, ub 36.4456 ms, ci 0.95
std dev: 1252.71 μs, lb 762.716 μs, ub 2.003 ms, ci 0.95
found 6 outliers among 100 samples (6%)
variance is moderately inflated by outliers

Code: see below.


Notes:

  • you seem conflicted on the use of skippers and seek together. I'd suggest you simplify prefix:

    comment     = '#' >> *(qi::char_ - qi::eol);
    
    prefix      = repo::seek[
                      qi::lit("point") >> '[' >> *comment
                  ];
    

    prefix will use the space skipper, and ignore any matched attributes (because of the rule declared type). Make comment implicitly a lexeme by dropping the skipper from the rule declaration:

        // implicit lexeme:
        qi::rule<Iterator>                       comment;
    

    Note See Boost spirit skipper issues for more background information.

Live On Coliru

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>

namespace qi   = boost::spirit::qi;
namespace repo = boost::spirit::repository;

struct Point { double a = 0, b = 0, c = 0; };

BOOST_FUSION_ADAPT_STRUCT(Point, a, b, c)

template <typename Iterator>
struct PointParser : public qi::grammar<Iterator, std::vector<Point>(), qi::space_type>
{
    PointParser() : PointParser::base_type(start, "PointGrammar")
    {
        singlePoint = qi::double_ >> qi::double_ >> qi::double_ >> *qi::lit(',');

        comment     = '#' >> *(qi::char_ - qi::eol);

        prefix      = repo::seek[
            qi::lit("point") >> '[' >> *comment
            ];

        //prefix      = repo::seek[qi::lexeme[qi::skip[qi::lit("point")>>qi::lit("[")>>*comment]]];

        start      %= prefix >> *singlePoint;

        //BOOST_SPIRIT_DEBUG_NODES((prefix)(comment)(singlePoint)(start));
    }

  private:
    qi::rule<Iterator, Point(), qi::space_type>              singlePoint;
    qi::rule<Iterator, std::vector<Point>(), qi::space_type> start;
    qi::rule<Iterator, qi::space_type>                       prefix;
    // implicit lexeme:
    qi::rule<Iterator>  comment;
};

#include <nonius/benchmark.h++>
#include <nonius/main.h++>
#include <boost/iostreams/device/mapped_file.hpp>

static boost::iostreams::mapped_file_source src("input.txt");

NONIUS_BENCHMARK("sample", [](nonius::chronometer cm) {
    std::vector<Point> points;

    using It = char const*;
    PointParser<It> g2;

    cm.measure([&](int) {
        It f = src.begin(), l = src.end();
        return phrase_parse(f, l, g2, qi::space, points);
        bool ok =  phrase_parse(f, l, g2, qi::space, points);
        if (ok)
            std::cout << "Parsed " << points.size() << " points\n";
        else
            std::cout << "Parsed failed\n";

        if (f!=l)
            std::cout << "Remaining unparsed input: '" << std::string(f,std::min(f+30, l)) << "'\n";

        assert(ok);
    });
})

Graph:

enter image description here

Another run output, live:

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • I've put up the live stream recording (minus first 13 minutes...) here: https://www.livecoding.tv/video/another-spirit-grammar-benchmark-nonius/ ([experiment](http://chat.stackoverflow.com/transcript/10?m=24182469#24182469)) – sehe Oct 06 '15 at 12:59
  • Many thanks for your reply. I measuring time in **debug** build. I'm not sure what you mean by "reading the file up-front". I'm reading the file using **boost::iostreams::mapped_file_source** and passing the data to parser through **std::string**. I've stripped all the rules except the **singlePoint** and I still get ~2.5 sec . – Slava C Oct 06 '15 at 13:43
  • I've tried to remove the skipper from the _comment_ rule and to simplify the rules as you suggested, but parser didn't work. I'm doing anything incorrectly here? – Slava C Oct 06 '15 at 14:18
  • Please look at my sample. You can see it works in the live stream. I posted full sample and input for a reason. – sehe Oct 06 '15 at 14:23
  • The live stream didn't work at my end for some reason. Could be some network restrictions. Either way, I used _qi::char_("a-zA-Z.")_ in the _comment_ rule, because I have comma (".") character in it. With just _qi::char__ it didn't work. – Slava C Oct 06 '15 at 14:31
  • That makes really little sense. `char_` simply matches all characters. Full stop. Also you don't need the listen. I included _everything_ in the answer. You can even download the exact test input used. I suggest to start with just that. – sehe Oct 06 '15 at 14:38
  • You're right. it works correctly. I must have been missing something. I see that you're using: `using It = char const*; PointParser g2; It f = src.begin(), l = src.end();` Is this preferable over passing iterator (or the same)? Can I reduce the parsing time in _debug_ mode to be similar to a _release_ build? – Slava C Oct 06 '15 at 15:11
  • Passing `const char*` is not "preferrable" over passing an iterator. It **is** passing an iterator. Just more directly, it's a waste of memory and time to copy it to a `string` first. And yes you can simply debug the release build (enable optimizations and debug information). All compilers support this. It can be slightly confusing when the debugger "steps over" certain lines that have been optimized out or re-ordered, but it works fine. – sehe Oct 06 '15 at 16:36
  • Thanks a lot. Sorry for the dumb questions. I understand that I can compile _release_ with debug information, but is there a way to reduce the parsing time in full debug? I'm not sure that I understand the significant parsing time difference between _debug_ and _release_. Again, sorry for the questions that may seems trivial to you. I'm new at this. – Slava C Oct 06 '15 at 18:06
  • If you debug release then you debug release, so there's no difference. Did you define `NDEBUG`? For MSVC disable debug iterators: [_ITERATOR_DEBUG_LEVEL](https://msdn.microsoft.com/en-us/library/hh697468.aspx). – sehe Oct 06 '15 at 20:55
  • (Late note: `repeat[a]` is identical to the simpler `*a`.) – sehe Oct 06 '15 at 20:56