We use the following standard Boost.Spirit code to convert a series of strings, each containing a list of floating-point numbers, into float arrays. The input data is huge (many GBs of text), so performance is critical. We use the code in a multithreaded environment. Recently we noticed a significant performance degradation in this API between two versions of our code. The compiler version, compiler flags and Boost version are the same across the two versions. The only relevant change is that the output container is now an STL vector instead of a simple array container implementation. I am not sure whether the run-time degradation is due to the container change or to something else, as the degradation is around 50 percent, which I don't think the use of an STL vector alone can explain. We use Sun Collector for profiling, and it shows a significant increase in CPU time for the multithreaded test case. What could cause the degradation other than the change in container type, given that the Boost version and the compiler/compiler flags are the same? Any suggestions are welcome.

Thanks.

#define BOOST_SPIRIT_THREADSAFE
#include <boost/config/warning_disable.hpp>
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix_core.hpp"
#include "boost/spirit/include/phoenix_operator.hpp"
#include "boost/spirit/include/phoenix_stl.hpp"
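#include <cstring>
#include <vector>

    // Namespace aliases used by the unqualified names below
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;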

    bool parse_numbers(const char* first, const char* last, std::vector<float> &v)
    {
          using qi::phrase_parse;
          using qi::_1;
          using ascii::space;
          using phoenix::push_back;
          using qi::double_;

          bool r = phrase_parse(first, last,
              (
               double_[push_back(phoenix::ref(v), _1)]
               >> *((',' >> qi::double_[push_back(phoenix::ref(v), _1)]) |
                    (qi::double_[push_back(phoenix::ref(v), _1)]))
              ),
              space);

          return r;
    }

    int main()
    {
       const char *input1 = "3.4 567, 89, 90 91";
       const char *input2 = "3.4, 567, 89, 90, 91";
       const char *input3 = "3.4 567 89, 90 91";
       std::vector<float> myVec;
       parse_numbers(input1, input1 + strlen(input1), myVec);
       myVec.clear();
       parse_numbers(input2, input2 + strlen(input2), myVec);
       myVec.clear();
       parse_numbers(input3, input3 + strlen(input3), myVec);

    }
sehe
  • You need to show a full example. I see numerous things that can be trivially fixed. Also see some of my answers (search for "spirit fastest" or "spirit mapped" e.g.) if you want to find out on your own. – sehe Sep 05 '15 at 15:01
  • I want to implement a parser for the following input: the string can be a single floating-point number, or a list of space-separated or comma-separated floating-point numbers. The output should be pushed to a float array. What is the fastest way to implement this using Boost Spirit (or is there a faster parser in C++)? – Rajat Gupta Sep 06 '15 at 07:30
  • I think you missed the point of the full example. Something that _demonstrates_ the performance degradation you talk about. See http://www.sscce.org/, http://meta.stackexchange.com/questions/22754/sscce-how-to-provide-examples-for-programming-questions/22762#22762, and http://stackoverflow.com/help/mcve – sehe Sep 06 '15 at 11:36
  • I have edited the code snippet to include the complete function. – Rajat Gupta Sep 06 '15 at 14:34
  • Thanks for that. I'm a bit confused now. I see no "test case" that could be increased in CPU time, and also there's no hint what the multi threading would be. What do you want to optimize? – sehe Sep 06 '15 at 16:19

1 Answer

I don't think we're getting the full context, but let me add some input based on experience here:

  1. Simpler is better

    #include <boost/spirit/include/qi.hpp>
    
    namespace qi = boost::spirit::qi;
    
    static const auto rule = qi::copy(qi::double_ % -qi::lit(','));
    
    template <typename It, typename Cont>
    bool parse_numbers(It first, It last, Cont& v) {
        return qi::phrase_parse(first, last, rule, qi::space, v);
    }
    

    Who needs phoenix? Also, avoid compiling the parser expression each time (although here, in all likelihood, everything was inlined away).

  2. The speed eater is IO, likely. You don't show it, but where you get the iterators from matters most.

Here's a sample program that contrasts the speed reading 6.4 million lines of 4 uniformly random floats [1] between just ifstream and using memory mapped files:

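#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>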

void test(std::string const fname) {
#if 0
    std::ifstream ifs(fname, std::ios::binary);
    boost::spirit::istream_iterator f(ifs >> std::noskipws), l;
#else
    boost::iostreams::mapped_file_source ifs(fname);
    char const *f(ifs.begin()), *l(ifs.end());
#endif

    std::vector<float> myVec;
    myVec.reserve(4* (6400ul << 10));

    if (parse_numbers(f, l, myVec))
        std::cout << "Parsed: " << myVec.size();
    else
        std::cout << "Parse failed";
}

int main(int argc, char** argv) {
    if (argc>1) test(argv[1]);
}

This prints

Parsed: 26214400

It takes ~2.8s using the memory map. Using ifstream[2] it takes 19.5s, roughly 7x as long.

See it Live On Coliru.

Notes:

  • it's easy to use mmap directly instead of using Boost Iostreams
  • I know your usage pattern is likely different, but I had no idea how to usefully benchmark parsing single lines of 3 floats

[1] generated with

dd if=/dev/urandom bs=1M count=100 | od -f -Anone > 6.4m.txt

[2] no difference when using multi_pass_iterator<> on top of istreambuf_iterator<char>

sehe
  • Thanks Sehe. One question: what is the meaning of the following rule? static const auto rule = qi::copy(qi::double_ % -qi::lit(',')); Specifically, what does -qi::lit(',') mean? – Rajat Gupta Sep 07 '15 at 03:20
  • Erm... [the documentation](http://www.boost.org/doc/libs/1_59_0/libs/spirit/doc/html/spirit/qi/reference/operator.html) is your friend. So, it's a list of doubles, separated by optional `','`... Isn't that what you described? (Your [now-deleted comment](http://stackoverflow.com/questions/32414087/performance-degradation-in-boost-spirit/32427657?noredirect=1#comment52696050_32414331) said _"Regarding grammar, the input can be a single floating point number, a comma separated list of floats or simply space separated floats. The grammar should support all of these"_) – sehe Sep 07 '15 at 07:55
  • Thanks. This is really helpful. – Rajat Gupta Sep 07 '15 at 08:57
  • @RajatGupta Welcome to SO (please also read http://meta.stackoverflow.com/questions/5234/how-does-accepting-an-answer-work) – sehe Sep 07 '15 at 09:01