2

I may be asking a very trivial question but am not getting blocks out of my brain to crack it. Trying to parse a SQL like where clause as shown below using boost::spirit::qi to generate a vector of pairs

std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'"

I have gone through the following threads but still unable to do it :-( Thread5 Thread4Thread3 Thread2 Thread1

[Thread1][6]
[Thread2][7]
[Thread3][8]
[Thread4][9]
[Thread5][10]

I genuinely request, kindly help me understand how to achieve this ... may be I had not completely given my 100% but please be kind ....

Here is the full code (some part commented which I wish to do), as a first step I was just checking if I can get all tokens in a Vector and then parse each Vector element to generate another vector of std::pair

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <map>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

typedef std::string str_t;
typedef std::pair<str_t, str_t> pair_t;
typedef std::vector<pair_t> pairs_t;

typedef std::vector<str_t> strings_t;
//typedef std::map<std::string, std::string> map_t;
//typedef std::vector<map_t> maps_t;

template <typename It, typename Skipper = qi::space_type>
    //struct parser : qi::grammar<It, pairs_t(), Skipper>
    struct parser : qi::grammar<It, strings_t(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        cond    = lexeme [ *(char_) ];
        conds   =  *(char_) >> cond % (lit("and"));

        //conds =  *(char_ - lit("and")) >>(cond % lit("and"));
        /*cond  = lexeme [ *(char_ - lit("and")) ];
        cond    = key >> "=" >> value;
        key     = *(char_ - "=");
        value   = ('\'' >> *(~char_('\'')) >> '\'');
        kv_pair = key >> value;*/
        start   = conds;
        //cond  = key >> "=" >> value;
        //key       = *(char_ - "=");
        //value = ('\'' >> *(~char_('\'')) >> '\'');
  //      kv_pair   = key >> value;
  //      start = kv_pair;
    }

  private:
    qi::rule<It, str_t(), Skipper> cond;
    qi::rule<It, strings_t(), Skipper> conds;
    //qi::rule<It, std::string(), Skipper> key, value;//, cond;
    //qi::rule<It, pair_t(), Skipper> kv_pair;
    //qi::rule<It, pairs_t(), Skipper> start;
    qi::rule<It, strings_t(), Skipper> start;
};

template <typename C, typename Skipper>
    bool doParse(const C& input, const Skipper& skipper)
{
    auto f(std::begin(input)), l(std::end(input));

    parser<decltype(f), Skipper> p;
    strings_t data;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,skipper,data);
        if (ok)   
        {
            std::cout << "parse success\n";
            std::cout << "No Of Key-Value Pairs=  "<<data.size()<<"\n";
        }
        else    std::cerr << "parse failed: '" << std::string(f,l) << "'\n";
        return ok;
    } 
    catch(const qi::expectation_failure<decltype(f)>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    std::cout<<"Pair Test \n";
    const std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'";
    bool ok = doParse(input, qi::space);
    std::cout<< input <<"\n";
    return ok? 0 : 255;
}

OUTPUT:

Pair Test
parse success
No Of Key-Value Pairs=  2
book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'

Which I expect 4 ... since there are 4 conditions !!

Thanks in Advance Regards, Vivek

some example to work out- live on coliru

Community
  • 1
  • 1
Vivek S
  • 47
  • 7
  • Perhaps also of interest: [Parsing SQL Queries in C++ using Boost.Spirit](http://stackoverflow.com/a/12631146/85371) and [Attempting to parse SQL like statement with Boost-Spirit](http://stackoverflow.com/a/10339716/85371). – sehe May 07 '14 at 20:10

1 Answers1

1

I'm sorry to break it to you, but your grammar is far more broken than you imagined.

    conds   =  *(char_) // ...

Right here, you're basically just parsing all the input into a single string, with whitespace skipped. In fact, adding

    for (auto& el : data)
        std::cout << "'" << el << "'\n";

after parsing prints:

Pair Test 
parse success
No Of Key-Value Pairs=  2
'book.author_id='1234'andbook.isbn='xy99'andbook.type='abc'andbook.lang='Eng''
''

As you can see, the first element is the string that *char_ parsed, and you get an empty element for free due to the fact that both conds and cond match on empty input.

I would strongly suggest you to start simple. And I mean, much simpler.

Slowly build your grammar up from the ground. Spirit is a very good tool to tackle with test-driven development (except for the compile times, but hey, you get more time to think!).

Here's something that I just made up, starting thinking from the very first building block, the indentifier, and working my way up to the higher-level elements:

// lexemes (no skipper)
ident     = +char_("a-zA-Z.");
op        = no_case [ lit("=") | "<>" | "LIKE" | "IS" ];
nulllit   = no_case [ "NULL" ];
and_      = no_case [ "AND" ];
stringlit = "'" >> *~char_("'") >> "'";

// other productions
field     = ident;
value     = stringlit | nulllit;
condition = field >> op >> value;

conjunction = condition % and_;
start       = conjunction;

These are close to the simplest thing that I suppose could parse your grammar (with a few creative notes left and right, where they don't seem too intrusive).

UPDATE So this is where I got in 20 minutes:

I always start out with mapping the types that I want the rules to expose:

namespace ast
{
    enum op { op_equal, op_inequal, op_like, op_is };

    struct null { };

    typedef boost::variant<null, std::string> value;

    struct condition
    {
        std::string _field;
        op _op;
        value _value;
    };

    typedef std::vector<condition> conditions;
}

Only condition cannot be "naturally" used in a Spirit grammar without adaptation:

BOOST_FUSION_ADAPT_STRUCT(ast::condition, (std::string,_field)(ast::op,_op)(ast::value,_value))

Now comes the grammar itself:

    // lexemes (no skipper)
    ident       = +char_("a-zA-Z._");
    op_token.add
        ("=",    ast::op_equal)
        ("<>",   ast::op_inequal)
        ("like", ast::op_like)
        ("is",   ast::op_is);
    op          = no_case [ op_token ];
    nulllit     = no_case [ "NULL" >> attr(ast::null()) ];
    and_        = no_case [ "AND" ];
    stringlit   = "'" >> *~char_("'") >> "'";

    //// other productions
    field       = ident;
    value       = stringlit | nulllit;
    condition   = field >> op >> value;

    whereclause = condition % and_;
    start       = whereclause;

You can see minor deviations from my original sketch, that's interesting:

  • added _ to identifier chars
  • move op_token into a symbol matcher (because that's easier to map the enum values)

See it all Live And Working On Coliru, output:

Pair Test 
parse success
No Of Key-Value Pairs=  4
( [book.author_id] = 1234 )
( [book.isbn] LIKE xy99 )
( [book.type] = abc )
( [book.lang] IS NULL )

book.author_id = '1234' and book.isbn liKE 'xy99' and book.type = 'abc' and book.lang IS null
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Thanks Much Sehe for exploring the details ... I guess I had put a very very naive effort and your post is an eye opener .. will think more attentively and work it out to build a cleaner, simpler grammer ... Regards Vivek – Vivek S May 08 '14 at 03:24
  • pardon for disturbing again .. but the implementation cannot parse LIKE or IS if provided instead of '=' !! whereas if I break tokens as "and_" then use "|" in "field >> (op_like | op_is) >> value" then it works .. is there any glitch which is hidden .. Regards Vivek – Vivek S May 08 '14 at 06:39
  • Posted a request for comments **[at the mailing list](http://boost.2283326.n4.nabble.com/no-case-symbols-lt-gt-not-matching-Spirit-V-2-td4662098.html)** (minimal reproducer included) and updated the answer. – sehe May 08 '14 at 08:00
  • Oops. It was documented! We need to **[put the symbols entries in lowercase for use in `no_case`](http://www.boost.org/doc/libs/1_55_0/libs/spirit/doc/html/spirit/qi/reference/string/symbols.html#spirit.qi.reference.string.symbols.example)**. Glad that's resolved :) – sehe May 08 '14 at 08:35
  • thanks again, its all working and I really appreciate your diligent efforts – Vivek S May 08 '14 at 09:46
  • looks like am still learning but not enough :( actually am trying to replace stringlit = "'" >> *~char_("'") >> "'"; by stringlit = QT >> *~char_(QT) >> QT; by defining boost::spirit::classic::chlit<> QT('\''); but it is not compiling any insight would be helpful http://coliru.stacked-crooked.com/ Regards Vivek – Vivek S May 13 '14 at 12:47
  • @VivekS See the **[`quoted_string` rule here](http://stackoverflow.com/questions/10289985/parse-quoted-strings-with-boostspirit/10294577#10294577)** – sehe May 13 '14 at 14:23
  • Hi Sehe ... stuck again for some special case ... Actually I have built a small select statement parser and want to do some conditional processing I tried qi::eps but cant do it right .. as shown in [link](http://coliru.stacked-crooked.com/a/2337124e6af0c12f) kindly suggest a better way or enlighten me how to use qi::eps – Vivek S Jun 19 '14 at 06:56
  • Thanks much for the help though, I can understand and believe me my intentions were never to use your knowledge for my own benefits, donno but was impressed the way you keep answering spirit queries ... will try harder to learn .. Regards, Vivek – Vivek S Jun 19 '14 at 09:36
  • @VivekS hint: I usually think Spirit questions are interesting :) – sehe Jun 19 '14 at 09:39
  • @sehe, I am also trying to learn Spirit by working with your example. I am trying to add OR and parenthesis support to your example, but I simply can't figure out how. – Sharath Apr 04 '17 at 16:04
  • @Sharath If you make the question a bit more specific (like, show us what you tried) you can post it as its own question – sehe Apr 04 '17 at 16:15
  • @sehe, I started with your example at Coliru, tried to add OR (or_), just like you added support for AND (and_). But I was stumped at line 71: whereclause = condition % and_; The template syntax here goes beyond my understanding. How do I add both and_ and or_ to this whereclause? – Sharath Apr 04 '17 at 16:27
  • @Sharath see http://www.boost.org/doc/libs/1_63_0/libs/spirit/doc/html/spirit/qi/reference/operator/list.html – sehe Apr 04 '17 at 16:27
  • Ok, this did the trick. Is that correct? whereclause = condition % (and_ | or_); – Sharath Apr 04 '17 at 16:36
  • @Sharath Does it do what you want? – sehe Apr 04 '17 at 16:38
  • Yes, it is able to recognize OR as a condition now. – Sharath Apr 04 '17 at 16:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/139884/discussion-between-sehe-and-sharath). – sehe Apr 04 '17 at 16:43