
I'm looking for a C/C++ SQL parsing library which is able to provide me with the names of the tables the query depends on.

What I expect:

    SELECT * FROM TABLEA NATURAL JOIN TABLEB

Result: TABLEA, TABLEB

Of course, the example above is extremely simple. I've already written my own parser (based on Boost.Spirit) which handles a subset of the SQL grammar, but what I need is a parser that can handle complicated (recursive etc.) queries.

Do you know anything useful for this purpose?

What I found is http://www.sqlparser.com - it's commercial but does exactly what I need. I also dug into the PostgreSQL sources, to no avail.

bartull
  • sorry for being offtopic, but... why? – vines Feb 21 '13 at 13:17
  • You could use a combination of a lexer and a yacc-style parser generator to do this. I know you can find full grammar definitions on the web, but they are not all up to date... Anyway, I've also searched a lot for something that can do this and ended up writing my own parser that can actually skip undefined parts of the grammar, but it remains hard to make it work with complex queries... – ppetrov Feb 21 '13 at 13:20
  • @vines - I'm writing a middleware library which transparently caches the results of queries in Memcached, Redis or local RAM (depending on config). For now it's just academic research into whether it makes sense or not. I'm testing it with the SOCI library (http://soci.sourceforge.net/; I've implemented my own backend). Maybe I'll implement a JDBCv2 driver as well. – bartull Feb 21 '13 at 13:29
  • @ppetrov - Thanks for the response. Unfortunately, I know that writing a fully functional parser takes quite a lot of time and effort. That's why I'm looking for something that is already available. – bartull Feb 21 '13 at 13:41

1 Answer


Antlr can generate a nice SQL parser for you (the generated parser source can be C++), and there are a few SQL grammars available for it: http://www.antlr3.org/grammar/list.html

If all you are interested in are the table names, then taking one of those grammars and adding a semantic action collecting those names should be fairly easy.
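To give a rough idea of what such a name-collecting action does, here is a minimal hand-written sketch in plain C++. It is not Antlr-generated code and does not use any Antlr API: the helper names `tokenize`, `collect_table_names` and `check_question_example` are made up for illustration. It simply records the identifier that follows `FROM` or `JOIN`, which is roughly the spot where you would hook the action into a grammar's table-reference rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: split the query into upper-cased word tokens.
// An opening parenthesis becomes its own token; other punctuation is dropped.
std::vector<std::string> tokenize(const std::string& sql) {
    std::vector<std::string> tokens;
    std::string word;
    for (char raw : sql) {
        const unsigned char c = static_cast<unsigned char>(raw);
        if (std::isalnum(c) || c == '_' || c == '.') {
            word += static_cast<char>(std::toupper(c));
        } else {
            if (!word.empty()) {
                tokens.push_back(word);
                word.clear();
            }
            if (raw == '(') tokens.push_back("(");
        }
    }
    if (!word.empty()) tokens.push_back(word);
    return tokens;
}

// Hypothetical stand-in for the semantic action: remember the identifier
// that directly follows FROM or JOIN.
std::set<std::string> collect_table_names(const std::string& sql) {
    std::set<std::string> tables;
    const std::vector<std::string> tokens = tokenize(sql);
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i] != "FROM" && tokens[i] != "JOIN") continue;
        // "(" starts a subquery; its own FROM is picked up later in the loop.
        if (tokens[i + 1] != "(") tables.insert(tokens[i + 1]);
    }
    return tables;
}

// Usage check against the query from the question.
void check_question_example() {
    const std::set<std::string> expected = {"TABLEA", "TABLEB"};
    assert(collect_table_names("SELECT * FROM TABLEA NATURAL JOIN TABLEB") == expected);
}
```

This shortcut breaks on comma-separated FROM lists, quoted identifiers, string literals that contain the word FROM, and constructs such as EXTRACT(YEAR FROM ...). Those are exactly the cases where a real grammar earns its keep, because the action only fires inside the table-reference rule.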

Having some experience with Antlr, Bison/Yacc and Lex/Flex, I definitely recommend Antlr. It is written in Java, but the target language can be C++. The generated code is actually readable and looks as if it had been written by a human. Debugging Antlr-generated parsers is quite OK, which cannot be said about those generated by Bison.

There are other options, for example Lemon with the SQLite grammar. Have a look at this question if you like: SQL parser in C

piokuc