61

I want to create a SQL interface on top of a non-relational data store. Non-relational data store, but it makes sense to access the data in a relational manner.

I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression. Then return data by evaluating/walking the tree.

I have never implemented a parser before, and I would therefore like some advice on how to best implement a SQL parser and evaluator.

  • Does the approach described above sound about right?
  • Are there other tools/libraries I should look into? Like PLY or Pyparsing.
  • Pointers to articles, books or source code that will help me is appreciated.

Update:

I implemented a simple SQL parser using pyparsing. Combined with Python code that implement the relational operations against my data store, this was fairly simple.

As I said in one of the comments, the point of the exercise was to make the data available to reporting engines. To do this, I probably will need to implement an ODBC driver. This is probably a lot of work.

mike rodent
  • 14,126
  • 11
  • 103
  • 157
codeape
  • 97,830
  • 24
  • 159
  • 188
  • 2
    Why impose SQL limitations on objects? What's to be gained? What's wrong with OQL? http://en.wikipedia.org/wiki/Object_Query_Language – S.Lott Sep 08 '09 at 20:00
  • 9
    To be gained: A query interface that a huge number of reporting tools can use. I plan to implement an ODBC driver on the client. So that business users can use Crystal Reports, Excel etc. to fetch data from the data store. OQL, while probably a nice query language (I've never used it), is not as wide-spread as SQL. – codeape Sep 09 '09 at 11:29
  • 2
    +1 both: one of the biggest issues with OO databases is exactly the lack of reporting engines :( – van Sep 19 '09 at 11:43

6 Answers6

46

I have looked into this issue quite extensively. Python-sqlparse is a non validating parser which is not really what you need. The examples in antlr need a lot of work to convert to a nice ast in python. The sql standard grammars are here, but it would be a full time job to convert them yourself and it is likely that you would only need a subset of them i.e no joins. You could try looking at the gadfly (a Python SQL database) as well, but I avoided it as they used their own parsing tool.

For my case, I only essentially needed a where clause. I tried booleneo (a boolean expression parser) written with pyparsing but ended up using pyparsing from scratch. The first link in the reddit post of Mark Rushakoff gives a SQL example using it. Whoosh a full text search engine also uses it but I have not looked at the source to see how.

Pyparsing is very easy to use and you can very easily customize it to not be exactly the same as SQL (most of the syntax you will not need). I did not like ply as it uses some magic using naming conventions.

In short give pyparsing a try, it will most likely be powerful enough to do what you need and the simple integration with python (with easy callbacks and error handling) will make the experience pretty painless.

Reinderien
  • 11,755
  • 5
  • 49
  • 77
David Raznick
  • 17,907
  • 2
  • 35
  • 27
  • 1
    Thanks for sharing your experiences. From initial, very limited testing of python-sqlparse, it seems that I might be able to use it. I will try to work with the returned value from the ``parse`` function in python-sqlparse. But I will look into pyparsing in any case. – codeape Sep 09 '09 at 11:25
  • 1
    Pyparsing is a good tool for this, with lots of examples of parsing sql around. – Gregg Lind Sep 09 '09 at 21:30
  • 2
    This poster on the pyparsing wiki (http://pyparsing.wikispaces.com/message/view/home/14105203) just reported completing a SQL SELECT parser - perhaps you could contact him/her for help, suggestions, or even the code. – PaulMcG Sep 11 '09 at 02:51
  • TFTT. I have contacted the poster. – codeape Sep 11 '09 at 11:19
  • 3
    I implemented it using pyparsing. Pyparsing worked great for this. – codeape Sep 17 '09 at 12:07
13

This reddit post suggests python-sqlparse as an existing implementation, among a couple other links.

Dylan Hogg
  • 3,118
  • 29
  • 26
Mark Rushakoff
  • 249,864
  • 45
  • 407
  • 398
  • Thank you for the suggestion. Python-sqlparse looks interesting, I will give it a try. – codeape Sep 09 '09 at 11:19
  • 1
    It's been a while since this was posted - there is now https://github.com/macbre/sql-metadata which is built on the mentioned `sqlparse` library. `sql-metadata` doesn't give you an AST, but will extract resolved column and table names from a query. – Dylan Hogg Oct 22 '21 at 02:21
  • `sql-metadata` is amazing, i was looking for a ddl parser, and even though this doensn't handle ddls, its a great example of how to work with the tokenizer – Erik Aronesty Oct 27 '22 at 16:11
5

TwoLaid's Python SQL Parser works very well for my purposes. It's written in C and needs to be compiled. It is robust. It parses out individual elements of each clause.

https://github.com/TwoLaid/python-sqlparser

I'm using it to parse out queries column names to use in report headers. Here is an example.

import sqlparser

def get_query_columns(sql):
   '''Return a list of column headers from given sqls select clause'''

   columns = []

   parser = sqlparser.Parser()

   # Parser does not like new lines
   sql2 = sql.replace('\n', ' ')

   # Check for syntax errors
   if parser.check_syntax(sql2) != 0:
      raise Exception('get_query_columns: SQL invalid.')

   stmt = parser.get_statement(0)
   root = stmt.get_root()
   qcolumns = root.__dict__['resultColumnList']
   for qcolumn in qcolumns.list:
      if qcolumn.aliasClause:
         alias = qcolumn.aliasClause.get_text()
         columns.append(alias)
      else:
         name = qcolumn.get_text()
         name = name.split('.')[-1] # remove table alias
         columns.append(name)

   return columns

sql = '''
SELECT 
   a.a,
   replace(coalesce(a.b, 'x'), 'x', 'y') as jim,
   a.bla as sally  -- some comment
FROM
   table_a as a
WHERE
   c > 20
'''

print get_query_columns(sql)

# output: ['a', 'jim', 'sally']
dlink
  • 1,489
  • 17
  • 22
1

Of course, it may be best to leverage python-sqlparse on Google Code

UPDATE: Now I see that this has been suggested - I concur that this is worthwhile:

Mathieu Longtin
  • 15,922
  • 6
  • 30
  • 40
Barton
  • 547
  • 5
  • 11
1

I am using python-sqlparse with great success.

In my case I am working with queries that are already validated, my AST-walking code can make some sane assumptions about the structure.

Overbryd
  • 4,612
  • 2
  • 33
  • 33
1

SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. Syntax errors are highlighted. However, it should be noted that SQL validation is not SQLGlot’s goal, so some syntax errors may go unnoticed. (reddit) (benchmarks)

sqltree is an experimental parser for SQL, providing a syntax tree for SQL queries.

sample-sql-translator is a hand-written recursive-descent parser for SQL.

sqloxide wraps rust bindings for sqlparser-rs into a python package.

milahu
  • 2,447
  • 1
  • 18
  • 25