
I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example:

'1' AND 1=1--

This should break down into tokens like:

[0] => [SQL_STRING, '1']
[1] => [SQL_AND]
[2] => [SQL_INT, 1]
[3] => [SQL_EQUALS]
[4] => [SQL_INT, 1]
[5] => [SQL_COMMENT]
[6] => [SQL_QUERY_END]

Are there any lexers for SQL, at the very least, that I could base mine on, or any good tools like bison for C#? (I'd rather not write my own grammar, as I need to support most if not all of the grammar of MySQL 5.)

Christopher Tarquini
    See [Parsing SQL code in C#](http://stackoverflow.com/questions/589096/parsing-sql-code-in-c) and [Parsing SQL in .NET](http://stackoverflow.com/questions/76083/parsing-sql-in-net). – Matthew Flaschen May 30 '10 at 18:06
    You don't want to write a SQL parser for a real SQL dialect such as MySQL by yourself. Get an implementation from a vendor that supplies it. – Ira Baxter May 30 '10 at 18:19
    As described in this answer: http://stackoverflow.com/questions/76083/parsing-sql-in-net/76151#76151, you could just use ANTLR, an OO parser generator, with the MySQL dialect file from here: http://www.antlr.org/grammar/list. Ready to use. – Philip Daubmeier May 30 '10 at 18:24

3 Answers


It seems there are a few good parsers out there.

This SO question has a sample using Microsoft's Entity Framework:
[Parsing SQL code in C#](http://stackoverflow.com/questions/589096/parsing-sql-code-in-c)

Someone else rolled their own parser and put it up on Code Project:
http://www.codeproject.com/KB/dotnet/SQL_parser.aspx

Personally, I'd go with the Entity Framework solution, since it was created and is maintained by Microsoft, but for the same reason it is probably closely coupled to SQL Server. Since you're targeting MySQL, you may want to go with the custom solution on Code Project, which you can extend yourself wherever the MySQL grammar requires it.

I'll be using this soon (for Oracle, not MySQL), so please let the community know how the solution works out!

UPDATE:
I just came back to this and read the comments... upon further reflection, I'd really recommend ANTLR, since it supports multiple grammars. Once again, I haven't used it, so it'll be good to hear how it worked out, and it's up to you to decide.
https://stackoverflow.com/questions/76083/parsing-sql-in-net/76151

Matt DeKrey

There may also be a way to use Microsoft's own T-SQL parser via the Database Edition of Visual Studio:

The crown jewels of the Database Edition product are the SQL parsers and script generator, these two pieces form the foundation of what the database project system does internally.

http://blogs.msdn.com/b/gertd/archive/2008/08/21/getting-to-the-crown-jewels.aspx

Maslow

Similar information can be obtained using the SQL parser 'Carbunql'.

https://github.com/mk3008/Carbunql

However, comment text is removed, and the type information of each token is not available.

Sample

using Carbunql;
using Carbunql.Analysis.Parser;

var w = WhereClauseParser.Parse("'1' AND 1=1--");

foreach (var item in w.GetTokens())
{
    Console.WriteLine(item.Text);
}

Results

where
'1'
AND
1
=
1
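
If the token types and the trailing comment are needed, as in the question's expected output, a small hand-written lexer could fill that gap. The following is only a rough sketch and is not part of Carbunql: the names SqlTokenKind, SqlToken and MiniSqlLexer are hypothetical, it assumes C# 9 or later, and it only recognises the token kinds that appear in the question's example. Real MySQL has many more (and, strictly, MySQL expects whitespace after `--` before treating it as a comment, which this sketch ignores).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds, modelled on the question's example (not from any library).
public enum SqlTokenKind { String, Int, And, EqualsSign, Comment, QueryEnd, Unknown }

public sealed record SqlToken(SqlTokenKind Kind, string Text);

public static class MiniSqlLexer
{
    public static List<SqlToken> Tokenize(string input)
    {
        var tokens = new List<SqlToken>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            int start = i;

            if (char.IsWhiteSpace(c)) { i++; continue; }

            if (c == '\'')
            {
                // Quoted string; a doubled quote ('') counts as an escaped quote.
                i++;
                while (i < input.Length)
                {
                    if (input[i] == '\'')
                    {
                        if (i + 1 < input.Length && input[i + 1] == '\'') { i += 2; continue; }
                        i++; // consume the closing quote
                        break;
                    }
                    i++;
                }
                tokens.Add(new SqlToken(SqlTokenKind.String, input.Substring(start, i - start)));
            }
            else if (char.IsDigit(c))
            {
                while (i < input.Length && char.IsDigit(input[i])) i++;
                tokens.Add(new SqlToken(SqlTokenKind.Int, input.Substring(start, i - start)));
            }
            else if (c == '-' && i + 1 < input.Length && input[i + 1] == '-')
            {
                // Simplification: any "--" starts a comment that runs to end of line.
                while (i < input.Length && input[i] != '\n') i++;
                tokens.Add(new SqlToken(SqlTokenKind.Comment, input.Substring(start, i - start)));
            }
            else if (c == '=')
            {
                i++;
                tokens.Add(new SqlToken(SqlTokenKind.EqualsSign, "="));
            }
            else if (char.IsLetter(c) || c == '_')
            {
                while (i < input.Length && (char.IsLetterOrDigit(input[i]) || input[i] == '_')) i++;
                string word = input.Substring(start, i - start);
                // Only AND is recognised here; every other word falls through as Unknown.
                var kind = word.Equals("AND", StringComparison.OrdinalIgnoreCase)
                    ? SqlTokenKind.And
                    : SqlTokenKind.Unknown;
                tokens.Add(new SqlToken(kind, word));
            }
            else
            {
                // Anything this sketch does not understand becomes a one-character Unknown token.
                i++;
                tokens.Add(new SqlToken(SqlTokenKind.Unknown, c.ToString()));
            }
        }
        tokens.Add(new SqlToken(SqlTokenKind.QueryEnd, ""));
        return tokens;
    }
}
```

Calling MiniSqlLexer.Tokenize("'1' AND 1=1--") yields seven tokens: String '1', And, Int 1, EqualsSign, Int 1, Comment --, and QueryEnd. For anything beyond a toy like this, a generated lexer from a full MySQL grammar is likely the safer route.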