79

I want to parse SQL code using C#.

Specifically, is there any freely available parser which can parse SQL code and generate a tree or any other structure out of it? It should also generate the proper tree for nested structures.

It should also return which kind of statement the node of this tree represents.

For example, if the node contains a loop condition then it should return that this is a "loop type" of a node.

Or is there any way by which I can parse the code in C# and generate a tree of the type I want?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Archie
  • 2,564
  • 8
  • 39
  • 50

8 Answers8

37

Specifically for Transact-SQL (Microsoft SQL Server) you can use the Microsoft.SqlServer.Management.SqlParser.Parser namespace available in Microsoft.SqlServer.Management.SqlParser.dll, an assembly included with SQL Server and which can be freely distributed.

Here's an example method for parsing T-SQL as a string into a sequence of tokens:

IEnumerable<TokenInfo> ParseSql(string sql)
{
    ParseOptions parseOptions = new ParseOptions();
    Scanner scanner = new Scanner(parseOptions);

    int state = 0,
        start,
        end,
        lastTokenEnd = -1,
        token;

    bool isPairMatch, isExecAutoParamHelp;

    List<TokenInfo> tokens = new List<TokenInfo>();

    scanner.SetSource(sql, 0);

    while ((token = scanner.GetNext(ref state, out start, out end, out isPairMatch, out isExecAutoParamHelp)) != (int)Tokens.EOF)
    {
        TokenInfo tokenInfo =
            new TokenInfo()
            {
                Start = start,
                End = end,
                IsPairMatch = isPairMatch,
                IsExecAutoParamHelp = isExecAutoParamHelp,
                Sql = sql.Substring(start, end - start + 1),
                Token = (Tokens)token,
            };

        tokens.Add(tokenInfo);

        lastTokenEnd = end;
    }

    return tokens;
}

Note that the TokenInfo class is just a simple class with the above-referenced properties.

Tokens is this enumeration:

and includes constants like TOKEN_BEGIN, TOKEN_COMMIT, TOKEN_EXISTS, etc.

UPDATE: it is now a separate nuget: https://www.nuget.org/packages/Microsoft.SqlServer.Management.SqlParser

Andrey Belykh
  • 2,578
  • 4
  • 32
  • 46
12

[Warning: answer may no longer apply as of 2021]

Use Microsoft Entity Framework (EF).

It has a "Entity SQL" parser which builds an expression tree,

using System.Data.EntityClient;
...
EntityConnection conn = new EntityConnection(myContext.Connection.ConnectionString);
conn.Open();
EntityCommand cmd = conn.CreateCommand();
cmd.CommandText = @"Select t.MyValue From MyEntities.MyTable As t";
var queryExpression = cmd.Expression;
....
conn.Close();

Or something like that, check it out on MSDN.

And it's all on Ballmers tick :-)

There is also one on The Code Project, SQL Parser.

Good luck.

DarkWanderer
  • 8,739
  • 1
  • 25
  • 56
TFD
  • 23,890
  • 2
  • 34
  • 51
  • 28
    That takes "Entity-SQL", a dialect of SQL; I believe the OP means regular SQL, such as "Transact-SQL" (SQL Server's dialect). In short; this won't work. – Marc Gravell Feb 26 '09 at 04:57
  • As far as I can tell there are no clues in orgional question either way? So before we right this off shall we wait for @aaCog to confirm? – TFD Feb 26 '09 at 05:05
  • Well, I meant SQL and not T-SQL. Also, i do not want to submit any commands to the db server but just want to do the processing over the SQL code which you consider as a simple text. – Archie Feb 26 '09 at 09:01
  • 25
    I can't find an Expression property for EntityCommand on msdn. http://msdn.microsoft.com/en-us/library/system.data.entityclient.entitycommand.aspx – burnt1ce Oct 05 '10 at 14:30
  • 14
    For the -ve 'ers. This answer was valid in 2009. EF has moved on, and this no longer applies – TFD Apr 18 '15 at 06:17
  • 6
    @TFD - The fact that this answer is no longer correct is however a valid reason for downvoting. As you know, this is how Stackoverflow is meant to work. It relies on people either updating the answer, or accepting that it will be voted down in preference for more correct answers to be voted above it. – PaulG Feb 11 '19 at 09:36
12

Scott Hanselman recently featured the Irony project which includes a sample SQL parser.

mdip
  • 600
  • 4
  • 10
user423430
  • 3,654
  • 3
  • 26
  • 22
8

You may take a look at a commerical component: general sql parser at http://www.sqlparser.com It supports SQL syntax of Oracle, T-SQL, DB2 and MySQL.

James
  • 81
  • 1
  • 1
  • Generate internal query parse tree in xml output for further processing. Support nested subquery, complex join and statements in stored procedure. http://www.dpriver.com/blog/list-of-demos-illustrate-how-to-use-general-sql-parser/generate-internal-query-parse-tree-in-xml-for-further-processing/ –  Dec 21 '10 at 14:43
  • Yup, I'm using it, the API is a tad confusing, but definitely gets the job done. Pretty reasonable price as well. Support is very helpful. – tbone Mar 16 '11 at 16:25
6

Try ANTLR - There are a bunch of SQL grammars on there.

Andrew Peters
  • 11,135
  • 4
  • 37
  • 34
  • 1
    any sample application in .NEt -with source code if is possible- that use AntLR ?? – Kiquenet Aug 03 '10 at 12:34
  • @PuterdoBorato link seems to be dead :( – Chris Owens Dec 17 '15 at 08:05
  • @Chris I was able to generate some SQL parsers using Antler 4.5.3 and . This something does not seem to be compilable though. Compiled parsers: https://github.com/another-guy/SqlSchemer/tree/0374cf63c1143e2b120b698d78e79bef970ee37e/AntlrSqlParsers Steps you can follow to compile SQL parsers yourself (maybe you have more luck resolving the issues): https://github.com/another-guy/SqlSchemer/issues/2 If you do, please let me know. I still need a working parser... – Igor Soloydenko Oct 04 '16 at 22:58
4

VSTS 2008 Database Edition GDR includes assemblies that handle SQL parsing and script generation that you can reference from your project. Database Edition uses the parser to parse the script files to represent in-memory model of your database and then uses the script generator to generate SQL scripts from the model. I think there are just two assemblies you need to have and reference in your project. If you don't have the database edition, you may install the trial version to get the assemblies or there might be another way to have them without installing the database edition. Check out the following link. Data Dude:Getting to the Crown Jewels .

Mehmet Aras
  • 5,284
  • 1
  • 25
  • 32
2

Try GOLD Parser, it's a powerful and easy to learn BNF engine. You can search the grammars already made for what you want (ie: SQL ANSI 89 Grammar).

I started using this for HQL parsing (the NHibernate query language, very similar to SQL), and it's awesome.

UPDATE: Now the NH dev team has done the HQL parsing using ANTLR (which is harder to use, but more powerful AFAIK).

Diego Jancic
  • 7,280
  • 7
  • 52
  • 80
1

As Diego suggested, grammars are the way to go IMHO. I've tried Coco/r before, but that is too simple for complex SQL. There's ANTLR with a number of grammars ready.

Someone even tried to build a SQL engine, check the code if there's something for you in SharpHSQL - An SQL engine written in C#.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Robert Cutajar
  • 3,181
  • 1
  • 30
  • 42