2

I need to write RegEx in C# to parse SQL join query which is given as a string. Can somebody please help me because I'm new at this. Thanks a lot

this is my problem:

string query = @"SELECT table1.column1, table1.column2, table2.coulmn1, table2.column2
                 FROM table1 INNER JOIN table2 ON table1.column5 = table2.column5";

what I actually need is to put all important data into separate variables, like this:

string class1 = table1    
string class2 = table2    
string joinForeignKey1 = table1.column5    
string joinForeignKey2 = table2.column5    
List<string> attributes1 = table1.column1, table1.column2    
List<string> attributes2 = table2.column1, table2.column2

//COMMENT

I realized that I have made a mistake in sql query so there will be an ON clause.

I can force a user to provide me with the correct syntax, so that will be no problem.

The thing I haven't mentioned is that there can be more than one JOIN ON clause (multiple joins).

Thanks a lot and I will appreciative any given help.

Toy
  • 21
  • 1
  • 3
  • Will all of the queries be in exactly this format? Will there be any possibility that comments might exist in the source or derived tables or any other constructs that will mess things up? Edit: I just noticed what you posted is not valid SQL anyway. It is missing an `ON` clause. – Martin Smith Jul 27 '10 at 13:47
  • 4
    SQL is #3 on the list of Stuff You Should Not Try To Parse With A Regex, just behind [HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) and [MUMPS](http://thedailywtf.com/Articles/A_Case_of_the_MUMPS.aspx). Use a dialect-specific, dedicated SQL parser, not a regex. – Craig Stuntz Jul 27 '10 at 13:53
  • @martin - His query is valid SQL, just not using the most current ANSI standard for Joins. – Mitchel Sellers Jul 27 '10 at 13:54
  • @Mitchel: No, he's mixing SQL-92 and SQL-89 style joins into a hybrid which won't work in either of them. SQL-89 would use `table1, table2`, not the `JOIN` keyword. – Craig Stuntz Jul 27 '10 at 13:56
  • @Craig - Crap, I overlooked that... – Mitchel Sellers Jul 27 '10 at 13:59
  • @toy: (1) "ON table1.column5 = table2.column5" ... shouldn't that be "... = table2.column6" ? (2) What, no WHERE clause? (3) If it's so rigidly constrained, when not just get the user to fill in a form with what you want? (4) What's the REAL underlying objective/problem? – John Machin Jul 28 '10 at 07:36
  • I have a structure that is called Join which have 2 structures as her attributes: leftNode and rigthNode. Each node has 3 attributes: string ClassId, string ForeignKey, list columns. From a given sql query I need to extract left class (table1), left ForeignKey (table1.column5) and wanted columns/attributes (table1.column1 and table1.column2) and put them as parameters of leftNode. Same thing is for the rightNode (ClassId = table2, ForeignKey = table2.column5 and columns = table2.column1 and table2.column2 ). That is the correct problem... Again, tnx for any given help :-) – Toy Jul 28 '10 at 09:09

2 Answers2

4

Pulling this over from comments, since I think it's the right answer here:

SQL is #3 on the list of Stuff You Should Not Try To Parse With A Regex, just behind HTML and MUMPS. Use a dialect-specific, dedicated SQL parser, not a regex.

Community
  • 1
  • 1
Craig Stuntz
  • 125,891
  • 12
  • 252
  • 273
1

I personally do not recommend doing this unless you have a VERY, VERY valid reason to do so as well as full control over the way that the SQL would be written.

First and foremost the syntax that you noted for the SQL statement is the old style join syntax and not using the more common ON syntax.

Something like

SELECT A.ColumnA, B.ColumnB
FROM MyTable A
    INNER JOIN YourTable B
        ON (A.MyIdentity = B.MyForeignKey)

So unless you can force users to input queries in the old syntax you are already going down the road to a way that will not work.

If I was forced to do this type of thing, and I did have control over it, I personally wouldn't bother with RegEx, due to the fact that the process is so structured. i would just use basic string manipulation.

Mitchel Sellers
  • 62,228
  • 14
  • 110
  • 173