-1

I am having a sql file which may have multiple SELECT statements, what I need to do is to find out the WHERE clause of the select statements, if any.

Select col1,col2
from table t1
where col1=2 and
convert(something)='abc'

As a result from above query I need complete where clause and I want to make it as generic as possible, (Sub-queries can be ignored as of now). I need some pointers to start on this or any source code available to work upon. Any help is appreciated.

jww
  • 97,681
  • 90
  • 411
  • 885
AjayK
  • 433
  • 5
  • 18

3 Answers3

1

For a statement with no sub-queries, I'd simply start with a simple algorithm.

First replace all white-space (spaces, tabs, newlines, etc) with a single space.

Then find the first occurrence of " where " (note the surrounding spaces) - that's the starting point (excluding the initial space, of course).

From there, locate the first " order by " or " group by " - that's your ending point. If that text doesn't exist, the ending point is the end of the string itself.

In pseudo-code, it will be something like:

def getWhereClause (query):
    query = query.replaceAll ("[ \t\n]*", " ")
    whereClause = ""
    while query is not "":
        if query.startsWithIgnoreCase (" where "):
            whereClause = "where "
            query = query.substring (7)
            next while
        if query.startsWithIgnoreCase (" order by ") or
        if query.startsWithIgnoreCase (" group by "):
            return whereClause
        if whereClause != "":
            whereClause = whereClause + query.substring (0,1)
        query = query.substring (1)
    return whereClause

That should cover the vast majority of non-sub-query queriesand you can add more possibilities to the end conditions as they appear.

One thing to keep in mind is that those text searches and manipulations should really only be done outside of quoted areas. This means you'll need a mini-parser for the statement rather than simple string searches as shown in the pseudo-code if you want a proper solution.

I put forward this solution as a (rather quick and dirty) one since it may well do exactly what you need without the complications of a full SQL parser which, believe me, is not a trivial exercise, from painful experience :-)

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • I second that, do it as simply as you can without making it too complicated or else it will become very painful. Or are there any libraries around for parsing SQL - I have no idea, but just a thought. Also how about a regex that is a bit more complete? – peter Dec 18 '11 at 08:37
  • Initially I thought of RegEx only, but given the non deterministic nature of the statements finding a suitable RegEx requires more time and elaborate testing. And I need this stuff only for some intermediate purpose, so I have started on thought of Paxdiablo. Thanks for the idea @paxdiablo. There are third party parser but as per my needs that is not feasible. – AjayK Dec 18 '11 at 08:49
  • At first this solution looks good and it is simple also but as far as ending of where clause is concerned the approach suggested is to break on some keywords if encountered it is fair enough. But in this case we have to go through lot of cases. What I was thinking was to perform that break as and when there are no more pairs after AND/OR Keywords.The only problem is, in simple cases Its easy to find pair but when any function is used in where condition it will create problems. Any thoughts on this. – AjayK Dec 18 '11 at 09:03
1

If you consider one query as a string you could split it into tokens via " " and then compare each token (lowercased) to "where" - this tells you the first token of the clause then look for "order" followed with "by" or "group" followed with "by" the "group"/"order" token is the token after the end of your clause (if this is not found then your clause extends to the end of the string). Now convert those token indices into character indices in your string and then take a substring to get your clause:

 String query;
 boolean hasWhere = false;
 boolean hasGroupOrOrder = false;
 String[] query_tokens = query.split(" ");

// ---------- FIRST GET THE TOKENS WHERE THE WHERE CLAUSE AND GROUP BY/ORDER BY START ---- 
// we want to find the tokens in the string where the where clause starts and where the group by/order by begins -- if there isn't one then endToken will remain -1
int startToken = -1;
int endToken = -1;

for(int i = 0; i < query_tokens.length(); i++)
{
    if(!hasWhere)
    {
        if(query[i].toLowerCase().equals("where"))
{
            begin = i;
            hasWhere = true;
}

    } 
    else
    {
        if(!hasGroupOrOrder)
        {
            if(query[i].toLowerCase().equals("group") || query[i].toLowerCase().equals("order"))
                hasGroupOrOrder = true;
        }
        else
        {
            if(query[i].toLowerCase().equals("by"))
            {
                end = i - 1;
            }else
                {
                        hasGroupOrOrder = false;
                }
        }
    }
}

// we now know the tokens necessary for the query, now we turn these into the character indexes that can be used to take the where clause substring from the query string

String whereClause;
int startIndex = 0; 
int endIndex = 0;

// ---------- NOW GET START INDEX OF "WHERE" IN STRING ---- we want to find the beginIndex " " character as the where token starts here
for(int i = 0; i < startToken; i++)
{
        startIndex = String.indexOf(" ", startIndex);
}

startIndex += 1; 
// since the startIndex will refer the " " right before "where" we need to increment this by 1
// if you don't need the "where " you can strip this off as well...

// ---------- NOW GET END INDEX EITHER END OF STRING OR BEFORE WHERE/GROUP BY STARTS ----
if(endToken == -1) 
{
// this is the case when the where clause extends to end of string
    endIndex = query.length() - 1;
}
else
{
    // we want to find the endIndex " " character as the group by / order by starts here
    for(int i = 0; i < endIToken; i++)
    {
        endIndex = String.indexOf(" ", endIndex);
    }
    endIndex -= 1; 
 // since the endIndex will be at the " " before the where by/order by we must reduce this by 1 to be end of where clause
}

// ---------- NOW GET SUBSTRING! ----
whereClause = query.substring(startIndex, endIndex);
vassilo
  • 71
  • 3
  • It can find where keyword for me. But my task is to find complete WHERE CLAUSE not WHERE keyword. – AjayK Dec 18 '11 at 09:37
  • I would combine the above with paxdiablos comment that the end of your clause would be signaled by the first "group by" or "order by" clause otherwise your where clause extends to the end of the string. So any tokens after where but before group followed by by or order followed by by would make up your where clause. – vassilo Dec 18 '11 at 10:01
  • It is not mandatory to have ORDER BY or GROUP BY at the end of WHERE clause. That is the only problem,finding all the possible endings, really complex. – AjayK Dec 18 '11 at 10:20
  • This code will only find WHERE clause if and only if there are ORDER BY or GROUP BY present, but if there is no ORDER BY or GROUP BY in SELECT statement then above code will fail. – AjayK Dec 18 '11 at 11:33
  • If there is no ORDER BY or GROUP BY then the where clause extends to the end of the string so your substring will be from the beginning of the WHERE to the end of the string. if(endToken == -1) { // this is the case when the where clause extends to end of string endIndex = query.length() - 1; } – vassilo Dec 18 '11 at 11:37
  • For example :SELECT XYZ FROM TABLE ABC WHERE a=b and c=d and e=f begin transaction select abc from RST. The outcome of above code will be till RST, which is not correct. – AjayK Dec 18 '11 at 11:45
0

Try this one.

using System;
using System.Collections.Generic;
using System.Text;

namespace SQLParser
{
    class Program
    {
        static void Main(string[] args)
        {
            /* A complicated statement with WHERE, ORDER BY and IF Clause*/
            var sql1 = @" SELECT Count(*) AS 'hjkhskj'
                            FROM   ABC

                            SELECT DISTINCT xx AS 'kljklj'
                            FROM   XYZ
                            WHERE  a NOT IN (SELECT b
                                                     FROM   RST)
                            ORDER  BY 1

                            BEGIN TRANSACTION

                            DELETE FROM ABC
                            WHERE  a NOT IN (SELECT b
                                                     FROM   XYZ)

                            IF @@ERROR <> 0
                              BEGIN
                                  ROLLBACK TRANSACTION

                                  GOTO GTY
                              END
                            ELSE
                              COMMIT TRANSACTION

                            SELECT DISTINCT x AS 'SRTT jfkfk'
                            FROM   PQR a
                            WHERE  NOT EXISTS (SELECT m
                                               FROM   MYTU u
                                               WHERE  u.x = m)
                            ORDER  BY 1 ";

            /*A simple statement with WHERE clause only */
            var sql2 = @"Select * FROM Person Where Firstname = 'Merin' AND LastName = 'Nakarmi'";

            var sql3 = @"SELECT TOP 100 * FROM master.sys.fn_get_audit_file('\\network\userdata\audifiles$\SQL_AUDIT\SERVER-1\AdventureWorks\*.sql', DEFAULT, DEFAULT) WHERE CONVERT(DATE, event_time) = CONVERT(DATE, GETDATE());";

            Console.WriteLine(ExtractWhereClause(sql1));
            Console.WriteLine(ExtractWhereClause(sql2));
            Console.WriteLine(ExtractWhereClause(sql3));
            Console.ReadLine();
        }

        /* WHERE clause can occur multiple times in a statement. It will start with WHERE clause and usually ends by ORDER BY CLAUSE, GROUP BY CLAUSE or end of statement */
        struct Occurence
        {
            public int Begin { get; set; }  //Beginning of where clause
            public int End { get; set; }    //Ending of where clause

            internal void Clear()
            {
                Begin = 0;
                End = 0;
            }
        }

        static StringBuilder ExtractWhereClause(string query)
        {
            StringBuilder output = new StringBuilder();

            string[] tokens = query.Split(new char[] {' ', '\r','\t','\n' });
            List<string> nonEmptyTokens = new List<string>();
            List<Occurence> Occurences = new List<Occurence>();

            foreach(string s in tokens)
            {
                if (!s.Equals(""))
                    nonEmptyTokens.Add(s);
            }

            Occurence occurence = new Occurence();

            for (int i = 0; i<nonEmptyTokens.Count; i++)
            {
                var currentToken = nonEmptyTokens[i];
                if (currentToken.ToUpper().Contains("WHERE"))
                {
                    if (occurence.Begin == 0)
                        occurence.Begin = i;
                    else 
                        {
                            occurence.End = i - 1;
                            Occurences.Add(occurence);
                            occurence.Clear();
                            occurence.Begin = i;
                        }
                }
                else if (currentToken.ToUpper().Contains("GROUP") || currentToken.ToUpper().Contains("ORDER"))
                {
                    if (nonEmptyTokens[i + 1].ToUpper().Contains("BY"))
                    {
                        if(occurence.Begin != 0 && occurence.End == 0)
                        {
                            occurence.End = i -1;
                            Occurences.Add(occurence);
                            occurence.Clear();
                        }
                    }
                }
                else if (currentToken.ToUpper().Contains("IF"))
                {
                    if (occurence.Begin != 0 && occurence.End == 0)
                    {
                        occurence.End = i - 1;
                        Occurences.Add(occurence);
                        occurence.Clear();
                    }
                }
            }
            //Check for last Where clause without ending
            if(occurence.Begin != 0)
            {
                occurence.End = nonEmptyTokens.Count - 1;
                Occurences.Add(occurence);
                occurence.Clear();
            }


            foreach(Occurence o in Occurences)
            {
                for(int i = o.Begin; i <= o.End; i++)
                {
                    output.Append(nonEmptyTokens[i]);
                    output.Append(" ");
                }
                output.Append("\n");
            }

            return output;
        }
    }
}

Output:

enter image description here

If you don't want to see WHERE in the console, replace occurence.Begin = i; by occurence.Begin = i + 1;

Merin Nakarmi
  • 3,148
  • 3
  • 35
  • 42